CN111144462A - Unknown individual identification method and device for radar signals - Google Patents

Unknown individual identification method and device for radar signals

Info

Publication number
CN111144462A
CN111144462A (application CN201911296607.8A)
Authority
CN
China
Prior art keywords
network
sample
identified
input
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911296607.8A
Other languages
Chinese (zh)
Other versions
CN111144462B (en)
Inventor
黄双双
李臻
单志林
李立
苏志杰
胡佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 38 Research Institute
Original Assignee
CETC 38 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 38 Research Institute filed Critical CETC 38 Research Institute
Priority to CN201911296607.8A priority Critical patent/CN111144462B/en
Publication of CN111144462A publication Critical patent/CN111144462A/en
Application granted granted Critical
Publication of CN111144462B publication Critical patent/CN111144462B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for identifying unknown individuals of radar signals. The method comprises the following steps: constructing and storing a known class sample set N and an unknown class sample set UN; inputting each sample to be identified in the known class sample set N and in the unknown class sample set UN into a coding network, extracting the feature vector of the sample to be identified and classifying it; generating an attention probability distribution vector from the feature vector of the sample to be identified using the DDPG algorithm; generating a conditional feature vector from the feature vector and the attention probability distribution vector of the sample to be identified and inputting it into a decoding network to judge the category of the sample to be identified; and training the networks of the coding network, the decoding network and the DDPG algorithm. The invention has the advantage that unknown class signals are prevented from being mistakenly recognized as known class signals, and known class signals and unknown class signals are accurately separated.

Description

Unknown individual identification method and device for radar signals
Technical Field
The invention relates to the field of electronic reconnaissance, in particular to an unknown individual identification method and device for radar signals.
Background
In deep-learning-based classification networks, the set of output classes is usually fixed, so the classes of the test data are assumed to be known when the model is trained. In real application scenarios, however, classes that were absent from training commonly appear, and a conventional classification network cannot classify them correctly. This defect greatly reduces the recognition accuracy of the classification model in real environments, so solving the unknown-class identification problem is a key factor in improving the recognition accuracy of classification networks. Existing solutions to the unknown-class identification problem in deep learning mainly fall into two categories. The first adds unknown sample classes to the training set and distinguishes unknown classes by the similarity between the data under test and the known samples. The second extracts intermediate-layer features of the classification network and separates unknown classes by cluster analysis with common machine learning methods (clustering and dimensionality-reduction methods such as KNN, PCA and t-SNE).
Both methods work well for unknown-class identification in some application scenarios, but each has drawbacks. The first method, which distinguishes known from unknown classes by adding unknown sample classes to the training set, depends heavily on unknown samples: such a sample set is difficult to collect, it can hardly cover all potential unknown classes, and the model still cannot classify correctly when it meets a class absent from the training set. The second method, which extracts intermediate-layer features of the classification network and then performs cluster analysis with common machine learning methods, works well only when the similarity between unknown and known classes is low. Individual identification of radar signals, however, relies on slight signal variations caused by differences in radar hardware, so signals from different radar individuals are highly similar, and cluster analysis on the intermediate features of a classification network can hardly separate different individuals completely. In summary, existing solutions to the unknown-class identification problem in deep learning cannot classify accurately; studying advanced techniques in the deep learning field and exploring a better unknown-class identification method is therefore of great significance for improving the accuracy of radar signal individual identification.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method and a device for identifying unknown individuals of radar signals, so that known class signals can be accurately separated from unknown class signals.
The invention solves this technical problem by the following technical means: a method for identifying unknown individuals of radar signals, the method comprising:
step one: constructing and storing a known class sample set N and an unknown class sample set UN;
step two: inputting each sample to be identified in a known class sample set N and each sample to be identified in an unknown class sample set UN into a coding network, extracting a characteristic vector of the sample to be identified and classifying the sample to be identified;
step three: generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
step four: generating a conditional feature vector according to the feature vector and the attention probability distribution vector of the sample to be identified, and inputting the conditional feature vector into a decoding network to judge the category of the sample to be identified;
step five: and carrying out network training in an encoding network, a decoding network and a DDPG algorithm.
A traditional coding network and decoding network aim to recover the input signal correctly; by introducing an attention mechanism, the invention makes the coding network and the decoding network recover the input signal selectively. An attention probability distribution vector is generated by the DDPG algorithm, a conditional feature vector is generated from the feature vector of the sample to be identified and the attention probability distribution vector, and the conditional feature vector is input into the decoding network for signal recovery. The similarity between the output of the decoding network and the input signal is then computed: if the similarity is high, the input signal belongs to a known class and its category is the one output by the classifier; if the similarity is low, the input signal belongs to an unknown class. By assigning different attention to the feature vector output by the coding network, the attention mechanism strengthens the main features and weakens the secondary features of known class signals, so that they can be recovered correctly, whereas the main features of unknown class signals are weakened, so that they cannot be recovered. In this way the input unknown class signals are effectively identified, unknown class signals are prevented from being mistakenly recognized as known class signals, and known class signals and unknown class signals are accurately separated.
Preferably, constructing the known class sample set N and the unknown class sample set UN includes: collecting the signals of N radars, taking the signals of K radars as known classes to form the known class sample set N, and taking the signals of the remaining N-K radars as unknown classes to form the unknown class sample set UN.
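As a minimal illustration of this sample-set construction, the Python sketch below splits the signals collected from N radars into a known class set (the first K radars) and an unknown class set (the rest); the dictionary-based data layout is an assumption and is not part of the patent.

```python
# Toy sketch of the sample-set construction: the signals of the first K radars
# form the known class sample set N and the remaining radars form the unknown
# class sample set UN. The dictionary-based data layout is an assumption.
def split_sample_sets(signals_by_radar, K):
    radars = sorted(signals_by_radar)                         # the collected radar identifiers
    known = {r: signals_by_radar[r] for r in radars[:K]}      # sample set N
    unknown = {r: signals_by_radar[r] for r in radars[K:]}    # sample set UN
    return known, unknown
```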
Preferably, step two includes: the coding network comprises an input layer, an intermediate layer and an output layer; the sample to be identified is fed to the input layer; the intermediate layer consists of one-dimensional convolution layers and pooling layers, the convolution layers extract features from the sample to be identified and the pooling layers reduce its dimensionality; the output layer outputs the feature vector h of the sample to be identified, h = f(wx + b), where w is the weight of the coding network, b is the bias of the coding network and x is the input one-dimensional sample sequence to be identified; after the feature vector h is obtained, it is input into a classifier to obtain the category of the sample to be identified, where the classifier consists of a fully connected layer and a softmax classifier.
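The following PyTorch sketch illustrates what such a coding network and classifier could look like; the layer counts, channel widths, kernel sizes and feature dimension are illustrative assumptions rather than values given in the patent.

```python
# Illustrative sketch of the coding network (1-D convolution + pooling) and the
# fully connected + softmax classifier; all sizes below are assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Coding network producing the feature vector h and class logits."""
    def __init__(self, feat_dim=128, num_known_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.AdaptiveAvgPool1d(1),              # collapse the time axis
        )
        self.to_h = nn.Linear(32, feat_dim)       # h = f(wx + b)
        # classifier: fully connected layer followed by softmax over the K known classes
        self.classifier = nn.Linear(feat_dim, num_known_classes)

    def forward(self, x):                         # x: (batch, 1, signal_len)
        h = torch.relu(self.to_h(self.features(x).squeeze(-1)))
        logits = self.classifier(h)               # softmax is applied in the loss / at inference
        return h, logits
```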
Preferably, step three includes: using the DDPG algorithm, an actor network u(s; θu) and a critic network Q(s, a; θQ) are constructed on the basis of a convolutional neural network, where s, the input of the actor network and the critic network, equals the feature vector h extracted by the coding network, θu is the weight parameter of the actor network and θQ is the weight parameter of the critic network; an action a is obtained by sampling the action generated by the actor network together with random noise, and the action a is the attention probability distribution vector.
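A minimal sketch of the actor u(s; θu), the critic Q(s, a; θQ) and the noisy action sampling is given below. For brevity the networks here are fully connected, whereas the patent builds them on a convolutional neural network; the hidden sizes and noise scale are likewise assumptions.

```python
# Sketch of the DDPG actor and critic and of the noise-perturbed action sampling.
# The MLP structure, hidden sizes and noise scale are assumptions.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim))

    def forward(self, s):
        # softmax makes the output an attention probability distribution
        return torch.softmax(self.net(s), dim=-1)

class Critic(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))   # Q(s, a)

def sample_action(actor, s, noise_std=0.05):
    """Action a: actor output plus exploration noise, renormalised to a distribution."""
    a = actor(s)
    a = (a + noise_std * torch.randn_like(a)).clamp(min=1e-8)
    return a / a.sum(dim=-1, keepdim=True)
```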
Preferably, step four includes: obtaining the conditional feature vector according to the formula c = h · a (element-wise multiplication of the feature vector h and the attention probability distribution vector a), where c is the conditional feature vector; the obtained conditional feature vector c is input into the decoding network, whose structure is symmetric to that of the coding network; the output of the decoding network is x′ = g(w′c + b′), where w′ is the weight of the decoding network, b′ is the bias of the decoding network and x′ is the output of the decoding network; the category of the sample to be identified is judged from the output of the decoding network.
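The sketch below illustrates the conditional feature vector c = h · a, a mirror-style decoding network and a similarity-based known/unknown decision; the decoder layout, the use of cosine similarity and the threshold value are assumptions made for illustration.

```python
# Sketch of the conditional feature vector, the decoding network and the
# known/unknown decision; decoder layout, similarity measure and threshold are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    """Decoding network, mirror of the coding network: x' = g(w'c + b')."""
    def __init__(self, feat_dim=128, signal_len=1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, signal_len))

    def forward(self, c):
        return self.net(c)

def classify_or_reject(x, h, a, decoder, logits, threshold=0.9):
    c = h * a                                     # conditional feature vector
    x_rec = decoder(c)                            # attempted recovery of the input
    sim = F.cosine_similarity(x_rec, x.flatten(1), dim=-1)
    known = sim > threshold                       # high similarity -> known class
    pred = logits.argmax(dim=-1)                  # class from the softmax classifier
    return known, pred                            # samples with known=False are unknown
```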
Preferably, step five includes: training the coding network and the decoding network with the known class sample set N under a loss function L1 that combines a classification error term between yi and yi′ and a reconstruction error term between xi and xi′ averaged over the m input samples, where yi is the label of the i-th input sample to be identified, yi′ is the class of the i-th sample output by the coding network, xi is the i-th sample to be identified, m is the number of samples to be identified and xi′ is the output of the decoding network for the i-th sample; during training, the loss value L1 is computed for every m input samples and the network parameters are then updated by back-propagation; when the loss value L1 no longer decreases, the training of the coding network and the decoding network is finished;
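Since the exact form of L1 appears in the source only as an image, the sketch below assumes a common combination consistent with the surrounding definitions: a classification term over (yi, yi′) plus a reconstruction term over (xi, xi′).

```python
# ASSUMED form of the loss L1 (the patent gives the formula only as an image):
# classification error on (y_i, y_i') plus reconstruction error on (x_i, x_i'),
# averaged over the m input samples.
import torch.nn.functional as F

def encoder_decoder_loss(logits, y, x_rec, x):
    # logits: classifier outputs y_i'; y: true labels y_i
    # x_rec: decoder outputs x_i';  x: original input samples x_i
    cls_term = F.cross_entropy(logits, y)        # classification error
    rec_term = F.mse_loss(x_rec, x.flatten(1))   # reconstruction error
    return cls_term + rec_term
```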
Training of the actor network and the critic network: a target network is first defined for each of them, namely u(s′; θu′) for the actor and Q(s′, a′; θQ′) for the critic, where s′ is the input of the target actor network and the target critic network, a′ is the action output by the target actor network, θu′ is the weight parameter of the target actor network and θQ′ is the weight parameter of the target critic network; the parameters of the coding network are kept fixed during this training. The training sample set comprises the sample set N and the sample set UN. The actor network generates an action a from the input state s provided by the coding network and the decoding network, the coding network and the decoding network compute a reward r from a and provide the next input state s′, and the transition state (s, a, r, s′) is stored in a buffer; M samples (si, ai, ri, si′), i = 1, 2, …, M, are then drawn from the buffer. The critic network computes a Q value from si in the transition state and the output ai of the actor network, and the actor network computes the policy gradient from the Q value and updates its weight parameters according to
∇θuJ ≈ (1/M) Σi ∇aQ(si, a; θQ)|a=u(si) · ∇θuu(si; θu),
where M is the number of samples taken from the buffer, ∇θuJ is the policy gradient and ∇ is the gradient operator. The target actor network obtains the action ai′ from si′ in the transition state, and the target critic network obtains its output Q′ from si′ and ai′; with Q obtained from the critic network and Q′ from the target critic network, the time-difference error L between Q and Q′ is computed as
L = (1/M) Σi (yi − Q(si, ai; θQ))²,
a gradient is computed from the time-difference error L and the critic network weights are updated according to this gradient, where yi is calculated as
yi = ri + γQ′(si+1, u′(si+1; θu′) | θQ′),
ri is the reward value of the i-th sample to be identified, γ is a preset weight coefficient, Q is the critic network output value and Q′ is the target critic network output value.
The weight parameters of the target actor network and the target critic network are updated by a soft-update algorithm:
θQ′ = τθQ + (1 − τ)θQ′
θu′ = τθu + (1 − τ)θu′
where τ is a weight coefficient with the value 0.001; the training of the networks in the DDPG algorithm is finished when the time-difference error is smaller than a preset value and stable.
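One possible DDPG update step consistent with the formulas above is sketched below; the optimiser interface, the value of γ and the batch layout are assumptions (τ = 0.001 as in the patent), and the helper is not the patent's own implementation.

```python
# Sketch of one DDPG update: critic TD-error step, actor policy-gradient step,
# and soft update of the target networks. Optimiser interface, γ and batch
# layout are assumptions; τ = 0.001 follows the patent text.
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.001):
    s, a, r, s_next = batch                       # tensors; r has shape (batch, 1)

    # critic update: y_i = r_i + γ Q'(s_i', u'(s_i'))
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))
    td_error = F.mse_loss(critic(s, a), y)        # time-difference error L
    critic_opt.zero_grad(); td_error.backward(); critic_opt.step()

    # actor update: ascend Q(s, u(s)) via the policy gradient
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # soft update: Θ' = τΘ + (1 - τ)Θ'
    for tgt, src in ((target_actor, actor), (target_critic, critic)):
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
    return td_error.item()
```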
The invention also provides a device for identifying unknown individuals of radar signals, which comprises:
the sample set construction module is used for constructing and storing a known class sample set N and an unknown class sample set UN;
the extraction and classification module is used for inputting each sample to be identified in the known class sample set N and each sample to be identified in the unknown class sample set UN into a coding network, extracting the characteristic vector of the sample to be identified and classifying the sample to be identified;
the vector generation module is used for generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
the class judgment module is used for generating a condition characteristic vector according to the characteristic vector and the attention probability distribution vector of the sample to be identified and inputting the condition characteristic vector into a decoding network to judge the class of the sample to be identified;
and the training module is used for carrying out network training in the coding network, the decoding network and the DDPG algorithm.
Preferably, constructing the known class sample set N and the unknown class sample set UN includes: collecting the signals of N radars, taking the signals of K radars as known classes to form the known class sample set N, and taking the signals of the remaining N-K radars as unknown classes to form the unknown class sample set UN.
Preferably, the extraction and classification module is further configured to: the coding network comprises an input layer, an intermediate layer and an output layer; the sample to be identified is fed to the input layer; the intermediate layer consists of one-dimensional convolution layers and pooling layers, the convolution layers extract features from the sample to be identified and the pooling layers reduce its dimensionality; the output layer outputs the feature vector h of the sample to be identified, h = f(wx + b), where w is the weight of the coding network, b is the bias of the coding network and x is the input one-dimensional sample sequence to be identified; after the feature vector h is obtained, it is input into a classifier to obtain the category of the sample to be identified, where the classifier consists of a fully connected layer and a softmax classifier.
Preferably, the vector generation module is further configured to: using the DDPG algorithm, an actor network u(s; θu) and a critic network Q(s, a; θQ) are constructed on the basis of a convolutional neural network, where s, the input of the actor network and the critic network, equals the feature vector h extracted by the coding network, θu is the weight parameter of the actor network and θQ is the weight parameter of the critic network; an action a is obtained by sampling the action generated by the actor network together with random noise, and the action a is the attention probability distribution vector.
Preferably, the class judgment module is further configured to: obtain the conditional feature vector according to the formula c = h · a (element-wise multiplication of the feature vector h and the attention probability distribution vector a), where c is the conditional feature vector; the obtained conditional feature vector c is input into the decoding network, whose structure is symmetric to that of the coding network; the output of the decoding network is x′ = g(w′c + b′), where w′ is the weight of the decoding network, b′ is the bias of the decoding network and x′ is the output of the decoding network; the category of the sample to be identified is judged from the output of the decoding network.
Preferably, the training module is further configured to: train the coding network and the decoding network with the known class sample set N under a loss function L1 that combines a classification error term between yi and yi′ and a reconstruction error term between xi and xi′ averaged over the m input samples, where yi is the label of the i-th input sample to be identified, yi′ is the class of the i-th sample output by the coding network, xi is the i-th sample to be identified, m is the number of samples to be identified and xi′ is the output of the decoding network for the i-th sample; during training, the loss value L1 is computed for every m input samples and the network parameters are then updated by back-propagation; when the loss value L1 no longer decreases, the training of the coding network and the decoding network is finished;
Training of the actor network and the critic network: a target network is first defined for each of them, namely u(s′; θu′) for the actor and Q(s′, a′; θQ′) for the critic, where s′ is the input of the target actor network and the target critic network, a′ is the action output by the target actor network, θu′ is the weight parameter of the target actor network and θQ′ is the weight parameter of the target critic network; the parameters of the coding network are kept fixed during this training. The training sample set comprises the sample set N and the sample set UN. The actor network generates an action a from the input state s provided by the coding network and the decoding network, the coding network and the decoding network compute a reward r from a and provide the next input state s′, and the transition state (s, a, r, s′) is stored in a buffer; M samples (si, ai, ri, si′), i = 1, 2, …, M, are then drawn from the buffer. The critic network computes a Q value from si in the transition state and the output ai of the actor network, and the actor network computes the policy gradient from the Q value and updates its weight parameters according to
∇θuJ ≈ (1/M) Σi ∇aQ(si, a; θQ)|a=u(si) · ∇θuu(si; θu),
where M is the number of samples taken from the buffer, ∇θuJ is the policy gradient and ∇ is the gradient operator. The target actor network obtains the action ai′ from si′ in the transition state, and the target critic network obtains its output Q′ from si′ and ai′; with Q obtained from the critic network and Q′ from the target critic network, the time-difference error L between Q and Q′ is computed as
L = (1/M) Σi (yi − Q(si, ai; θQ))²,
a gradient is computed from the time-difference error L and the critic network weights are updated according to this gradient, where yi is calculated as
yi = ri + γQ′(si+1, u′(si+1; θu′) | θQ′),
ri is the reward value of the i-th sample to be identified, γ is a preset weight coefficient, Q is the critic network output value and Q′ is the target critic network output value;
the weight parameters of the target actor network and the target critic network are updated by a soft-update algorithm:
θQ′ = τθQ + (1 − τ)θQ′
θu′ = τθu + (1 − τ)θu′
where τ is a weight coefficient with the value 0.001; the training of the networks in the DDPG algorithm is finished when the time-difference error is smaller than a preset value and stable.
The invention has the advantages that: a traditional coding network and decoding network aim to recover the input signal correctly, whereas by introducing an attention mechanism the invention makes the coding network and the decoding network recover the input signal selectively. An attention probability distribution vector is generated by the DDPG algorithm, a conditional feature vector is generated from the feature vector of the sample to be identified and the attention probability distribution vector, and the conditional feature vector is input into the decoding network for signal recovery. The similarity between the output of the decoding network and the input signal is then computed: if the similarity is high, the input signal belongs to a known class and its category is the one output by the classifier; if the similarity is low, the input signal belongs to an unknown class. By assigning different attention to the feature vector output by the coding network, the attention mechanism strengthens the main features and weakens the secondary features of known class signals, so that they can be recovered correctly, whereas the main features of unknown class signals are weakened, so that they cannot be recovered. In this way the input unknown class signals are effectively identified, unknown class signals are prevented from being mistakenly recognized as known class signals, and known class signals and unknown class signals are accurately separated.
Drawings
FIG. 1 is a flowchart of a method for identifying an unknown individual of a radar signal according to an embodiment of the present invention;
fig. 2 is a schematic block diagram of an unknown individual identification method of a radar signal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1 and 2, a method of unknown individual identification of radar signals, the method comprising: step S1: constructing and storing a known class sample set N and an unknown class sample set UN, which specifically comprises the following steps: signals of N radars are collected, signals of K radars are used as known classes to form a known class sample set N, and signals of N-K radars are used as unknown classes to form an unknown class sample set UN.
Step S2: inputting each sample to be identified in the known class sample set N and in the unknown class sample set UN into a coding network, extracting the feature vector of the sample to be identified and classifying it. Feature-vector extraction and classification by a coding network belong to the prior art. The coding network comprises an input layer, an intermediate layer and an output layer; the sample to be identified is fed to the input layer; the intermediate layer consists of one-dimensional convolution layers and pooling layers, the convolution layers extract features from the sample and the pooling layers reduce its dimensionality; the output layer outputs the feature vector h of the sample, h = f(wx + b), where f(·) denotes a functional mapping, i.e. h = f(wx + b) expresses h as a function of x, w is the weight of the coding network, b is the bias of the coding network and x is the input one-dimensional sample sequence to be identified; after the feature vector h is obtained, it is input into a classifier consisting of a fully connected layer and a softmax classifier to obtain the category of the sample to be identified.
Step S3: generating an attention probability distribution vector from the feature vector of the sample to be identified using the deep reinforcement learning DDPG (Deep Deterministic Policy Gradient) algorithm. The main process is as follows: using the DDPG algorithm, an actor network u(s; θu) and a critic network Q(s, a; θQ) are constructed on the basis of a convolutional neural network, where s, the input of the actor network and the critic network, equals the feature vector h extracted by the coding network, θu is the weight parameter of the actor network and θQ is the weight parameter of the critic network; an action a is obtained by sampling the action generated by the actor network together with random noise, and the action a is the attention probability distribution vector. The coding network computes the corresponding reward r from a, feeds it back to the critic network, transfers to the next input state s′, and stores the transition state (s, a, r, s′) in a cache for network training.
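A minimal replay-buffer sketch for storing and sampling the transition states (s, a, r, s′) is shown below; the capacity and the uniform sampling scheme are assumptions, not details given in the patent.

```python
# Minimal replay-buffer sketch for the transitions (s, a, r, s'); capacity and
# uniform sampling are assumptions.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))            # one transition state

    def sample(self, m):
        return random.sample(list(self.buf), m)       # m transitions (s_i, a_i, r_i, s_i')
```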
Step S4: generating a conditional feature vector from the feature vector and the attention probability distribution vector of the sample to be identified and inputting it into a decoding network to judge the category of the sample. The specific process is as follows: the conditional feature vector is obtained according to the formula c = h · a, where c is the conditional feature vector; the obtained conditional feature vector c is input into the decoding network, whose structure is symmetric to that of the coding network; the output of the decoding network is x′ = g(w′c + b′), where g(·) denotes a functional mapping, i.e. x′ = g(w′c + b′) expresses x′ as a function of c, w′ is the weight of the decoding network, b′ is the bias of the decoding network and x′ is the output of the decoding network; the category of the sample to be identified is judged from the output of the decoding network. Note that the attention probability distribution vector a corresponding to the maximum Q value obtained in step S3 is selected and multiplied element-wise by the feature vector h to obtain the conditional feature vector.
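Selecting the attention vector with the maximum Q value could, for example, look like the sketch below, which reuses the sample_action helper from the earlier actor/critic sketch; the number of sampled candidates is an assumption.

```python
# Sketch of picking, among several noisy candidate actions, the attention
# vector with the largest critic value Q; n_candidates is an assumption.
import torch

def best_attention(actor, critic, h, n_candidates=8, noise_std=0.05):
    # h: a single feature vector of shape (1, feat_dim)
    cands = [sample_action(actor, h, noise_std) for _ in range(n_candidates)]
    q_vals = torch.cat([critic(h, a) for a in cands])   # shape (n_candidates, 1)
    return cands[int(q_vals.argmax())]                  # a with the maximum Q value
```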
Step S5: training the networks of the coding network, the decoding network and the DDPG algorithm. The training follows existing common practice and is briefly introduced as follows: the coding network and the decoding network are trained with the known class sample set N under a loss function L1 that combines a classification error term between yi and yi′ and a reconstruction error term between xi and xi′ averaged over the m input samples, where yi is the label of the i-th input sample to be identified, yi′ is the class of the i-th sample output by the coding network, xi is the i-th sample to be identified, m is the number of samples to be identified and xi′ is the output of the decoding network for the i-th sample. During training, the loss value L1 is computed for every m input samples and the network parameters are then updated by back-propagation; when the loss value L1 no longer decreases, the training of the coding network and the decoding network is finished.
Training of the actor network and the critic network: a target network is first defined for each of them, namely u(s′; θu′) for the actor and Q(s′, a′; θQ′) for the critic, where s′ is the input of the target actor network and the target critic network, a′ is the action output by the target actor network, θu′ is the weight parameter of the target actor network and θQ′ is the weight parameter of the target critic network; the parameters of the coding network are kept fixed during this training. The training sample set comprises the sample set N and the sample set UN. The actor network generates an action a from the input state s provided by the coding network and the decoding network, the coding network and the decoding network compute a reward r from a and provide the next input state s′, and the transition state (s, a, r, s′) is stored in a buffer; M samples (si, ai, ri, si′), i = 1, 2, …, M, are then drawn from the buffer. The critic network computes a Q value from si in the transition state and the output ai of the actor network, and the actor network computes the policy gradient from the Q value and updates its weight parameters according to
∇θuJ ≈ (1/M) Σi ∇aQ(si, a; θQ)|a=u(si) · ∇θuu(si; θu),
where i denotes the i-th sample, si is the input state of the i-th sample, ai is the output of the actor network for the i-th sample, M is the number of samples taken from the buffer, ∇θuJ is the policy gradient and ∇ is the gradient operator. The target actor network obtains the action ai′ from si′ in the transition state, and the target critic network obtains its output Q′ from si′ and ai′; the time-difference error L between the Q obtained by the critic network and the Q′ obtained by the target critic network is computed as
L = (1/M) Σi (yi − Q(si, ai; θQ))²,
a gradient is computed from the time-difference error L and the critic network weights are updated according to this gradient, where yi is calculated as
yi = ri + γQ′(si+1, u′(si+1; θu′) | θQ′),
ri is the reward value of the i-th sample to be identified, γ is a preset weight coefficient, Q is the critic network output value and Q′ is the target critic network output value.
The weight parameters of the target actor network and the target critic network are updated by a soft-update algorithm:
θQ′ = τθQ + (1 − τ)θQ′
θu′ = τθu + (1 − τ)θu′
where τ is a weight coefficient that generally takes the value 0.001; the training of the networks in the DDPG algorithm is finished when the time-difference error is smaller than a preset value and stable.
The working principle of the invention is as follows: as shown in fig. 2, the signal of a sample to be recognized first passes through the coding network, which automatically extracts the implicit features of the signal. The resulting feature vector of the input signal is, on the one hand, input to the classifier to classify the signal, consistent with the flow of a conventional classification network. On the other hand, it is multiplied by the attention probability distribution vector generated by reinforcement learning to obtain the conditional feature vector, which is input into the decoding network to recover the input signal. The similarity between the output of the decoding network and the original input signal is computed: if the similarity is high, the input signal is a known class signal and its category is the one output by the classifier; if the similarity is low, the input signal is an unknown class signal. The reinforcement learning agent generates the attention probability distribution vector from the feature vector of the input signal, and the category of the input signal is judged from the output of the decoding network; if the judgment is correct, the reward fed back to the reinforcement learning agent is +1, otherwise the reward is -1, and the agent dynamically adjusts its parameters according to the obtained reward so that the generated attention probability distribution vector becomes optimal.
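A toy sketch of this ±1 reward is given below, assuming the known/unknown judgment is made by thresholding the decoder similarity; the threshold itself is an assumption.

```python
# Toy sketch of the ±1 reward: +1 when the known/unknown judgment derived from
# the decoder similarity matches the ground truth, otherwise -1. The threshold
# value is an assumption.
def compute_reward(similarity, sample_is_known, threshold=0.9):
    judged_known = similarity > threshold
    return 1.0 if judged_known == sample_is_known else -1.0
```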
According to the above technical solution, the unknown individual identification method and device for radar signals provided by the invention effectively control the output of the decoding network by applying a weighted output to the feature vector of the intermediate hidden layer between a conventional coding network and decoding network: when the input sample belongs to a known class present in the training sample set, the decoding network can decode it correctly, and when the input sample belongs to an unknown class absent from the training sample set, the decoding network cannot decode it correctly. The input samples are therefore classified by the coding network, while whether an input sample belongs to an unknown class absent from the training sample set is judged from the output of the decoding network. The signal to be classified first passes through the coding network, which automatically extracts its implicit features; the resulting feature vector of the input sample is, on the one hand, input to the classifier to classify the sample, consistent with the flow of a conventional classification network. On the other hand, it is multiplied by the attention probability distribution vector generated by reinforcement learning to obtain the conditional feature vector, which is input into the decoding network to recover the input sample. The similarity between the output of the decoding network and the original input sample is computed: if the similarity is high, the input sample is a known class signal and its category is the one output by the classifier; if the similarity is low, the input sample is an unknown class signal. The reinforcement learning agent generates the attention probability distribution vector from the feature vector of the input signal, and the category of the input sample is judged from the output of the decoding network; if the judgment is correct, the reward fed back to the reinforcement learning agent is +1, otherwise the reward is -1, and the agent dynamically adjusts its parameters according to the obtained reward so that the generated attention probability distribution vector becomes optimal.
Example 2
Corresponding to embodiment 1 of the present invention, embodiment 2 of the present invention further provides an unknown individual identification device for a radar signal, including:
the sample set construction module is used for constructing and storing a known class sample set N and an unknown class sample set UN;
the extraction and classification module is used for inputting each sample to be identified in the known class sample set N and each sample to be identified in the unknown class sample set UN into a coding network, extracting the characteristic vector of the sample to be identified and classifying the sample to be identified;
the vector generation module is used for generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
the class judgment module is used for generating a condition characteristic vector according to the characteristic vector and the attention probability distribution vector of the sample to be identified and inputting the condition characteristic vector into a decoding network to judge the class of the sample to be identified;
and the training module is used for carrying out network training in the coding network, the decoding network and the DDPG algorithm.
Specifically, constructing the known class sample set N and the unknown class sample set UN includes: collecting the signals of N radars, taking the signals of K radars as known classes to form the known class sample set N, and taking the signals of the remaining N-K radars as unknown classes to form the unknown class sample set UN.
Specifically, the extraction and classification module is further configured to: the coding network comprises an input layer, an intermediate layer and an output layer; the sample to be identified is fed to the input layer; the intermediate layer consists of one-dimensional convolution layers and pooling layers, the convolution layers extract features from the sample to be identified and the pooling layers reduce its dimensionality; the output layer outputs the feature vector h of the sample to be identified, h = f(wx + b), where w is the weight of the coding network, b is the bias of the coding network and x is the input one-dimensional sample sequence to be identified; after the feature vector h is obtained, it is input into a classifier to obtain the category of the sample to be identified, where the classifier consists of a fully connected layer and a softmax classifier.
Specifically, the vector generation module is further configured to: using the DDPG algorithm, an actor network u(s; θu) and a critic network Q(s, a; θQ) are constructed on the basis of a convolutional neural network, where s, the input of the actor network and the critic network, equals the feature vector h extracted by the coding network, θu is the weight parameter of the actor network and θQ is the weight parameter of the critic network; an action a is obtained by sampling the action generated by the actor network together with random noise, and the action a is the attention probability distribution vector.
Specifically, the class judgment module is further configured to: obtain the conditional feature vector according to the formula c = h · a (element-wise multiplication of the feature vector h and the attention probability distribution vector a), where c is the conditional feature vector; the obtained conditional feature vector c is input into the decoding network, whose structure is symmetric to that of the coding network; the output of the decoding network is x′ = g(w′c + b′), where w′ is the weight of the decoding network, b′ is the bias of the decoding network and x′ is the output of the decoding network; the category of the sample to be identified is judged from the output of the decoding network.
Specifically, the training module is further configured to: train the coding network and the decoding network with the known class sample set N under a loss function L1 that combines a classification error term between yi and yi′ and a reconstruction error term between xi and xi′ averaged over the m input samples, where yi is the label of the i-th input sample to be identified, yi′ is the class of the i-th sample output by the coding network, xi is the i-th sample to be identified, m is the number of samples to be identified and xi′ is the output of the decoding network for the i-th sample; during training, the loss value L1 is computed for every m input samples and the network parameters are then updated by back-propagation; when the loss value L1 no longer decreases, the training of the coding network and the decoding network is finished;
Training of the actor network and the critic network: a target network is first defined for each of them, namely u(s′; θu′) for the actor and Q(s′, a′; θQ′) for the critic, where s′ is the input of the target actor network and the target critic network, a′ is the action output by the target actor network, θu′ is the weight parameter of the target actor network and θQ′ is the weight parameter of the target critic network; the parameters of the coding network are kept fixed during this training. The training sample set comprises the sample set N and the sample set UN. The actor network generates an action a from the input state s provided by the coding network and the decoding network, the coding network and the decoding network compute a reward r from a and provide the next input state s′, and the transition state (s, a, r, s′) is stored in a buffer; M samples (si, ai, ri, si′), i = 1, 2, …, M, are then drawn from the buffer. The critic network computes a Q value from si in the transition state and the output ai of the actor network, and the actor network computes the policy gradient from the Q value and updates its weight parameters according to
∇θuJ ≈ (1/M) Σi ∇aQ(si, a; θQ)|a=u(si) · ∇θuu(si; θu),
where M is the number of samples taken from the buffer, ∇θuJ is the policy gradient and ∇ is the gradient operator. The target actor network obtains the action ai′ from si′ in the transition state, and the target critic network obtains its output Q′ from si′ and ai′; with Q obtained from the critic network and Q′ from the target critic network, the time-difference error L between Q and Q′ is computed as
L = (1/M) Σi (yi − Q(si, ai; θQ))²,
a gradient is computed from the time-difference error L and the critic network weights are updated according to this gradient, where yi is calculated as
yi = ri + γQ′(si+1, u′(si+1; θu′) | θQ′),
ri is the reward value of the i-th sample to be identified, γ is a preset weight coefficient, Q is the critic network output value and Q′ is the target critic network output value;
the weight parameters of the target actor network and the target critic network are updated by a soft-update algorithm:
θQ′ = τθQ + (1 − τ)θQ′
θu′ = τθu + (1 − τ)θu′
where τ is a weight coefficient with the value 0.001; the training of the networks in the DDPG algorithm is finished when the time-difference error is smaller than a preset value and stable.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of unknown individual identification of radar signals, the method comprising:
step one: constructing and storing a known class sample set N and an unknown class sample set UN;
step two: inputting each sample to be identified in a known class sample set N and each sample to be identified in an unknown class sample set UN into a coding network, extracting a characteristic vector of the sample to be identified and classifying the sample to be identified;
step three: generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
step four: generating a conditional feature vector according to the feature vector and the attention probability distribution vector of the sample to be identified, and inputting the conditional feature vector into a decoding network to judge the category of the sample to be identified;
step five: and carrying out network training in an encoding network, a decoding network and a DDPG algorithm.
2. The method according to claim 1, wherein the constructing a sample set N of known classes and a sample set UN of unknown classes comprises: signals of N radars are collected, signals of K radars are used as known classes to form a known class sample set N, and signals of N-K radars are used as unknown classes to form an unknown class sample set UN.
3. The method of claim 1, wherein the second step comprises: the coding network comprises an input layer, an intermediate layer and an output layer; the sample to be identified is fed to the input layer; the intermediate layer consists of one-dimensional convolution layers and pooling layers, the convolution layers extract features from the sample to be identified and the pooling layers reduce its dimensionality; the output layer outputs the feature vector h of the sample to be identified, h = f(wx + b), where w is the weight of the coding network, b is the bias of the coding network and x is the input one-dimensional sample sequence to be identified; after the feature vector h is obtained, it is input into a classifier to obtain the category of the sample to be identified, where the classifier consists of a fully connected layer and a softmax classifier.
4. The method of claim 3, wherein the third step comprises: using the DDPG algorithm, an actor network u(s; θu) and a critic network Q(s, a; θQ) are constructed on the basis of a convolutional neural network, where s, the input of the actor network and the critic network, equals the feature vector h extracted by the coding network, θu is the weight parameter of the actor network and θQ is the weight parameter of the critic network; an action a is obtained by sampling the action generated by the actor network together with random noise, and the action a is the attention probability distribution vector.
5. The method of claim 4, wherein the fourth step comprises: obtaining the conditional feature vector according to the formula c = h · a (element-wise multiplication of the feature vector h and the attention probability distribution vector a), where c is the conditional feature vector; the obtained conditional feature vector c is input into the decoding network, whose structure is symmetric to that of the coding network; the output of the decoding network is x′ = g(w′c + b′), where w′ is the weight of the decoding network, b′ is the bias of the decoding network and x′ is the output of the decoding network; the category of the sample to be identified is judged from the output of the decoding network.
6. The method of claim 5, wherein step five comprises: training the coding network and the decoding network with the known class sample set N under a loss function L1 that combines a classification error term between yi and yi′ and a reconstruction error term between xi and xi′ averaged over the m input samples, where yi is the label of the i-th input sample to be identified, yi′ is the class of the i-th sample output by the coding network, xi is the i-th sample to be identified, m is the number of samples to be identified and xi′ is the output of the decoding network for the i-th sample; during training, the loss value L1 is computed for every m input samples and the network parameters are then updated by back-propagation; when the loss value L1 no longer decreases, the training of the coding network and the decoding network is finished;
training of the actor network and the critic network: a target network is first defined for each of them, namely u(s′; θu′) for the actor and Q(s′, a′; θQ′) for the critic, where s′ is the input of the target actor network and the target critic network, a′ is the action output by the target actor network, θu′ is the weight parameter of the target actor network and θQ′ is the weight parameter of the target critic network; the parameters of the coding network are kept fixed during this training; the training sample set comprises the sample set N and the sample set UN; the actor network generates an action a from the input state s provided by the coding network and the decoding network, the coding network and the decoding network compute a reward r from a and provide the next input state s′, and the transition state (s, a, r, s′) is stored in a buffer; M samples (si, ai, ri, si′), i = 1, 2, …, M, are then drawn from the buffer; the critic network computes a Q value from si in the transition state and the output ai of the actor network, and the actor network computes the policy gradient from the Q value and updates its weight parameters according to
∇θuJ ≈ (1/M) Σi ∇aQ(si, a; θQ)|a=u(si) · ∇θuu(si; θu),
where M is the number of samples taken from the buffer, ∇θuJ is the policy gradient and ∇ is the gradient operator; the target actor network obtains the action ai′ from si′ in the transition state, and the target critic network obtains its output Q′ from si′ and ai′; with Q obtained from the critic network and Q′ from the target critic network, the time-difference error L between Q and Q′ is computed as
L = (1/M) Σi (yi − Q(si, ai; θQ))²,
a gradient is computed from the time-difference error L and the critic network weights are updated according to this gradient, where yi is calculated as
yi = ri + γQ′(si+1, u′(si+1; θu′) | θQ′),
ri is the reward value of the i-th sample to be identified, γ is a preset weight coefficient, Q is the critic network output value and Q′ is the target critic network output value;
the weight parameters of the target actor network and the target critic network are updated by a soft-update algorithm:
θQ′ = τθQ + (1 − τ)θQ′
θu′ = τθu + (1 − τ)θu′
where τ is a weight coefficient with the value 0.001; the training of the networks in the DDPG algorithm is finished when the time-difference error is smaller than a preset value and stable.
7. An apparatus for unknown individual identification of radar signals, the apparatus comprising:
the sample set construction module is used for constructing and storing a known class sample set N and an unknown class sample set UN;
the extraction and classification module is used for inputting each sample to be identified in the known class sample set N and each sample to be identified in the unknown class sample set UN into a coding network, extracting the characteristic vector of the sample to be identified and classifying the sample to be identified;
the vector generation module is used for generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
the class judgment module is used for generating a condition characteristic vector according to the characteristic vector and the attention probability distribution vector of the sample to be identified and inputting the condition characteristic vector into a decoding network to judge the class of the sample to be identified;
and the training module is used for carrying out network training in the coding network, the decoding network and the DDPG algorithm.
8. The apparatus according to claim 7, wherein the constructing of the known class sample set N and the unknown class sample set UN comprises: signals of N radars are collected, signals of K radars are used as known classes to form a known class sample set N, and signals of N-K radars are used as unknown classes to form an unknown class sample set UN.
9. The apparatus of claim 7, wherein the extraction and classification module is further configured to: the coding network comprises an input layer, an intermediate layer and an output layer; the sample to be identified is fed to the input layer; the intermediate layer consists of one-dimensional convolution layers and pooling layers, the convolution layers extract features from the sample to be identified and the pooling layers reduce its dimensionality; the output layer outputs the feature vector h of the sample to be identified, h = f(wx + b), where w is the weight of the coding network, b is the bias of the coding network and x is the input one-dimensional sample sequence to be identified; after the feature vector h is obtained, it is input into a classifier to obtain the category of the sample to be identified, where the classifier consists of a fully connected layer and a softmax classifier.
10. The apparatus of claim 9, wherein the vector generation module is further configured to: construct an actor network u(s; Θ^u) and a critic network Q(s, a; Θ^Q) based on a convolutional neural network by utilizing the DDPG algorithm, wherein s is the input of the actor network and the critic network and is equal to the feature vector h extracted by the coding network, Θ^u is a weight parameter of the actor network, and Θ^Q is a weight parameter of the critic network; and obtain the action a by sampling from the action generated by the actor network together with random noise, wherein the action a is the attention probability distribution vector.
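A minimal sketch of the actor side, using fully connected layers as a stand-in for the convolutional actor named in the claim; the Gaussian exploration noise and the renormalisation that keeps the action a valid probability distribution are assumptions.

import torch
import torch.nn as nn

class Actor(nn.Module):
    """u(s; Theta_u): maps the feature vector h to an attention probability distribution."""
    def __init__(self, feature_dim=128, attention_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 128),
            nn.ReLU(),
            nn.Linear(128, attention_dim),
            nn.Softmax(dim=-1),          # output is a probability distribution
        )

    def forward(self, s):                # s = feature vector h from the coding network
        return self.net(s)

def sample_action(actor, h, noise_std=0.05):
    """Action a = actor output perturbed by random noise, renormalised to stay a distribution."""
    with torch.no_grad():
        a = actor(h)
        a = a + noise_std * torch.randn_like(a)   # random exploration noise
        a = torch.clamp(a, min=0.0)
        return a / a.sum(dim=-1, keepdim=True)    # attention probability distribution vector

The critic Q(s, a; Θ^Q) can take the same form as the Critic sketched after claim 6.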
CN201911296607.8A 2019-12-16 2019-12-16 Unknown individual identification method and device for radar signals Active CN111144462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911296607.8A CN111144462B (en) 2019-12-16 2019-12-16 Unknown individual identification method and device for radar signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911296607.8A CN111144462B (en) 2019-12-16 2019-12-16 Unknown individual identification method and device for radar signals

Publications (2)

Publication Number Publication Date
CN111144462A (en) 2020-05-12
CN111144462B (en) 2023-10-20

Family

ID=70518457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911296607.8A Active CN111144462B (en) 2019-12-16 2019-12-16 Unknown individual identification method and device for radar signals

Country Status (1)

Country Link
CN (1) CN111144462B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190331768A1 (en) * 2018-04-26 2019-10-31 Metawave Corporation Reinforcement learning engine for a radar system
US20180357542A1 (en) * 2018-06-08 2018-12-13 University Of Electronic Science And Technology Of China 1D-CNN-Based Distributed Optical Fiber Sensing Signal Feature Learning and Classification Method
CN109934269A (en) * 2019-02-25 2019-06-25 中国电子科技集团公司第三十六研究所 Open-set recognition method and device for electromagnetic signals
CN110109109A (en) * 2019-04-26 2019-08-09 西安电子科技大学 HRRP target identification method based on multiresolution attention convolutional network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
贾亚飞; 朱永利; 高佳程; 袁博: "Identification of unknown-class partial discharge signals based on sample-weighted FCM clustering" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112014821A (en) * 2020-08-27 2020-12-01 电子科技大学 Unknown vehicle target identification method based on radar broadband characteristics
CN112014821B (en) * 2020-08-27 2022-05-17 电子科技大学 Unknown vehicle target identification method based on radar broadband characteristics
CN113807243A (en) * 2021-09-16 2021-12-17 上海交通大学 Water obstacle detection system and method based on attention to unknown target
CN113807243B (en) * 2021-09-16 2023-12-05 上海交通大学 Water obstacle detection system and method based on attention to unknown target
CN113792733A (en) * 2021-09-17 2021-12-14 平安科技(深圳)有限公司 Vehicle component detection method, system, electronic device and storage medium
CN113792733B (en) * 2021-09-17 2023-07-21 平安科技(深圳)有限公司 Vehicle part detection method, system, electronic device and storage medium

Also Published As

Publication number Publication date
CN111144462B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
WO2021134871A1 (en) Forensics method for synthesized face image based on local binary pattern and deep learning
CN108985334B (en) General object detection system and method for improving active learning based on self-supervision process
CN108564129B Trajectory data classification method based on generative adversarial network
CN107609525B (en) Remote sensing image target detection method for constructing convolutional neural network based on pruning strategy
CN110796168A Improved YOLOv3-based vehicle detection method
CN112507996B (en) Face detection method of main sample attention mechanism
CN111079836B (en) Process data fault classification method based on pseudo label method and weak supervised learning
CN108898131A Digital instrument recognition method in complex natural scenes
CN109697469A Self-learning small-sample remote sensing image classification method based on consistency constraint
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
CN111144462B (en) Unknown individual identification method and device for radar signals
CN108171119B (en) SAR image change detection method based on residual error network
CN111401105B (en) Video expression recognition method, device and equipment
CN113096169A (en) Non-rigid multimode medical image registration model establishing method and application thereof
CN104978569A (en) Sparse representation based incremental face recognition method
CN115471712A (en) Learning method for generating zero sample based on visual semantic constraint
CN115690549A (en) Target detection method for realizing multi-dimensional feature fusion based on parallel interaction architecture model
CN113283467B (en) Weak supervision picture classification method based on average loss and category-by-category selection
CN111652264A (en) Negative migration sample screening method based on maximum mean difference
CN115511012B (en) Class soft label identification training method with maximum entropy constraint
CN117516937A (en) Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement
CN116704585A (en) Face recognition method based on quality perception
CN113420833B (en) Visual question answering method and device based on semantic mapping of questions
CN111209860A (en) Video attendance system and method based on deep learning and reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant