CN111144462B - Unknown individual identification method and device for radar signals - Google Patents

Unknown individual identification method and device for radar signals

Publication number
CN111144462B
Authority
CN
China
Prior art keywords
network
sample
identified
critic
actor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911296607.8A
Other languages
Chinese (zh)
Other versions
CN111144462A (en
Inventor
黄双双
李臻
单志林
李立
苏志杰
胡佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 38 Research Institute
Original Assignee
CETC 38 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 38 Research Institute
Priority to CN201911296607.8A
Publication of CN111144462A
Application granted
Publication of CN111144462B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for identifying unknown individuals in radar signals. The method comprises the following steps: constructing and storing a known-class sample set N and an unknown-class sample set UN; inputting each sample to be identified in the known-class sample set N and in the unknown-class sample set UN into a coding network, extracting the feature vector of the sample, and classifying the sample; generating an attention probability distribution vector from the feature vector of the sample by using the DDPG algorithm; generating a conditional feature vector from the feature vector and the attention probability distribution vector, and inputting the conditional feature vector into a decoding network to determine the class of the sample; and training the coding network, the decoding network, and the networks of the DDPG algorithm. The advantage of the invention is that it avoids misidentifying an unknown-class signal as a known-class signal and thereby separates known-class signals from unknown-class signals accurately.

Description

Unknown individual identification method and device for radar signals
Technical Field
The invention relates to the field of electronic reconnaissance, in particular to a method and a device for identifying unknown individuals of radar signals.
Background
In classification networks based on deep learning, the set of output classes of the model is usually fixed, so the classes of the test data are assumed to be known when the model is trained. In real application scenarios, however, unknown classes that were absent from training often appear, and a conventional classification network cannot classify them correctly. This defect greatly reduces the recognition accuracy of a classification model in a real application environment, so solving the unknown-class recognition problem is a key factor in improving the recognition accuracy of classification networks. Existing solutions to the unknown-class recognition problem in deep learning mainly fall into two methods. The first adds unknown sample classes to the training set and distinguishes unknown classes through the similarity between the data under test and the known samples. The second extracts intermediate-layer features of a classification network and performs cluster analysis with common machine learning methods (KNN, PCA, t-SNE, and similar methods) to distinguish unknown classes.
Both methods identify unknown classes well in some application scenarios, but each has shortcomings. The first method, which distinguishes known from unknown classes by adding unknown sample classes to the training set, depends heavily on the unknown samples: such a sample set is difficult to collect, it can hardly cover all potential unknown classes, and the model still cannot classify correctly when it encounters a class that is absent from the training set. The second method, which clusters intermediate-layer features of the classification network, works well only when the similarity between the unknown and known classes is small. Individual identification of radar signals, however, distinguishes emitters through the tiny signal variations caused by differences in radar hardware; the signals of different individual radars are highly similar, so cluster analysis of intermediate features can hardly separate different individuals completely. In summary, existing solutions to the unknown-class recognition problem in deep learning cannot guarantee correct classification. Studying frontier techniques in deep learning in depth and finding a better unknown-class recognition method is therefore of great significance for improving the accuracy of individual identification of radar signals.
Disclosure of Invention
The invention aims to provide a method and a device for identifying unknown individuals of radar signals so as to accurately separate known signals from unknown signals.
The invention solves the technical problems by the following technical means: a method of identifying an unknown individual of a radar signal, the method comprising:
step one: constructing and storing a known class sample set N and an unknown class sample set UN;
step two: inputting each sample to be identified in the known type sample set N and each sample to be identified in the unknown type sample set UN into a coding network, extracting the characteristic vector of the sample to be identified and classifying the sample to be identified;
step three: generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
step four: generating a conditional feature vector according to the feature vector and the attention probability distribution vector of the sample to be identified, and inputting the conditional feature vector into a decoding network to judge the category of the sample to be identified;
step five: training the coding network, the decoding network, and the networks of the DDPG algorithm.
A traditional coding network and decoding network aim to recover the input signal correctly; the invention introduces an attention mechanism so that the coding network and the decoding network recover the input signal selectively. An attention probability distribution vector is generated by the DDPG algorithm, a conditional feature vector is generated from the feature vector of the sample to be identified and the attention probability distribution vector, and the conditional feature vector is input into the decoding network for signal recovery. The similarity between the output of the decoding network and the input signal is then calculated: if the similarity is high, the input signal is a known signal and its class is the class output by the classifier; if the similarity is low, the input signal is an unknown-class signal. The attention mechanism assigns different attention to the components of the feature vector output by the coding network. For known-class signals, the main features are enhanced and the secondary features are weakened, so that the signals can be recovered correctly. For unknown-class signals, the main features are weakened, so that the signals cannot be recovered. As a result, unknown-class input signals are identified effectively, unknown-class signals are not misidentified as known-class signals, and known-class and unknown-class signals are separated accurately.
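The decision rule described in this paragraph can be sketched roughly as follows. The text does not say which similarity measure or threshold is used, so the cosine similarity, the value 0.9, and the names `cosine_similarity` and `identify` are all assumptions made for illustration only.

```python
import math

def cosine_similarity(x, x_rec):
    """Cosine similarity between an input signal and its reconstruction (assumed measure)."""
    dot = sum(a * b for a, b in zip(x, x_rec))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in x_rec))
    return dot / norm if norm > 0 else 0.0

def identify(x, x_rec, classifier_label, threshold=0.9):
    """Keep the classifier's label when the reconstruction is similar enough, else flag unknown.

    The threshold is a hypothetical value, not one given in the source.
    """
    if cosine_similarity(x, x_rec) >= threshold:
        return classifier_label
    return "unknown"

# A well-recovered signal keeps its class; a badly recovered one is rejected.
good = identify([1.0, 2.0, 3.0], [1.1, 2.0, 2.9], "radar_2")
bad = identify([1.0, 2.0, 3.0], [3.0, -1.0, 0.2], "radar_2")
```

In practice the threshold would presumably be tuned on held-out known and unknown samples.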
Preferably, the constructing the known category sample set N and the unknown category sample set UN includes: collecting signals of N radars, taking the signals of K radars as known categories to form a known category sample set N, and taking the signals of N-K radars as unknown categories to form an unknown category sample set UN.
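As a minimal sketch of this split, assuming the collected signals are held in a dictionary keyed by a radar identifier (the container layout and the name `split_sample_sets` are illustrative, not from the source):

```python
def split_sample_sets(signals_by_radar, known_ids):
    """Split collected signals into a labelled known set and an unlabelled unknown set."""
    known_set, unknown_set = [], []
    for radar_id, signals in signals_by_radar.items():
        if radar_id in known_ids:
            # Known radars keep their identity as the class label.
            known_set.extend((sig, radar_id) for sig in signals)
        else:
            # Signals of the remaining radars form the unknown-class set UN.
            unknown_set.extend(signals)
    return known_set, unknown_set

# Three radars collected, two of them treated as known classes.
signals = {"r1": [[0.1, 0.2]], "r2": [[0.3, 0.4], [0.5, 0.6]], "r3": [[0.7, 0.8]]}
known, unknown = split_sample_sets(signals, known_ids={"r1", "r2"})
```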
Preferably, the second step includes: the coding network comprises an input layer, a middle layer, and an output layer. The input layer receives the sample to be identified. The middle layer consists of a one-dimensional convolution layer and a pooling layer: the convolution layer extracts features from the sample, and the pooling layer reduces its dimension. The output layer outputs the feature vector $h = f(wx + b)$ of the sample, where $w$ is the weight of the coding network, $b$ is the bias of the coding network, and $x$ is the input sample to be identified, a one-dimensional sequence. The feature vector $h$ is then input into a classifier, which consists of a fully connected layer and a softmax classifier, to obtain the class of the sample.
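To illustrate the convolution and pooling stage, here is a small pure-Python sketch with a single kernel. The ReLU activation, the pooling window of 2, and the kernel values are assumptions; the source only states that a one-dimensional convolution layer is followed by a pooling layer.

```python
def conv1d(x, kernel, bias=0.0):
    """Valid 1-D cross-correlation of sequence x with one kernel."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

def relu(v):
    """Element-wise ReLU, used here as an assumed choice for f."""
    return [max(0.0, t) for t in v]

def max_pool1d(v, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]

def encode(x, kernel, bias=0.0):
    """Feature vector h = pool(f(w * x + b)) for a single convolution kernel."""
    return max_pool1d(relu(conv1d(x, kernel, bias)))

h = encode([1.0, 2.0, 0.0, -1.0, 3.0], kernel=[1.0, -1.0])
```

A real coding network would stack several such layers with many kernels; this sketch only shows the data flow for one.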
Preferably, the third step includes: using the DDPG algorithm, building an actor network $u(s; \theta^{u})$ and a critic network $Q(s, a; \theta^{Q})$ based on convolutional neural networks, where $s$ is the input of the actor network and the critic network and equals the feature vector $h$ extracted by the coding network, $\theta^{u}$ is the weight parameter of the actor network, and $\theta^{Q}$ is the weight parameter of the critic network; and obtaining an action $a$ by sampling from the action generated by the actor network plus random noise, where the action $a$ is the attention probability distribution vector.
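One way to picture how a noisy actor output becomes an attention probability distribution vector is sketched below. The Gaussian noise, its standard deviation, and the softmax normalisation are assumptions; the source only says the action is obtained from the actor output and random noise.

```python
import math
import random

def softmax(v):
    """Numerically stable softmax, so the result is a probability distribution."""
    m = max(v)
    exps = [math.exp(t - m) for t in v]
    total = sum(exps)
    return [e / total for e in exps]

def noisy_attention(actor_output, noise_std=0.1, rng=random):
    """Add exploration noise to the raw actor output, then normalise it."""
    noisy = [t + rng.gauss(0.0, noise_std) for t in actor_output]
    return softmax(noisy)

# A seeded generator keeps the illustration reproducible.
a = noisy_attention([0.5, 1.5, -0.2], rng=random.Random(0))
```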
Preferably, the fourth step includes: obtaining the conditional feature vector according to the formula $c = h \times a$, where $c$ is the conditional feature vector; inputting the conditional feature vector $c$ into the decoding network, whose structure is symmetrical to that of the coding network and whose output is $x' = g(w'c + b')$, where $w'$ is the weight of the decoding network, $b'$ is the bias of the decoding network, and $x'$ is the output of the decoding network; and determining the class of the sample to be identified according to the output of the decoding network.
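The two formulas in this step can be sketched as follows, reading $c = h \times a$ as an element-wise product and modelling the decoder as a single affine layer with $g$ taken as the identity. Both readings are assumptions made for brevity; the decoding network in the text is a convolutional structure symmetrical to the coding network.

```python
def conditional_feature(h, a):
    """Element-wise product c = h * a of features and attention weights."""
    if len(h) != len(a):
        raise ValueError("feature and attention vectors must have equal length")
    return [hi * ai for hi, ai in zip(h, a)]

def decode(c, w, b):
    """x' = g(w'c + b') with g as the identity; w is a list of weight rows."""
    return [sum(wij * cj for wij, cj in zip(row, c)) + bi
            for row, bi in zip(w, b)]

c = conditional_feature([2.0, 1.0], [0.75, 0.25])
x_rec = decode(c, w=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], b=[0.0, 0.0, 0.5])
```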
Preferably, the fifth step includes: training the coding network and the decoding network by using the known-class sample set N with a loss function $L_1$ (the formula itself is not reproduced in this text), where $y_i$ is the true class of the $i$-th input sample to be identified, $y_i'$ is the class of the $i$-th sample output by the coding network, $x_i$ is the $i$-th sample to be identified, $m$ is the number of samples to be identified, and $x_i'$ is the output of the decoding network for the $i$-th sample; in the training process, the loss value $L_1$ of $m$ input samples is calculated each time, and the network parameters are then updated by the back-propagation algorithm until the loss value $L_1$ no longer decreases, at which point the training of the coding network and the decoding network is complete;
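The stopping condition (training ends once the loss value no longer decreases) could be checked along these lines. The `min_delta` and `patience` parameters and the function name are hypothetical; the source gives no numeric criterion.

```python
def stop_step(loss_values, min_delta=1e-4, patience=3):
    """Return the index of the step at which training would stop.

    Training stops after `patience` consecutive steps without an improvement
    larger than `min_delta`; both values are illustrative assumptions.
    """
    best = float("inf")
    stale = 0
    for step, loss in enumerate(loss_values):
        if best - loss > min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return step
    return len(loss_values) - 1

# The loss plateaus at 0.4, so training stops before the later drop is seen.
stop = stop_step([1.0, 0.6, 0.4, 0.4, 0.4, 0.4, 0.1])
```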
Training of the actor network and the critic network: first, a target network is defined for each of the actor network and the critic network, namely $u'(s'; \theta^{u'})$ and $Q'(s', a'; \theta^{Q'})$, where $s'$ is the input of the target actor network and the target critic network, $a'$ is the output action of the target actor network, $\theta^{u'}$ is the weight parameter of the target actor network, and $\theta^{Q'}$ is the weight parameter of the target critic network. During training, the parameters of the coding network are kept fixed, and the training sample set comprises the N sample set and the UN sample set. The actor network generates an action $a$ according to the input state $s$ provided by the coding network and the decoding network; the coding network and the decoding network calculate a reward $r$ according to $a$ and provide the next input state $s'$; and the transition $(s, a, r, s')$ is stored in a buffer. A batch of $M$ transitions $(s_i, a_i, r_i, s_i')$, $i = 1, 2, \ldots, M$, is sampled from the buffer. The critic network calculates a Q value according to $s_i$ in the transition and the output $a_i$ of the actor network, and the actor network calculates the policy gradient according to the Q value and updates its weight parameters, according to the formula:
$$\nabla_{\theta^{u}} J \approx \frac{1}{M} \sum_{i=1}^{M} \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s = s_i,\, a = u(s_i)} \, \nabla_{\theta^{u}} u(s \mid \theta^{u})\big|_{s = s_i}$$
where $M$ is the number of samples taken from the buffer, $\nabla_{\theta^{u}} J$ is the policy gradient, and $\nabla$ is the gradient sign. The target actor network obtains an action $a_i'$ according to $s_i'$ in the transition; the target critic network obtains the target critic output $Q'$ according to $s_i'$ and $a_i'$; the time-difference error $L$ between the $Q$ obtained by the critic network and the $Q'$ obtained by the target critic network is calculated; and the gradient is calculated according to $L$ to update the weights of the critic network. The time-difference error is expressed as:
$$L = \frac{1}{M} \sum_{i=1}^{M} \left( y_i - Q(s_i, a_i \mid \theta^{Q}) \right)^2$$
where $y_i$ is calculated according to the following formula:
$$y_i = r_i + \gamma\, Q'\!\left(s_{i+1},\, u'(s_{i+1} \mid \theta^{u'}) \mid \theta^{Q'}\right)$$
where $r_i$ is the reward value of the $i$-th sample to be identified, $\gamma$ is a preset weight coefficient, $Q$ is the critic network output value, $Q'$ is the target critic network output value, and $s_{i+1}$ denotes the next state $s_i'$.
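The target value $y_i$ and the time-difference error can be computed as in the sketch below, which takes the error as a mean of squared differences, the usual DDPG form and an assumption here. The value of $\gamma$ is only described as preset in the text, so the one used in the example is arbitrary, and the target critic outputs are passed in as plain numbers.

```python
def td_targets(rewards, next_q_values, gamma=0.99):
    """y_i = r_i + gamma * Q'(s_i', a_i') for each sampled transition."""
    return [r + gamma * q for r, q in zip(rewards, next_q_values)]

def critic_loss(targets, q_values):
    """Mean squared time-difference error over the sampled batch."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)

# Two transitions: rewards, target-critic outputs Q', and critic outputs Q.
y = td_targets([1.0, 0.0], [2.0, 4.0], gamma=0.5)
loss = critic_loss(y, [1.0, 3.0])
```

Here both targets equal 2.0, each squared error is 1.0, and so the batch loss is 1.0.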
The weight parameters of the target actor network and the target critic network are updated through a soft update algorithm, as follows:
$\theta^{Q'} = \tau \theta^{Q} + (1 - \tau)\, \theta^{Q'}$
$\theta^{u'} = \tau \theta^{u} + (1 - \tau)\, \theta^{u'}$
where $\tau$ is a weight coefficient with a value of 0.001.
Training of the networks in the DDPG algorithm is complete once the time-difference error is smaller than a preset value and stable.
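The soft update is a per-parameter blend of the online and target weights. A minimal sketch, assuming the weights are held as flat lists of floats (a simplification for illustration):

```python
def soft_update(target_params, online_params, tau=0.001):
    """theta' = tau * theta + (1 - tau) * theta' for every parameter."""
    return [tau * p + (1.0 - tau) * t
            for t, p in zip(target_params, online_params)]

# With tau = 0.001 the target weights move only slightly toward the online weights.
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.001)
```

With such a small $\tau$ the target networks change slowly, which is the usual reason for soft updates in DDPG: the targets $y_i$ stay nearly stationary between steps.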
The invention also provides an unknown individual identification device of the radar signal, which comprises:
the sample set construction module is used for constructing and storing a known type sample set N and an unknown type sample set UN;
the extraction and classification module is used for inputting each sample to be identified in the known class sample set N and each sample to be identified in the unknown class sample set UN into the coding network, extracting the feature vector of the sample to be identified and classifying the sample to be identified;
the vector generation module is used for generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
the class judging module is used for generating a conditional feature vector according to the feature vector and the attention probability distribution vector of the sample to be identified and inputting the conditional feature vector into the decoding network to judge the class of the sample to be identified;
and the training module is used for carrying out network training in the encoding network, the decoding network and the DDPG algorithm.
Preferably, the constructing the known category sample set N and the unknown category sample set UN includes: collecting signals of N radars, taking the signals of K radars as known categories to form a known category sample set N, and taking the signals of N-K radars as unknown categories to form an unknown category sample set UN.
Preferably, the extraction classification module is further configured as follows: the coding network comprises an input layer, a middle layer, and an output layer. The input layer receives the sample to be identified. The middle layer consists of a one-dimensional convolution layer and a pooling layer: the convolution layer extracts features from the sample, and the pooling layer reduces its dimension. The output layer outputs the feature vector $h = f(wx + b)$ of the sample, where $w$ is the weight of the coding network, $b$ is the bias of the coding network, and $x$ is the input sample to be identified, a one-dimensional sequence. The feature vector $h$ is then input into a classifier, which consists of a fully connected layer and a softmax classifier, to obtain the class of the sample.
Preferably, the vector generation module is further configured to: use the DDPG algorithm to build an actor network $u(s; \theta^{u})$ and a critic network $Q(s, a; \theta^{Q})$ based on convolutional neural networks, where $s$ is the input of the actor network and the critic network and equals the feature vector $h$ extracted by the coding network, $\theta^{u}$ is the weight parameter of the actor network, and $\theta^{Q}$ is the weight parameter of the critic network; and obtain an action $a$ by sampling from the action generated by the actor network plus random noise, where the action $a$ is the attention probability distribution vector.
Preferably, the category determination module is further configured to: obtain the conditional feature vector according to the formula $c = h \times a$, where $c$ is the conditional feature vector; input the conditional feature vector $c$ into the decoding network, whose structure is symmetrical to that of the coding network and whose output is $x' = g(w'c + b')$, where $w'$ is the weight of the decoding network, $b'$ is the bias of the decoding network, and $x'$ is the output of the decoding network; and determine the class of the sample to be identified according to the output of the decoding network.
Preferably, the training module is further configured to: train the coding network and the decoding network by using the known-class sample set N with a loss function $L_1$ (the formula itself is not reproduced in this text), where $y_i$ is the true class of the $i$-th input sample to be identified, $y_i'$ is the class of the $i$-th sample output by the coding network, $x_i$ is the $i$-th sample to be identified, $m$ is the number of samples to be identified, and $x_i'$ is the output of the decoding network for the $i$-th sample; in the training process, the loss value $L_1$ of $m$ input samples is calculated each time, and the network parameters are then updated by the back-propagation algorithm until the loss value $L_1$ no longer decreases, at which point the training of the coding network and the decoding network is complete;
training of the actor network and the critic network: first, a target network is defined for each of the actor network and the critic network, namely $u'(s'; \theta^{u'})$ and $Q'(s', a'; \theta^{Q'})$, where $s'$ is the input of the target actor network and the target critic network, $a'$ is the output action of the target actor network, $\theta^{u'}$ is the weight parameter of the target actor network, and $\theta^{Q'}$ is the weight parameter of the target critic network. During training, the parameters of the coding network are kept fixed, and the training sample set comprises the N sample set and the UN sample set. The actor network generates an action $a$ according to the input state $s$ provided by the coding network and the decoding network; the coding network and the decoding network calculate a reward $r$ according to $a$ and provide the next input state $s'$; and the transition $(s, a, r, s')$ is stored in a buffer. A batch of $M$ transitions $(s_i, a_i, r_i, s_i')$, $i = 1, 2, \ldots, M$, is sampled from the buffer. The critic network calculates a Q value according to $s_i$ in the transition and the output $a_i$ of the actor network, and the actor network calculates the policy gradient according to the Q value and updates its weight parameters, according to the formula:
$$\nabla_{\theta^{u}} J \approx \frac{1}{M} \sum_{i=1}^{M} \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s = s_i,\, a = u(s_i)} \, \nabla_{\theta^{u}} u(s \mid \theta^{u})\big|_{s = s_i}$$
where $M$ is the number of samples taken from the buffer, $\nabla_{\theta^{u}} J$ is the policy gradient, and $\nabla$ is the gradient sign. The target actor network obtains an action $a_i'$ according to $s_i'$ in the transition; the target critic network obtains the target critic output $Q'$ according to $s_i'$ and $a_i'$; the time-difference error $L$ between the $Q$ obtained by the critic network and the $Q'$ obtained by the target critic network is calculated; and the gradient is calculated according to $L$ to update the weights of the critic network. The time-difference error is expressed as:
$$L = \frac{1}{M} \sum_{i=1}^{M} \left( y_i - Q(s_i, a_i \mid \theta^{Q}) \right)^2$$
where $y_i$ is calculated according to the following formula:
$$y_i = r_i + \gamma\, Q'\!\left(s_{i+1},\, u'(s_{i+1} \mid \theta^{u'}) \mid \theta^{Q'}\right)$$
where $r_i$ is the reward value of the $i$-th sample to be identified, $\gamma$ is a preset weight coefficient, $Q$ is the critic network output value, $Q'$ is the target critic network output value, and $s_{i+1}$ denotes the next state $s_i'$;
the weight parameters of the target actor network and the target critic network are updated through a soft update algorithm, as follows:
$\theta^{Q'} = \tau \theta^{Q} + (1 - \tau)\, \theta^{Q'}$
$\theta^{u'} = \tau \theta^{u} + (1 - \tau)\, \theta^{u'}$
where $\tau$ is a weight coefficient with a value of 0.001;
training of the networks in the DDPG algorithm is complete once the time-difference error is smaller than a preset value and stable.
The invention has the following advantages: a traditional coding network and decoding network aim to recover the input signal correctly, whereas the invention introduces an attention mechanism so that the coding network and the decoding network recover the input signal selectively. An attention probability distribution vector is generated by the DDPG algorithm, a conditional feature vector is generated from the feature vector of the sample to be identified and the attention probability distribution vector, and the conditional feature vector is input into the decoding network for signal recovery. The similarity between the output of the decoding network and the input signal is then calculated: if the similarity is high, the input signal is a known signal and its class is the class output by the classifier; if the similarity is low, the input signal is an unknown-class signal. The attention mechanism assigns different attention to the components of the feature vector output by the coding network. For known-class signals, the main features are enhanced and the secondary features are weakened, so that the signals can be recovered correctly. For unknown-class signals, the main features are weakened, so that the signals cannot be recovered. As a result, unknown-class input signals are identified effectively, unknown-class signals are not misidentified as known-class signals, and known-class and unknown-class signals are separated accurately.
Drawings
FIG. 1 is a flow chart of a method for identifying unknown individuals of radar signals according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a method for identifying unknown individuals of radar signals according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described in the following in conjunction with the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in fig. 1 and 2, a method for identifying an unknown individual of a radar signal, the method comprising: step S1: the method comprises the steps of constructing and storing a known category sample set N and an unknown category sample set UN, specifically: collecting signals of N radars, taking the signals of K radars as known categories to form a known category sample set N, and taking the signals of N-K radars as unknown categories to form an unknown category sample set UN.
Step S2: inputting each sample to be identified in the known-class sample set N and in the unknown-class sample set UN into a coding network, extracting the feature vector of the sample, and classifying the sample. Feature extraction and classification by the coding network proceed as follows: the coding network comprises an input layer, a middle layer, and an output layer. The input layer receives the sample to be identified. The middle layer consists of a one-dimensional convolution layer and a pooling layer: the convolution layer extracts features from the sample, and the pooling layer reduces its dimension. The output layer outputs the feature vector $h = f(wx + b)$ of the sample, where $f(\cdot)$ denotes a function, so that $h$ is a function of $x$, $w$ is the weight of the coding network, $b$ is the bias of the coding network, and $x$ is the input sample to be identified, a one-dimensional sequence. The feature vector $h$ is then input into a classifier, which consists of a fully connected layer and a softmax classifier, to obtain the class of the sample.
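The classifier head (a fully connected layer followed by softmax) can be sketched in a few lines. The weight values and the three-class setup below are made up for illustration and are not taken from the source.

```python
import math

def dense(h, weights, biases):
    """Fully connected layer: one logit per row of the weight matrix."""
    return [sum(w * x for w, x in zip(row, h)) + b
            for row, b in zip(weights, biases)]

def softmax(z):
    """Numerically stable softmax over the logits."""
    m = max(z)
    exps = [math.exp(t - m) for t in z]
    total = sum(exps)
    return [e / total for e in exps]

def classify(h, weights, biases):
    """Return the index of the most probable class and all class probabilities."""
    probs = softmax(dense(h, weights, biases))
    return probs.index(max(probs)), probs

label, probs = classify([2.0, 1.0],
                        weights=[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
                        biases=[0.0, 0.0, 0.0])
```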
Step S3: generating an attention probability distribution vector from the feature vector of the sample to be identified by using the DDPG (Deep Deterministic Policy Gradient) algorithm of deep reinforcement learning. The main process is as follows: using the DDPG algorithm, an actor network $u(s; \theta^{u})$ and a critic network $Q(s, a; \theta^{Q})$ are built based on convolutional neural networks, where $s$ is the input of the actor network and the critic network and equals the feature vector $h$ extracted by the coding network, $\theta^{u}$ is the weight parameter of the actor network, and $\theta^{Q}$ is the weight parameter of the critic network. An action $a$ is obtained by sampling from the action generated by the actor network plus random noise, and the action $a$ is the attention probability distribution vector. The coding network calculates the corresponding reward $r$ according to $a$ and feeds it back to the critic network, transfers to the next input state $s'$, and stores the transition $(s, a, r, s')$ in a cache for network training.
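The cache of transitions $(s, a, r, s')$ behaves like a replay buffer. A minimal sketch, assuming a fixed capacity with oldest-first eviction and uniform random sampling (neither is specified in the source; the class name `ReplayBuffer` is illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s') transitions with uniform sampling."""

    def __init__(self, capacity=10000, seed=None):
        self._data = deque(maxlen=capacity)  # oldest transitions are dropped first
        self._rng = random.Random(seed)

    def store(self, s, a, r, s_next):
        self._data.append((s, a, r, s_next))

    def sample(self, m):
        """Draw up to m distinct transitions uniformly at random."""
        return self._rng.sample(list(self._data), min(m, len(self._data)))

    def __len__(self):
        return len(self._data)

buf = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buf.store([float(t)], [0.5], float(t), [float(t + 1)])
batch = buf.sample(2)
```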
Step S4: generating a conditional feature vector from the feature vector and the attention probability distribution vector of the sample to be identified, and inputting the conditional feature vector into a decoding network to determine the class of the sample. The specific process is as follows: the conditional feature vector is obtained according to the formula $c = h \times a$, where $c$ is the conditional feature vector, and is input into the decoding network, whose structure is symmetrical to that of the coding network. The output of the decoding network is $x' = g(w'c + b')$, where $g(\cdot)$ denotes a function, so that $x'$ is a function of $c$, $w'$ is the weight of the decoding network, $b'$ is the bias of the decoding network, and $x'$ is the output of the decoding network. The class of the sample to be identified is determined according to the output of the decoding network. Note that the conditional feature vector is obtained by multiplying the feature vector $h$ by the attention probability distribution vector $a$ corresponding to the maximum Q value obtained in step S3.
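The closing note, that the attention vector used is the one with the maximum Q value, might look like this in code. The candidate list and `toy_critic` are hypothetical stand-ins for the trained critic network, which is not reproduced here.

```python
def best_attention(h, candidates, critic):
    """Pick the candidate attention vector that the critic scores highest."""
    return max(candidates, key=lambda a: critic(h, a))

def toy_critic(h, a):
    """Hypothetical stand-in for Q(s, a): favours attention on large features."""
    return sum(abs(hi) * ai for hi, ai in zip(h, a))

h = [2.0, 0.5]
a_best = best_attention(h, [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]], toy_critic)
# Conditional feature vector c = h * a, taken element-wise.
c = [hi * ai for hi, ai in zip(h, a_best)]
```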
Step S5: training the coding network, the decoding network, and the networks of the DDPG algorithm. The training follows common existing practice and is briefly described as follows: the coding network and the decoding network are trained by using the known-class sample set N with a loss function $L_1$ (the formula itself is not reproduced in this text), where $y_i$ is the true class of the $i$-th input sample to be identified, $y_i'$ is the class of the $i$-th sample output by the coding network, $x_i$ is the $i$-th sample to be identified, $m$ is the number of samples to be identified, and $x_i'$ is the output of the decoding network for the $i$-th sample; in the training process, the loss value $L_1$ of $m$ input samples is calculated each time, and the network parameters are then updated by the back-propagation algorithm until the loss value $L_1$ no longer decreases, at which point the training of the coding network and the decoding network is complete;
For training of the actor network and the critic network, target networks are first defined for the actor network and the critic network respectively, namely $u'(s'; \theta^{u'})$ and $Q'(s', a'; \theta^{Q'})$, where s′ is the input of the target actor network and the target critic network, a′ is the output action of the target actor network, $\theta^{u'}$ is the weight parameter of the target actor network, and $\theta^{Q'}$ is the weight parameter of the target critic network. During training, the parameters of the encoding network are kept fixed, and the training sample set comprises the N sample set and the UN sample set. The actor network generates an action a according to the input state s provided by the encoding network and the decoding network; the encoding network and the decoding network calculate a reward r according to a and provide the next input state s′; and the transition (s, a, r, s′) is stored in a buffer. M samples $(s_i, a_i, r_i, s_i')$, i = 1, 2, …, M, are collected from the buffer. The critic network calculates a Q value according to $s_i$ in the transition and the output $a_i$ of the actor network, and the actor network calculates the policy gradient according to the Q value and updates its weight parameters according to the following formula:
where i denotes the i-th sample, $s_i$ is the input state of the i-th sample, $a_i$ is the output of the actor network for the i-th sample, M is the number of samples taken from the buffer, $\nabla_{\theta^{u}} J$ is the policy gradient, and $\nabla$ is the gradient operator. The target actor network obtains the action $a_i'$ from $s_i'$ in the transition, and the target critic network obtains its output Q′ from $s_i'$ and $a_i'$. The temporal-difference error L between the Q obtained by the critic network and the Q′ obtained by the target critic network is calculated, the gradient is calculated from L, and the weights of the critic network are updated. The temporal-difference error is expressed as follows:
where $y_i$ is calculated according to the following formula:
$$y_i = r_i + \gamma\, Q'\bigl(s_{i+1},\, u'(s_{i+1} \mid \theta^{u'}) \mid \theta^{Q'}\bigr)$$
where $r_i$ is the reward value of the i-th sample to be identified, $\gamma$ is a preset weight coefficient, $s_{i+1}$ is the next state (written $s_i'$ in the stored transition), Q is the critic network output value, and Q′ is the target critic network output value.
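The two display formulas referred to in this step (the actor's policy gradient and the critic's temporal-difference error) are not legible in this text. For orientation only, the standard DDPG forms that agree with the symbol definitions given here (M sampled transitions, actor u, critic Q, target value $y_i$) are usually written as below; whether the patent's own formulas match these term for term is an assumption.

```latex
\nabla_{\theta^{u}} J \approx \frac{1}{M}\sum_{i=1}^{M}
  \nabla_{a} Q(s, a; \theta^{Q})\Big|_{s=s_i,\; a=u(s_i;\theta^{u})}\,
  \nabla_{\theta^{u}} u(s; \theta^{u})\Big|_{s=s_i}

L = \frac{1}{M}\sum_{i=1}^{M}\bigl(y_i - Q(s_i, a_i; \theta^{Q})\bigr)^{2}
```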
The weight parameters of the target actor network and the target critic network are updated through a soft update algorithm, and the updating mode is as follows:
$$\theta^{Q'} = \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'}$$
$$\theta^{u'} = \tau\,\theta^{u} + (1-\tau)\,\theta^{u'}$$
where $\tau$ is a weight coefficient, generally taken as 0.001.
Training of the networks in the DDPG algorithm is complete once the temporal-difference error is smaller than a preset value and has stabilized.
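The soft update above can be sketched as follows, assuming each network's weights are held in a dictionary of NumPy arrays keyed by parameter name (the function name `soft_update` and this storage layout are illustrative assumptions):

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.001):
    """Return tau * online + (1 - tau) * target for every named parameter."""
    return {
        name: tau * online_params[name] + (1.0 - tau) * target_params[name]
        for name in target_params
    }

# Toy weights for an online network and its target copy.
online = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
target = {"w": np.array([0.0, 0.0]), "b": np.array([0.0])}
target = soft_update(target, online, tau=0.001)
```

With τ = 0.001 the target weights move only 0.1% of the way toward the online weights per update, which keeps the target value $y_i$ slowly varying during critic training.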
The working principle of the invention is as follows. As shown in Fig. 2, the signal of the sample to be identified first passes through the encoding network, which automatically extracts the implicit features of the signal. On the one hand, the resulting feature vector of the input signal is fed to the classifier, which classifies the signal; this process is consistent with the flow of a conventional classification network. On the other hand, the feature vector is multiplied by the attention probability distribution vector generated by reinforcement learning to obtain the conditional feature vector, which is input to the decoding network to recover the input signal. The similarity between the output of the decoding network and the original input signal is then calculated: if the similarity is high, the input signal is a known signal and its class is the class output by the classifier; if the similarity is low, the input signal belongs to an unknown class. The reinforcement learning agent generates the attention probability distribution vector from the feature vector of the input signal, and the class of the input signal is judged from the output of the decoding network. If the judgment is correct, a reward of +1 is fed back to the agent; otherwise the reward is −1. The agent dynamically adjusts its parameters according to the reward so that the generated attention probability distribution vector becomes optimal.
In the method and device for identifying unknown individuals of radar signals disclosed by the invention, the output of the decoding network is effectively controlled by weighting the hidden-layer feature vector that lies between a conventional encoding network and decoding network. When an input sample belongs to a known class present in the training sample set, the decoding network can decode it correctly; when an input sample belongs to an unknown class absent from the training sample set, the decoding network cannot decode it correctly. The input samples are classified by the encoding network, and the output of the decoding network is used to determine whether an input sample belongs to an unknown class that does not exist in the training sample set. The signal to be classified first passes through the encoding network, which automatically extracts its implicit features. On the one hand, the resulting feature vector of the input sample is fed to the classifier, which classifies the sample; this process is consistent with the flow of a conventional classification network. On the other hand, the feature vector is multiplied by the attention probability distribution vector generated by reinforcement learning to obtain the conditional feature vector, which is input to the decoding network to recover the input sample. The similarity between the output of the decoding network and the original input sample is then calculated: if the similarity is high, the input sample is a known signal and its class is the class output by the classifier; if the similarity is low, the input sample belongs to an unknown class.
The reinforcement learning agent generates the attention probability distribution vector from the feature vector of the input signal, and the class of the input sample is judged from the output of the decoding network. If the judgment is correct, a reward of +1 is fed back to the agent; otherwise the reward is −1. The agent dynamically adjusts its parameters according to the reward so that the generated attention probability distribution vector becomes optimal.
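The decision rule and reward described above can be sketched as follows. The patent does not name a similarity measure or a threshold, so cosine similarity and the value 0.9 are assumptions made purely for illustration, as are the function names.

```python
import numpy as np

def cosine_similarity(x, x_rec):
    """Cosine similarity between the input and its reconstruction (assumed measure)."""
    x = np.asarray(x, dtype=float)
    x_rec = np.asarray(x_rec, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(x_rec)
    return float(x @ x_rec / denom) if denom > 0 else 0.0

def judge(x, x_rec, predicted_class, threshold=0.9):
    """Keep the classifier's class if the reconstruction is similar enough, else 'unknown'."""
    if cosine_similarity(x, x_rec) >= threshold:
        return predicted_class
    return "unknown"

def reward(decision, true_label):
    """+1 for a correct judgment, -1 otherwise."""
    return 1 if decision == true_label else -1

x = np.array([1.0, 0.0, 1.0])
good_rec = np.array([0.9, 0.1, 1.1])   # close to x: treated as a known class
bad_rec = np.array([0.0, 1.0, 0.0])    # orthogonal to x: treated as unknown
decision_known = judge(x, good_rec, predicted_class=2)
decision_unknown = judge(x, bad_rec, predicted_class=2)
```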
Example 2
Corresponding to embodiment 1 of the present invention, embodiment 2 of the present invention further provides an unknown individual identification device of a radar signal, the device including:
the sample set construction module is used for constructing and storing a known type sample set N and an unknown type sample set UN;
the extraction and classification module is used for inputting each sample to be identified in the known class sample set N and each sample to be identified in the unknown class sample set UN into the coding network, extracting the feature vector of the sample to be identified and classifying the sample to be identified;
the vector generation module is used for generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
the class judging module is used for generating a conditional feature vector according to the feature vector and the attention probability distribution vector of the sample to be identified and inputting the conditional feature vector into the decoding network to judge the class of the sample to be identified;
and the training module is used for carrying out network training in the encoding network, the decoding network and the DDPG algorithm.
Specifically, the construction of the known category sample set N and the unknown category sample set UN includes: collecting signals of N radars, taking the signals of K radars as known categories to form a known category sample set N, and taking the signals of N-K radars as unknown categories to form an unknown category sample set UN.
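A minimal sketch of this split, assuming each collected signal is stored as a `(radar_id, signal)` pair and that the K known radars are given as a set of identifiers (the data layout and the name `split_known_unknown` are hypothetical):

```python
def split_known_unknown(samples, known_ids):
    """Split (radar_id, signal) pairs into the known set N and the unknown set UN."""
    known_ids = set(known_ids)
    known = [(rid, sig) for rid, sig in samples if rid in known_ids]
    unknown = [(rid, sig) for rid, sig in samples if rid not in known_ids]
    return known, unknown

# Three radars in total, two of which are treated as known classes.
samples = [
    ("radar_A", [0.1, 0.2]),
    ("radar_B", [0.3, 0.1]),
    ("radar_C", [0.4, 0.9]),
    ("radar_A", [0.5, 0.6]),
]
known_set, unknown_set = split_known_unknown(samples, {"radar_A", "radar_B"})
```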
Specifically, the extraction classification module is further configured to: the coding network comprises an input layer, a middle layer and an output layer, wherein the input layer inputs a sample to be identified, the middle layer consists of a one-dimensional convolution layer and a pooling layer, the convolution layer performs feature extraction on the sample to be identified, the pooling layer performs dimension reduction on the sample to be identified, the output layer outputs a feature vector h of the sample to be identified and h=f (wx+b), wherein w is the weight of the coding network, b is the bias of the coding network, x is the sample to be identified of an input one-dimensional sequence, after the feature vector h is obtained, the feature vector h is input into a classifier, and the class of the sample to be identified is obtained, wherein the classifier consists of a full connection layer and a softmax classifier.
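The layer sequence described here (one-dimensional convolution, pooling, then a fully connected layer with softmax) can be illustrated with a tiny NumPy sketch. The single filter, ReLU activation, pooling size, and toy weights are all assumptions; a practical encoder would use many filters and learned parameters.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """'Valid' 1-D cross-correlation with one filter, followed by ReLU."""
    k = len(kernel)
    out = np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])
    return np.maximum(out + bias, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0])
h = max_pool1d(conv1d_valid(x, np.array([1.0, 0.0, -1.0])))  # feature vector

# Fully connected layer (toy weights) plus softmax over two known classes.
W = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
probs = softmax(W @ h)
predicted_class = int(np.argmax(probs))
```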
Specifically, the vector generation module is further configured to: use the DDPG algorithm to build, based on a convolutional neural network, an actor network $u(s; \theta^{u})$ and a critic network $Q(s, a; \theta^{Q})$, where s is the input of the actor network and the critic network and is equal to the feature vector h extracted by the encoding network, $\theta^{u}$ is the weight parameter of the actor network, and $\theta^{Q}$ is the weight parameter of the critic network; and obtain an action a by sampling from the action generated by the actor network together with random noise, the action a being the attention probability distribution vector.
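One way to picture how an action becomes an attention probability distribution vector is sketched below. The text only says that random noise is used; the Gaussian noise, its scale, and the softmax normalization are assumptions made for this example, and `noisy_attention` is a hypothetical name.

```python
import numpy as np

def noisy_attention(actor_output, rng, noise_std=0.1):
    """Add Gaussian exploration noise to the actor output, then normalize with softmax."""
    noisy = np.asarray(actor_output, dtype=float)
    noisy = noisy + rng.normal(0.0, noise_std, size=noisy.shape)
    e = np.exp(noisy - noisy.max())
    return e / e.sum()

rng = np.random.default_rng(0)
a = noisy_attention([0.2, 1.5, -0.3], rng)  # entries are positive and sum to 1
```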
Specifically, the category determination module further includes: obtaining a conditional feature vector according to a formula c=h×a, wherein c is a conditional feature vector, inputting the obtained conditional feature vector c into a decoding network, wherein the decoding network structure is symmetrical to the encoding network, and the output result of the decoding network is x '=g (w' c+b '), wherein w' is the weight of the decoding network, b 'is the bias of the decoding network, x' is the output of the decoding network, and determining the type of the sample to be identified according to the output of the decoding network.
Specifically, the training module is further configured to: train the encoding network and the decoding network using the known class sample set N with a loss function $L_1$, in which $y_i$ is the true class of the i-th input sample to be identified, $y_i'$ is the class of the i-th sample to be identified output by the encoding network, $x_i$ is the i-th sample to be identified, m is the number of samples to be identified, and $x_i'$ is the output of the decoding network for the i-th sample to be identified; during training, the loss value $L_1$ of the m input samples is calculated each time, and the network parameters are then updated by the back-propagation algorithm until the loss value $L_1$ of the network no longer decreases, at which point training of the encoding network and the decoding network is complete;
for training of the actor network and the critic network, target networks are first defined for the actor network and the critic network respectively, namely $u'(s'; \theta^{u'})$ and $Q'(s', a'; \theta^{Q'})$, where s′ is the input of the target actor network and the target critic network, a′ is the output action of the target actor network, $\theta^{u'}$ is the weight parameter of the target actor network, and $\theta^{Q'}$ is the weight parameter of the target critic network; during training, the parameters of the encoding network are kept fixed, and the training sample set comprises the N sample set and the UN sample set; the actor network generates an action a according to the input state s provided by the encoding network and the decoding network; the encoding network and the decoding network calculate a reward r according to a and provide the next input state s′; the transition (s, a, r, s′) is stored in a buffer; M samples $(s_i, a_i, r_i, s_i')$, i = 1, 2, …, M, are collected from the buffer; the critic network calculates a Q value according to $s_i$ in the transition and the output $a_i$ of the actor network, and the actor network calculates the policy gradient according to the Q value and updates its weight parameters according to the following formula:
where M is the number of samples taken from the buffer, $\nabla_{\theta^{u}} J$ is the policy gradient, and $\nabla$ is the gradient operator; the target actor network obtains the action $a_i'$ from $s_i'$ in the transition, and the target critic network obtains its output Q′ from $s_i'$ and $a_i'$; the temporal-difference error L between the Q obtained by the critic network and the Q′ obtained by the target critic network is calculated, the gradient is calculated from L, and the weights of the critic network are updated, the temporal-difference error being expressed as follows:
where $y_i$ is calculated according to the following formula:
$$y_i = r_i + \gamma\, Q'\bigl(s_{i+1},\, u'(s_{i+1} \mid \theta^{u'}) \mid \theta^{Q'}\bigr)$$
where $r_i$ is the reward value of the i-th sample to be identified, $\gamma$ is a preset weight coefficient, Q is the critic network output value, and Q′ is the target critic network output value;
the weight parameters of the target actor network and the target critic network are updated through a soft update algorithm, in the following manner:
$$\theta^{Q'} = \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'}$$
$$\theta^{u'} = \tau\,\theta^{u} + (1-\tau)\,\theta^{u'}$$
where $\tau$ is a weight coefficient with a value of 0.001;
training of the networks in the DDPG algorithm is complete once the temporal-difference error is smaller than a preset value and has stabilized.
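The buffer handling and the target value $y_i$ used by the training module can be sketched as follows. The target actor and target critic are passed in as plain callables, and the class name `ReplayBuffer`, the capacity, and the toy networks are illustrative assumptions rather than details from the patent.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-capacity store of transitions (s, a, r, s_next)."""

    def __init__(self, capacity=10000):
        self.data = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample(self, m):
        """Draw m distinct transitions uniformly at random."""
        return random.sample(list(self.data), m)

def td_targets(batch, target_actor, target_critic, gamma=0.99):
    """y_i = r_i + gamma * Q'(s_i', u'(s_i')) for every transition in the batch."""
    return np.array([
        r + gamma * target_critic(s_next, target_actor(s_next))
        for (_, _, r, s_next) in batch
    ])

# Toy stand-ins for the target networks.
toy_target_actor = lambda s: s
toy_target_critic = lambda s, a: float(np.dot(s, a))

buffer = ReplayBuffer()
buffer.store(np.array([1.0, 0.0]), np.array([0.5, 0.5]), 1.0, np.array([1.0, 2.0]))
batch = buffer.sample(1)
y = td_targets(batch, toy_target_actor, toy_target_critic, gamma=0.9)
```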
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for identifying an unknown individual of a radar signal, the method comprising:
step one: constructing and storing a known class sample set N and an unknown class sample set UN;
step two: inputting each sample to be identified in the known type sample set N and each sample to be identified in the unknown type sample set UN into a coding network, extracting the characteristic vector of the sample to be identified and classifying the sample to be identified;
step three: generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
step four: generating a conditional feature vector according to the feature vector and the attention probability distribution vector of the sample to be identified, and inputting the conditional feature vector into a decoding network to judge the category of the sample to be identified;
step five: training the encoding network, the decoding network, and the networks in the DDPG algorithm, the specific process being as follows: the encoding network and the decoding network are trained using the known class sample set N with a loss function $L_1$, in which $y_i$ is the true class of the i-th input sample to be identified, $y_i'$ is the class of the i-th sample to be identified output by the encoding network, $x_i$ is the i-th sample to be identified, m is the number of samples to be identified, and $x_i'$ is the output of the decoding network for the i-th sample to be identified; during training, the loss value $L_1$ of the m input samples is calculated each time, and the network parameters are then updated by the back-propagation algorithm until the loss value $L_1$ of the network no longer decreases, at which point training of the encoding network and the decoding network is complete;
for training of the actor network and the critic network, target networks are first defined for the actor network and the critic network respectively, namely $u'(s'; \theta^{u'})$ and $Q'(s', a'; \theta^{Q'})$, where s′ is the input of the target actor network and the target critic network, a′ is the output action of the target actor network, $\theta^{u'}$ is the weight parameter of the target actor network, and $\theta^{Q'}$ is the weight parameter of the target critic network; during training, the parameters of the encoding network are kept fixed, and the training sample set comprises the N sample set and the UN sample set; the actor network generates an action a according to the input state s provided by the encoding network and the decoding network; the encoding network and the decoding network calculate a reward r according to a and provide the next input state s′; the transition (s, a, r, s′) is stored in a buffer; M samples $(s_i, a_i, r_i, s_i')$, i = 1, 2, …, M, are collected from the buffer; the critic network calculates a Q value according to $s_i$ in the transition and the output $a_i$ of the actor network, and the actor network calculates the policy gradient according to the Q value and updates its weight parameters according to the following formula:
where M is the number of samples taken from the buffer, $\nabla_{\theta^{u}} J$ is the policy gradient, and $\nabla$ is the gradient operator; the target actor network obtains the action $a_i'$ from $s_i'$ in the transition, and the target critic network obtains its output Q′ from $s_i'$ and $a_i'$; the temporal-difference error L between the Q obtained by the critic network and the Q′ obtained by the target critic network is calculated, the gradient is calculated from L, and the weights of the critic network are updated, the temporal-difference error being expressed as follows:
where $y_i$ is calculated according to the following formula:
$$y_i = r_i + \gamma\, Q'\bigl(s_{i+1},\, u'(s_{i+1} \mid \theta^{u'}) \mid \theta^{Q'}\bigr)$$
where $r_i$ is the reward value of the i-th sample to be identified, $\gamma$ is a preset weight coefficient, Q is the critic network output value, and Q′ is the target critic network output value;
the weight parameters of the target actor network and the target critic network are updated through a soft update algorithm, and the updating mode is as follows:
$$\theta^{Q'} = \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'}$$
$$\theta^{u'} = \tau\,\theta^{u} + (1-\tau)\,\theta^{u'}$$
where $\tau$ is a weight coefficient with a value of 0.001;
training of the networks in the DDPG algorithm is complete once the temporal-difference error is smaller than a preset value and has stabilized.
2. The method for identifying unknown individuals of radar signals according to claim 1, wherein said constructing a known class sample set N and an unknown class sample set UN comprises: collecting signals of N radars, taking the signals of K radars as known categories to form a known category sample set N, and taking the signals of N-K radars as unknown categories to form an unknown category sample set UN.
3. The method for identifying an unknown individual of a radar signal according to claim 1, wherein the second step includes: the coding network comprises an input layer, a middle layer and an output layer, wherein the input layer inputs a sample to be identified, the middle layer consists of a one-dimensional convolution layer and a pooling layer, the convolution layer performs feature extraction on the sample to be identified, the pooling layer performs dimension reduction on the sample to be identified, the output layer outputs a feature vector h of the sample to be identified and h=f (wx+b), wherein w is the weight of the coding network, b is the bias of the coding network, x is the sample to be identified of an input one-dimensional sequence, after the feature vector h is obtained, the feature vector h is input into a classifier, and the class of the sample to be identified is obtained, wherein the classifier consists of a full connection layer and a softmax classifier.
4. A method for identifying an unknown individual of a radar signal according to claim 3, wherein said step three comprises: using the DDPG algorithm to build, based on a convolutional neural network, an actor network $u(s; \theta^{u})$ and a critic network $Q(s, a; \theta^{Q})$, where s is the input of the actor network and the critic network and is equal to the feature vector h extracted by the encoding network, $\theta^{u}$ is the weight parameter of the actor network, and $\theta^{Q}$ is the weight parameter of the critic network; and obtaining an action a by sampling from the action generated by the actor network together with random noise, the action a being the attention probability distribution vector.
5. The method for identifying an unknown individual of a radar signal according to claim 4, wherein said step four comprises: obtaining a conditional feature vector according to a formula c=h×a, wherein c is a conditional feature vector, inputting the obtained conditional feature vector c into a decoding network, wherein the decoding network structure is symmetrical to the encoding network, and the output result of the decoding network is x '=g (w' c+b '), wherein w' is the weight of the decoding network, b 'is the bias of the decoding network, x' is the output of the decoding network, and determining the type of the sample to be identified according to the output of the decoding network.
6. An unknown individual identification device for radar signals, the device comprising:
the sample set construction module is used for constructing and storing a known type sample set N and an unknown type sample set UN;
the extraction and classification module is used for inputting each sample to be identified in the known class sample set N and each sample to be identified in the unknown class sample set UN into the coding network, extracting the feature vector of the sample to be identified and classifying the sample to be identified;
the vector generation module is used for generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
the class judging module is used for generating a conditional feature vector according to the feature vector and the attention probability distribution vector of the sample to be identified and inputting the conditional feature vector into the decoding network to judge the class of the sample to be identified;
the training module is used for training the encoding network, the decoding network, and the networks in the DDPG algorithm, and is specifically used for: training the encoding network and the decoding network using the known class sample set N with a loss function $L_1$, in which $y_i$ is the true class of the i-th input sample to be identified, $y_i'$ is the class of the i-th sample to be identified output by the encoding network, $x_i$ is the i-th sample to be identified, m is the number of samples to be identified, and $x_i'$ is the output of the decoding network for the i-th sample to be identified; during training, the loss value $L_1$ of the m input samples is calculated each time, and the network parameters are then updated by the back-propagation algorithm until the loss value $L_1$ of the network no longer decreases, at which point training of the encoding network and the decoding network is complete;
for training of the actor network and the critic network, target networks are first defined for the actor network and the critic network respectively, namely $u'(s'; \theta^{u'})$ and $Q'(s', a'; \theta^{Q'})$, where s′ is the input of the target actor network and the target critic network, a′ is the output action of the target actor network, $\theta^{u'}$ is the weight parameter of the target actor network, and $\theta^{Q'}$ is the weight parameter of the target critic network; during training, the parameters of the encoding network are kept fixed, and the training sample set comprises the N sample set and the UN sample set; the actor network generates an action a according to the input state s provided by the encoding network and the decoding network; the encoding network and the decoding network calculate a reward r according to a and provide the next input state s′; the transition (s, a, r, s′) is stored in a buffer; M samples $(s_i, a_i, r_i, s_i')$, i = 1, 2, …, M, are collected from the buffer; the critic network calculates a Q value according to $s_i$ in the transition and the output $a_i$ of the actor network, and the actor network calculates the policy gradient according to the Q value and updates its weight parameters according to the following formula:
where M is the number of samples taken from the buffer, $\nabla_{\theta^{u}} J$ is the policy gradient, and $\nabla$ is the gradient operator; the target actor network obtains the action $a_i'$ from $s_i'$ in the transition, and the target critic network obtains its output Q′ from $s_i'$ and $a_i'$; the temporal-difference error L between the Q obtained by the critic network and the Q′ obtained by the target critic network is calculated, the gradient is calculated from L, and the weights of the critic network are updated, the temporal-difference error being expressed as follows:
where $y_i$ is calculated according to the following formula:
$$y_i = r_i + \gamma\, Q'\bigl(s_{i+1},\, u'(s_{i+1} \mid \theta^{u'}) \mid \theta^{Q'}\bigr)$$
where $r_i$ is the reward value of the i-th sample to be identified, $\gamma$ is a preset weight coefficient, Q is the critic network output value, and Q′ is the target critic network output value;
the weight parameters of the target actor network and the target critic network are updated through a soft update algorithm, and the updating mode is as follows:
$$\theta^{Q'} = \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'}$$
$$\theta^{u'} = \tau\,\theta^{u} + (1-\tau)\,\theta^{u'}$$
where $\tau$ is a weight coefficient with a value of 0.001;
training of the networks in the DDPG algorithm is complete once the temporal-difference error is smaller than a preset value and has stabilized.
7. An unknown individual identification device for radar signals according to claim 6, wherein said constructing a known class sample set N and an unknown class sample set UN comprises: collecting signals of N radars, taking the signals of K radars as known categories to form a known category sample set N, and taking the signals of N-K radars as unknown categories to form an unknown category sample set UN.
8. An unknown individual identification device for radar signals according to claim 6, wherein said extraction classification module is further configured to: the coding network comprises an input layer, a middle layer and an output layer, wherein the input layer inputs a sample to be identified, the middle layer consists of a one-dimensional convolution layer and a pooling layer, the convolution layer performs feature extraction on the sample to be identified, the pooling layer performs dimension reduction on the sample to be identified, the output layer outputs a feature vector h of the sample to be identified and h=f (wx+b), wherein w is the weight of the coding network, b is the bias of the coding network, x is the sample to be identified of an input one-dimensional sequence, after the feature vector h is obtained, the feature vector h is input into a classifier, and the class of the sample to be identified is obtained, wherein the classifier consists of a full connection layer and a softmax classifier.
9. An unknown individual identification device for radar signals according to claim 6, wherein said vector generation module is further adapted to: use the DDPG algorithm to build, based on a convolutional neural network, an actor network $u(s; \theta^{u})$ and a critic network $Q(s, a; \theta^{Q})$, where s is the input of the actor network and the critic network and is equal to the feature vector h extracted by the encoding network, $\theta^{u}$ is the weight parameter of the actor network, and $\theta^{Q}$ is the weight parameter of the critic network; and obtain an action a by sampling from the action generated by the actor network together with random noise, the action a being the attention probability distribution vector.
CN201911296607.8A 2019-12-16 2019-12-16 Unknown individual identification method and device for radar signals Active CN111144462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911296607.8A CN111144462B (en) 2019-12-16 2019-12-16 Unknown individual identification method and device for radar signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911296607.8A CN111144462B (en) 2019-12-16 2019-12-16 Unknown individual identification method and device for radar signals

Publications (2)

Publication Number Publication Date
CN111144462A CN111144462A (en) 2020-05-12
CN111144462B true CN111144462B (en) 2023-10-20

Family

ID=70518457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911296607.8A Active CN111144462B (en) 2019-12-16 2019-12-16 Unknown individual identification method and device for radar signals

Country Status (1)

Country Link
CN (1) CN111144462B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112014821B (en) * 2020-08-27 2022-05-17 电子科技大学 Unknown vehicle target identification method based on radar broadband characteristics
CN113807243B (en) * 2021-09-16 2023-12-05 上海交通大学 Water obstacle detection system and method based on attention to unknown target
CN113792733B (en) * 2021-09-17 2023-07-21 平安科技(深圳)有限公司 Vehicle part detection method, system, electronic device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934269A (en) * 2019-02-25 2019-06-25 中国电子科技集团公司第三十六研究所 A kind of opener recognition methods of electromagnetic signal and device
CN110109109A (en) * 2019-04-26 2019-08-09 西安电子科技大学 HRRP target identification method based on multiresolution attention convolutional network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11327156B2 (en) * 2018-04-26 2022-05-10 Metawave Corporation Reinforcement learning engine for a radar system
CN108932480B (en) * 2018-06-08 2022-03-15 电子科技大学 Distributed optical fiber sensing signal feature learning and classifying method based on 1D-CNN

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
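Jia Yafei; Zhu Yongli; Gao Jiacheng; Yuan Bo. Recognition of unknown-class partial discharge signals based on sample-weighted FCM clustering. Electric Power Automation Equipment (电力自动化设备), 2018, (12), full text. *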

Also Published As

Publication number Publication date
CN111144462A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN108564129B (en) Trajectory data classification method based on generation countermeasure network
CN111144462B (en) Unknown individual identification method and device for radar signals
CN111832225A (en) Method for constructing driving condition of automobile
CN111652293B (en) Vehicle weight recognition method for multi-task joint discrimination learning
CN111444951B (en) Sample recognition model generation method, device, computer equipment and storage medium
CN111832615A (en) Sample expansion method and system based on foreground and background feature fusion
CN110716792B (en) Target detector and construction method and application thereof
CN111368920A (en) Binary classification method based on quantum Siamese neural network and face recognition method thereof
CN113326731A (en) Cross-domain pedestrian re-identification algorithm based on momentum network guidance
CN103366367A (en) Fuzzy C-means grayscale image segmentation method based on pixel-count clustering
CN114841257A (en) Small-sample target detection method based on self-supervised contrastive constraint
CN113096169B (en) Non-rigid multimode medical image registration model establishing method and application thereof
CN112738470B (en) Method for detecting parking in highway tunnel
CN111611877A (en) Age interference resistant face recognition method based on multi-temporal-spatial information fusion
CN116363712B (en) Palmprint palm vein recognition method based on modal informativity evaluation strategy
CN114842343A (en) ViT-based aerial image identification method
CN114119966A (en) Small sample target detection method based on multi-view learning and meta-learning
CN115690549A (en) Target detection method for realizing multi-dimensional feature fusion based on parallel interaction architecture model
CN112489689B (en) Cross-database speech emotion recognition method and device based on multi-scale difference adversarial learning
CN112488160B (en) Model training method for image classification task
CN113283467A (en) Weakly supervised image classification method based on average loss and category-by-category selection
CN112528554A (en) Data fusion method and system suitable for multi-launch multi-source rocket test data
CN115511012B (en) Class soft label identification training method with maximum entropy constraint
CN116433909A (en) Similarity weighted multi-teacher network model-based semi-supervised image semantic segmentation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant