CN111144462A - Unknown individual identification method and device for radar signals - Google Patents

Unknown individual identification method and device for radar signals

Info

Publication number
CN111144462A
CN111144462A (application CN201911296607.8A)
Authority
CN
China
Prior art keywords
network
sample
identified
input
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911296607.8A
Other languages
Chinese (zh)
Other versions
CN111144462B (en)
Inventor
黄双双
李臻
单志林
李立
苏志杰
胡佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 38 Research Institute
Original Assignee
CETC 38 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 38 Research Institute filed Critical CETC 38 Research Institute
Priority to CN201911296607.8A priority Critical patent/CN111144462B/en
Publication of CN111144462A publication Critical patent/CN111144462A/en
Application granted granted Critical
Publication of CN111144462B publication Critical patent/CN111144462B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for identifying unknown individuals of radar signals. The method comprises the following steps: constructing and storing a known class sample set N and an unknown class sample set UN; inputting each sample to be identified in the known class sample set N and in the unknown class sample set UN into a coding network, extracting the feature vector of the sample to be identified and classifying it; generating an attention probability distribution vector from the feature vector of the sample to be identified using the DDPG algorithm; generating a conditional feature vector from the feature vector and the attention probability distribution vector of the sample to be identified and inputting it into a decoding network to judge the category of the sample to be identified; and training the networks of the coding network, the decoding network and the DDPG algorithm. The invention has the advantage that unknown class signals are prevented from being mistakenly recognized as known class signals, and known class signals and unknown class signals are accurately separated.

Description

Unknown individual identification method and device for radar signals
Technical Field
The invention relates to the field of electronic reconnaissance, in particular to an unknown individual identification method and device for radar signals.
Background
In deep-learning-based classification networks, the set of output classes is usually fixed, so the classes of the test data are assumed to be known when the model is trained. In real application scenarios, however, classes that were absent from training commonly appear, and a conventional classification network cannot classify them correctly. This defect greatly reduces the recognition accuracy of the classification model in real environments, so solving the unknown-class identification problem is a key factor in improving the recognition accuracy of classification networks. Existing solutions to the unknown-class identification problem in deep learning mainly fall into two categories. The first adds unknown sample classes to the training set and distinguishes unknown classes by the similarity between the data under test and the known samples. The second extracts intermediate-layer features of the classification network and separates unknown classes by cluster analysis with common machine learning methods (clustering and dimensionality-reduction methods such as KNN, PCA and t-SNE).
Both methods work well for unknown-class identification in some application scenarios, but each has drawbacks. The first method, which distinguishes known from unknown classes by adding unknown sample classes to the training set, depends heavily on unknown samples: such a sample set is difficult to collect, it can hardly cover all potential unknown classes, and the model still cannot classify correctly when it meets a class absent from the training set. The second method, which extracts intermediate-layer features of the classification network and then performs cluster analysis with common machine learning methods, works well only when the similarity between unknown and known classes is low. Individual identification of radar signals, however, relies on slight signal variations caused by differences in radar hardware, so signals from different radar individuals are highly similar, and cluster analysis on the intermediate features of a classification network can hardly separate different individuals completely. In summary, existing solutions to the unknown-class identification problem in deep learning cannot classify accurately; studying advanced techniques in the deep learning field and exploring a better unknown-class identification method is therefore of great significance for improving the accuracy of radar signal individual identification.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method and a device for identifying unknown individuals of radar signals, so that known class signals can be accurately separated from unknown class signals.
The invention solves this technical problem by the following technical means: a method for identifying unknown individuals of radar signals, the method comprising:
step one: constructing and storing a known class sample set N and an unknown class sample set UN;
step two: inputting each sample to be identified in a known class sample set N and each sample to be identified in an unknown class sample set UN into a coding network, extracting a characteristic vector of the sample to be identified and classifying the sample to be identified;
step three: generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
step four: generating a conditional feature vector according to the feature vector and the attention probability distribution vector of the sample to be identified, and inputting the conditional feature vector into a decoding network to judge the category of the sample to be identified;
step five: and carrying out network training in an encoding network, a decoding network and a DDPG algorithm.
A traditional coding network and decoding network aim to recover the input signal correctly; by introducing an attention mechanism, the invention makes the coding network and the decoding network recover the input signal selectively. An attention probability distribution vector is generated by the DDPG algorithm, a conditional feature vector is generated from the feature vector of the sample to be identified and the attention probability distribution vector, and the conditional feature vector is input into the decoding network for signal recovery. The similarity between the output of the decoding network and the input signal is then computed: if the similarity is high, the input signal belongs to a known class and its category is the one output by the classifier; if the similarity is low, the input signal belongs to an unknown class. By assigning different attention to the feature vector output by the coding network, the attention mechanism strengthens the main features and weakens the secondary features of known class signals, so that they can be recovered correctly, whereas the main features of unknown class signals are weakened, so that they cannot be recovered. In this way the input unknown class signals are effectively identified, unknown class signals are prevented from being mistakenly recognized as known class signals, and known class signals and unknown class signals are accurately separated.
Preferably, constructing the known class sample set N and the unknown class sample set UN includes: collecting the signals of N radars, taking the signals of K radars as known classes to form the known class sample set N, and taking the signals of the remaining N-K radars as unknown classes to form the unknown class sample set UN.
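As a minimal illustration of this sample-set construction, the Python sketch below splits the signals collected from N radars into a known class set (the first K radars) and an unknown class set (the rest); the dictionary-based data layout is an assumption and is not part of the patent.

```python
# Toy sketch of the sample-set construction: the signals of the first K radars
# form the known class sample set N and the remaining radars form the unknown
# class sample set UN. The dictionary-based data layout is an assumption.
def split_sample_sets(signals_by_radar, K):
    radars = sorted(signals_by_radar)                         # the collected radar identifiers
    known = {r: signals_by_radar[r] for r in radars[:K]}      # sample set N
    unknown = {r: signals_by_radar[r] for r in radars[K:]}    # sample set UN
    return known, unknown
```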
Preferably, step two includes: the coding network comprises an input layer, an intermediate layer and an output layer; the sample to be identified is fed to the input layer; the intermediate layer consists of one-dimensional convolution layers and pooling layers, the convolution layers extract features from the sample to be identified and the pooling layers reduce its dimensionality; the output layer outputs the feature vector h of the sample to be identified, h = f(wx + b), where w is the weight of the coding network, b is the bias of the coding network and x is the input one-dimensional sample sequence to be identified; after the feature vector h is obtained, it is input into a classifier to obtain the category of the sample to be identified, where the classifier consists of a fully connected layer and a softmax classifier.
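The following PyTorch sketch illustrates what such a coding network and classifier could look like; the layer counts, channel widths, kernel sizes and feature dimension are illustrative assumptions rather than values given in the patent.

```python
# Illustrative sketch of the coding network (1-D convolution + pooling) and the
# fully connected + softmax classifier; all sizes below are assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Coding network producing the feature vector h and class logits."""
    def __init__(self, feat_dim=128, num_known_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.AdaptiveAvgPool1d(1),              # collapse the time axis
        )
        self.to_h = nn.Linear(32, feat_dim)       # h = f(wx + b)
        # classifier: fully connected layer followed by softmax over the K known classes
        self.classifier = nn.Linear(feat_dim, num_known_classes)

    def forward(self, x):                         # x: (batch, 1, signal_len)
        h = torch.relu(self.to_h(self.features(x).squeeze(-1)))
        logits = self.classifier(h)               # softmax is applied in the loss / at inference
        return h, logits
```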
Preferably, step three includes: using the DDPG algorithm, an actor network u(s; θu) and a critic network Q(s, a; θQ) are constructed on the basis of a convolutional neural network, where s, the input of the actor network and the critic network, equals the feature vector h extracted by the coding network, θu is the weight parameter of the actor network and θQ is the weight parameter of the critic network; an action a is obtained by sampling the action generated by the actor network together with random noise, and the action a is the attention probability distribution vector.
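A minimal sketch of the actor u(s; θu), the critic Q(s, a; θQ) and the noisy action sampling is given below. For brevity the networks here are fully connected, whereas the patent builds them on a convolutional neural network; the hidden sizes and noise scale are likewise assumptions.

```python
# Sketch of the DDPG actor and critic and of the noise-perturbed action sampling.
# The MLP structure, hidden sizes and noise scale are assumptions.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim))

    def forward(self, s):
        # softmax makes the output an attention probability distribution
        return torch.softmax(self.net(s), dim=-1)

class Critic(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))   # Q(s, a)

def sample_action(actor, s, noise_std=0.05):
    """Action a: actor output plus exploration noise, renormalised to a distribution."""
    a = actor(s)
    a = (a + noise_std * torch.randn_like(a)).clamp(min=1e-8)
    return a / a.sum(dim=-1, keepdim=True)
```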
Preferably, step four includes: obtaining the conditional feature vector according to the formula c = h · a (element-wise multiplication of the feature vector h and the attention probability distribution vector a), where c is the conditional feature vector; the obtained conditional feature vector c is input into the decoding network, whose structure is symmetric to that of the coding network; the output of the decoding network is x′ = g(w′c + b′), where w′ is the weight of the decoding network, b′ is the bias of the decoding network and x′ is the output of the decoding network; the category of the sample to be identified is judged from the output of the decoding network.
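The sketch below illustrates the conditional feature vector c = h · a, a mirror-style decoding network and a similarity-based known/unknown decision; the decoder layout, the use of cosine similarity and the threshold value are assumptions made for illustration.

```python
# Sketch of the conditional feature vector, the decoding network and the
# known/unknown decision; decoder layout, similarity measure and threshold are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    """Decoding network, mirror of the coding network: x' = g(w'c + b')."""
    def __init__(self, feat_dim=128, signal_len=1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, signal_len))

    def forward(self, c):
        return self.net(c)

def classify_or_reject(x, h, a, decoder, logits, threshold=0.9):
    c = h * a                                     # conditional feature vector
    x_rec = decoder(c)                            # attempted recovery of the input
    sim = F.cosine_similarity(x_rec, x.flatten(1), dim=-1)
    known = sim > threshold                       # high similarity -> known class
    pred = logits.argmax(dim=-1)                  # class from the softmax classifier
    return known, pred                            # samples with known=False are unknown
```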
Preferably, step five includes: training the coding network and the decoding network with the known class sample set N under a loss function L1 that combines a classification error term between yi and yi′ and a reconstruction error term between xi and xi′ averaged over the m input samples, where yi is the label of the i-th input sample to be identified, yi′ is the class of the i-th sample output by the coding network, xi is the i-th sample to be identified, m is the number of samples to be identified and xi′ is the output of the decoding network for the i-th sample; during training, the loss value L1 is computed for every m input samples and the network parameters are then updated by back-propagation; when the loss value L1 no longer decreases, the training of the coding network and the decoding network is finished;
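Since the exact form of L1 appears in the source only as an image, the sketch below assumes a common combination consistent with the surrounding definitions: a classification term over (yi, yi′) plus a reconstruction term over (xi, xi′).

```python
# ASSUMED form of the loss L1 (the patent gives the formula only as an image):
# classification error on (y_i, y_i') plus reconstruction error on (x_i, x_i'),
# averaged over the m input samples.
import torch.nn.functional as F

def encoder_decoder_loss(logits, y, x_rec, x):
    # logits: classifier outputs y_i'; y: true labels y_i
    # x_rec: decoder outputs x_i';  x: original input samples x_i
    cls_term = F.cross_entropy(logits, y)        # classification error
    rec_term = F.mse_loss(x_rec, x.flatten(1))   # reconstruction error
    return cls_term + rec_term
```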
Training of the actor network and the critic network: a target network is first defined for each of them, namely u(s′; θu′) for the actor and Q(s′, a′; θQ′) for the critic, where s′ is the input of the target actor network and the target critic network, a′ is the action output by the target actor network, θu′ is the weight parameter of the target actor network and θQ′ is the weight parameter of the target critic network; the parameters of the coding network are kept fixed during this training. The training sample set comprises the sample set N and the sample set UN. The actor network generates an action a from the input state s provided by the coding network and the decoding network, the coding network and the decoding network compute a reward r from a and provide the next input state s′, and the transition state (s, a, r, s′) is stored in a buffer; M samples (si, ai, ri, si′), i = 1, 2, …, M, are then drawn from the buffer. The critic network computes a Q value from si in the transition state and the output ai of the actor network, and the actor network computes the policy gradient from the Q value and updates its weight parameters according to
∇θuJ ≈ (1/M) Σi ∇aQ(si, a; θQ)|a=u(si) · ∇θuu(si; θu),
where M is the number of samples taken from the buffer, ∇θuJ is the policy gradient and ∇ is the gradient operator. The target actor network obtains the action ai′ from si′ in the transition state, and the target critic network obtains its output Q′ from si′ and ai′; with Q obtained from the critic network and Q′ from the target critic network, the time-difference error L between Q and Q′ is computed as
L = (1/M) Σi (yi − Q(si, ai; θQ))²,
a gradient is computed from the time-difference error L and the critic network weights are updated according to this gradient, where yi is calculated as
yi = ri + γQ′(si+1, u′(si+1; θu′) | θQ′),
ri is the reward value of the i-th sample to be identified, γ is a preset weight coefficient, Q is the critic network output value and Q′ is the target critic network output value.
The weight parameters of the target actor network and the target critic network are updated by a soft-update algorithm:
θQ′ = τθQ + (1 − τ)θQ′
θu′ = τθu + (1 − τ)θu′
where τ is a weight coefficient with the value 0.001; the training of the networks in the DDPG algorithm is finished when the time-difference error is smaller than a preset value and stable.
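One possible DDPG update step consistent with the formulas above is sketched below; the optimiser interface, the value of γ and the batch layout are assumptions (τ = 0.001 as in the patent), and the helper is not the patent's own implementation.

```python
# Sketch of one DDPG update: critic TD-error step, actor policy-gradient step,
# and soft update of the target networks. Optimiser interface, γ and batch
# layout are assumptions; τ = 0.001 follows the patent text.
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.001):
    s, a, r, s_next = batch                       # tensors; r has shape (batch, 1)

    # critic update: y_i = r_i + γ Q'(s_i', u'(s_i'))
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))
    td_error = F.mse_loss(critic(s, a), y)        # time-difference error L
    critic_opt.zero_grad(); td_error.backward(); critic_opt.step()

    # actor update: ascend Q(s, u(s)) via the policy gradient
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # soft update: Θ' = τΘ + (1 - τ)Θ'
    for tgt, src in ((target_actor, actor), (target_critic, critic)):
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
    return td_error.item()
```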
The invention also provides a device for identifying unknown individuals of radar signals, which comprises:
the sample set construction module is used for constructing and storing a known class sample set N and an unknown class sample set UN;
the extraction and classification module is used for inputting each sample to be identified in the known class sample set N and each sample to be identified in the unknown class sample set UN into a coding network, extracting the characteristic vector of the sample to be identified and classifying the sample to be identified;
the vector generation module is used for generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
the class judgment module is used for generating a condition characteristic vector according to the characteristic vector and the attention probability distribution vector of the sample to be identified and inputting the condition characteristic vector into a decoding network to judge the class of the sample to be identified;
and the training module is used for carrying out network training in the coding network, the decoding network and the DDPG algorithm.
Preferably, constructing the known class sample set N and the unknown class sample set UN includes: collecting the signals of N radars, taking the signals of K radars as known classes to form the known class sample set N, and taking the signals of the remaining N-K radars as unknown classes to form the unknown class sample set UN.
Preferably, the extraction and classification module is further configured to: the coding network comprises an input layer, an intermediate layer and an output layer; the sample to be identified is fed to the input layer; the intermediate layer consists of one-dimensional convolution layers and pooling layers, the convolution layers extract features from the sample to be identified and the pooling layers reduce its dimensionality; the output layer outputs the feature vector h of the sample to be identified, h = f(wx + b), where w is the weight of the coding network, b is the bias of the coding network and x is the input one-dimensional sample sequence to be identified; after the feature vector h is obtained, it is input into a classifier to obtain the category of the sample to be identified, where the classifier consists of a fully connected layer and a softmax classifier.
Preferably, the vector generation module is further configured to: using the DDPG algorithm, an actor network u(s; θu) and a critic network Q(s, a; θQ) are constructed on the basis of a convolutional neural network, where s, the input of the actor network and the critic network, equals the feature vector h extracted by the coding network, θu is the weight parameter of the actor network and θQ is the weight parameter of the critic network; an action a is obtained by sampling the action generated by the actor network together with random noise, and the action a is the attention probability distribution vector.
Preferably, the class judgment module is further configured to: obtain the conditional feature vector according to the formula c = h · a (element-wise multiplication of the feature vector h and the attention probability distribution vector a), where c is the conditional feature vector; the obtained conditional feature vector c is input into the decoding network, whose structure is symmetric to that of the coding network; the output of the decoding network is x′ = g(w′c + b′), where w′ is the weight of the decoding network, b′ is the bias of the decoding network and x′ is the output of the decoding network; the category of the sample to be identified is judged from the output of the decoding network.
Preferably, the training module is further configured to: train the coding network and the decoding network with the known class sample set N under a loss function L1 that combines a classification error term between yi and yi′ and a reconstruction error term between xi and xi′ averaged over the m input samples, where yi is the label of the i-th input sample to be identified, yi′ is the class of the i-th sample output by the coding network, xi is the i-th sample to be identified, m is the number of samples to be identified and xi′ is the output of the decoding network for the i-th sample; during training, the loss value L1 is computed for every m input samples and the network parameters are then updated by back-propagation; when the loss value L1 no longer decreases, the training of the coding network and the decoding network is finished;
Training of the actor network and the critic network: a target network is first defined for each of them, namely u(s′; θu′) for the actor and Q(s′, a′; θQ′) for the critic, where s′ is the input of the target actor network and the target critic network, a′ is the action output by the target actor network, θu′ is the weight parameter of the target actor network and θQ′ is the weight parameter of the target critic network; the parameters of the coding network are kept fixed during this training. The training sample set comprises the sample set N and the sample set UN. The actor network generates an action a from the input state s provided by the coding network and the decoding network, the coding network and the decoding network compute a reward r from a and provide the next input state s′, and the transition state (s, a, r, s′) is stored in a buffer; M samples (si, ai, ri, si′), i = 1, 2, …, M, are then drawn from the buffer. The critic network computes a Q value from si in the transition state and the output ai of the actor network, and the actor network computes the policy gradient from the Q value and updates its weight parameters according to
∇θuJ ≈ (1/M) Σi ∇aQ(si, a; θQ)|a=u(si) · ∇θuu(si; θu),
where M is the number of samples taken from the buffer, ∇θuJ is the policy gradient and ∇ is the gradient operator. The target actor network obtains the action ai′ from si′ in the transition state, and the target critic network obtains its output Q′ from si′ and ai′; with Q obtained from the critic network and Q′ from the target critic network, the time-difference error L between Q and Q′ is computed as
L = (1/M) Σi (yi − Q(si, ai; θQ))²,
a gradient is computed from the time-difference error L and the critic network weights are updated according to this gradient, where yi is calculated as
yi = ri + γQ′(si+1, u′(si+1; θu′) | θQ′),
ri is the reward value of the i-th sample to be identified, γ is a preset weight coefficient, Q is the critic network output value and Q′ is the target critic network output value;
the weight parameters of the target actor network and the target critic network are updated by a soft-update algorithm:
θQ′ = τθQ + (1 − τ)θQ′
θu′ = τθu + (1 − τ)θu′
where τ is a weight coefficient with the value 0.001; the training of the networks in the DDPG algorithm is finished when the time-difference error is smaller than a preset value and stable.
The invention has the advantages that: a traditional coding network and decoding network aim to recover the input signal correctly, whereas by introducing an attention mechanism the invention makes the coding network and the decoding network recover the input signal selectively. An attention probability distribution vector is generated by the DDPG algorithm, a conditional feature vector is generated from the feature vector of the sample to be identified and the attention probability distribution vector, and the conditional feature vector is input into the decoding network for signal recovery. The similarity between the output of the decoding network and the input signal is then computed: if the similarity is high, the input signal belongs to a known class and its category is the one output by the classifier; if the similarity is low, the input signal belongs to an unknown class. By assigning different attention to the feature vector output by the coding network, the attention mechanism strengthens the main features and weakens the secondary features of known class signals, so that they can be recovered correctly, whereas the main features of unknown class signals are weakened, so that they cannot be recovered. In this way the input unknown class signals are effectively identified, unknown class signals are prevented from being mistakenly recognized as known class signals, and known class signals and unknown class signals are accurately separated.
Drawings
FIG. 1 is a flowchart of a method for identifying an unknown individual of a radar signal according to an embodiment of the present invention;
fig. 2 is a schematic block diagram of an unknown individual identification method of a radar signal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1 and 2, a method of unknown individual identification of radar signals, the method comprising: step S1: constructing and storing a known class sample set N and an unknown class sample set UN, which specifically comprises the following steps: signals of N radars are collected, signals of K radars are used as known classes to form a known class sample set N, and signals of N-K radars are used as unknown classes to form an unknown class sample set UN.
Step S2: inputting each sample to be identified in the known class sample set N and in the unknown class sample set UN into a coding network, extracting the feature vector of the sample to be identified and classifying it. Feature-vector extraction and classification by a coding network belong to the prior art. The coding network comprises an input layer, an intermediate layer and an output layer; the sample to be identified is fed to the input layer; the intermediate layer consists of one-dimensional convolution layers and pooling layers, the convolution layers extract features from the sample and the pooling layers reduce its dimensionality; the output layer outputs the feature vector h of the sample, h = f(wx + b), where f(·) denotes a functional mapping, i.e. h = f(wx + b) expresses h as a function of x, w is the weight of the coding network, b is the bias of the coding network and x is the input one-dimensional sample sequence to be identified; after the feature vector h is obtained, it is input into a classifier consisting of a fully connected layer and a softmax classifier to obtain the category of the sample to be identified.
Step S3: generating an attention probability distribution vector from the feature vector of the sample to be identified using the deep reinforcement learning DDPG (Deep Deterministic Policy Gradient) algorithm. The main process is as follows: using the DDPG algorithm, an actor network u(s; θu) and a critic network Q(s, a; θQ) are constructed on the basis of a convolutional neural network, where s, the input of the actor network and the critic network, equals the feature vector h extracted by the coding network, θu is the weight parameter of the actor network and θQ is the weight parameter of the critic network; an action a is obtained by sampling the action generated by the actor network together with random noise, and the action a is the attention probability distribution vector. The coding network computes the corresponding reward r from a, feeds it back to the critic network, transfers to the next input state s′, and stores the transition state (s, a, r, s′) in a cache for network training.
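A minimal replay-buffer sketch for storing and sampling the transition states (s, a, r, s′) is shown below; the capacity and the uniform sampling scheme are assumptions, not details given in the patent.

```python
# Minimal replay-buffer sketch for the transitions (s, a, r, s'); capacity and
# uniform sampling are assumptions.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))            # one transition state

    def sample(self, m):
        return random.sample(list(self.buf), m)       # m transitions (s_i, a_i, r_i, s_i')
```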
Step S4: generating a conditional feature vector from the feature vector and the attention probability distribution vector of the sample to be identified and inputting it into a decoding network to judge the category of the sample. The specific process is as follows: the conditional feature vector is obtained according to the formula c = h · a, where c is the conditional feature vector; the obtained conditional feature vector c is input into the decoding network, whose structure is symmetric to that of the coding network; the output of the decoding network is x′ = g(w′c + b′), where g(·) denotes a functional mapping, i.e. x′ = g(w′c + b′) expresses x′ as a function of c, w′ is the weight of the decoding network, b′ is the bias of the decoding network and x′ is the output of the decoding network; the category of the sample to be identified is judged from the output of the decoding network. Note that the attention probability distribution vector a corresponding to the maximum Q value obtained in step S3 is selected and multiplied element-wise by the feature vector h to obtain the conditional feature vector.
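Selecting the attention vector with the maximum Q value could, for example, look like the sketch below, which reuses the sample_action helper from the earlier actor/critic sketch; the number of sampled candidates is an assumption.

```python
# Sketch of picking, among several noisy candidate actions, the attention
# vector with the largest critic value Q; n_candidates is an assumption.
import torch

def best_attention(actor, critic, h, n_candidates=8, noise_std=0.05):
    # h: a single feature vector of shape (1, feat_dim)
    cands = [sample_action(actor, h, noise_std) for _ in range(n_candidates)]
    q_vals = torch.cat([critic(h, a) for a in cands])   # shape (n_candidates, 1)
    return cands[int(q_vals.argmax())]                  # a with the maximum Q value
```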
Step S5: training the networks of the coding network, the decoding network and the DDPG algorithm. The training follows existing common practice and is briefly introduced as follows: the coding network and the decoding network are trained with the known class sample set N under a loss function L1 that combines a classification error term between yi and yi′ and a reconstruction error term between xi and xi′ averaged over the m input samples, where yi is the label of the i-th input sample to be identified, yi′ is the class of the i-th sample output by the coding network, xi is the i-th sample to be identified, m is the number of samples to be identified and xi′ is the output of the decoding network for the i-th sample. During training, the loss value L1 is computed for every m input samples and the network parameters are then updated by back-propagation; when the loss value L1 no longer decreases, the training of the coding network and the decoding network is finished.
Training of the actor network and the critic network: a target network is first defined for each of them, namely u(s′; θu′) for the actor and Q(s′, a′; θQ′) for the critic, where s′ is the input of the target actor network and the target critic network, a′ is the action output by the target actor network, θu′ is the weight parameter of the target actor network and θQ′ is the weight parameter of the target critic network; the parameters of the coding network are kept fixed during this training. The training sample set comprises the sample set N and the sample set UN. The actor network generates an action a from the input state s provided by the coding network and the decoding network, the coding network and the decoding network compute a reward r from a and provide the next input state s′, and the transition state (s, a, r, s′) is stored in a buffer; M samples (si, ai, ri, si′), i = 1, 2, …, M, are then drawn from the buffer. The critic network computes a Q value from si in the transition state and the output ai of the actor network, and the actor network computes the policy gradient from the Q value and updates its weight parameters according to
∇θuJ ≈ (1/M) Σi ∇aQ(si, a; θQ)|a=u(si) · ∇θuu(si; θu),
where i denotes the i-th sample, si is the input state of the i-th sample, ai is the output of the actor network for the i-th sample, M is the number of samples taken from the buffer, ∇θuJ is the policy gradient and ∇ is the gradient operator. The target actor network obtains the action ai′ from si′ in the transition state, and the target critic network obtains its output Q′ from si′ and ai′; the time-difference error L between the Q obtained by the critic network and the Q′ obtained by the target critic network is computed as
L = (1/M) Σi (yi − Q(si, ai; θQ))²,
a gradient is computed from the time-difference error L and the critic network weights are updated according to this gradient, where yi is calculated as
yi = ri + γQ′(si+1, u′(si+1; θu′) | θQ′),
ri is the reward value of the i-th sample to be identified, γ is a preset weight coefficient, Q is the critic network output value and Q′ is the target critic network output value.
The weight parameters of the target actor network and the target critic network are updated by a soft-update algorithm:
θQ′ = τθQ + (1 − τ)θQ′
θu′ = τθu + (1 − τ)θu′
where τ is a weight coefficient that generally takes the value 0.001; the training of the networks in the DDPG algorithm is finished when the time-difference error is smaller than a preset value and stable.
The working principle of the invention is as follows: as shown in fig. 2, the signal of a sample to be recognized first passes through the coding network, which automatically extracts the implicit features of the signal. The resulting feature vector of the input signal is, on the one hand, input to the classifier to classify the signal, consistent with the flow of a conventional classification network. On the other hand, it is multiplied by the attention probability distribution vector generated by reinforcement learning to obtain the conditional feature vector, which is input into the decoding network to recover the input signal. The similarity between the output of the decoding network and the original input signal is computed: if the similarity is high, the input signal is a known class signal and its category is the one output by the classifier; if the similarity is low, the input signal is an unknown class signal. The reinforcement learning agent generates the attention probability distribution vector from the feature vector of the input signal, and the category of the input signal is judged from the output of the decoding network; if the judgment is correct, the reward fed back to the reinforcement learning agent is +1, otherwise the reward is -1, and the agent dynamically adjusts its parameters according to the obtained reward so that the generated attention probability distribution vector becomes optimal.
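A toy sketch of this ±1 reward is given below, assuming the known/unknown judgment is made by thresholding the decoder similarity; the threshold itself is an assumption.

```python
# Toy sketch of the ±1 reward: +1 when the known/unknown judgment derived from
# the decoder similarity matches the ground truth, otherwise -1. The threshold
# value is an assumption.
def compute_reward(similarity, sample_is_known, threshold=0.9):
    judged_known = similarity > threshold
    return 1.0 if judged_known == sample_is_known else -1.0
```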
According to the above technical solution, the unknown individual identification method and device for radar signals provided by the invention effectively control the output of the decoding network by applying a weighted output to the feature vector of the intermediate hidden layer between a conventional coding network and decoding network: when the input sample belongs to a known class present in the training sample set, the decoding network can decode it correctly, and when the input sample belongs to an unknown class absent from the training sample set, the decoding network cannot decode it correctly. The input samples are therefore classified by the coding network, while whether an input sample belongs to an unknown class absent from the training sample set is judged from the output of the decoding network. The signal to be classified first passes through the coding network, which automatically extracts its implicit features; the resulting feature vector of the input sample is, on the one hand, input to the classifier to classify the sample, consistent with the flow of a conventional classification network. On the other hand, it is multiplied by the attention probability distribution vector generated by reinforcement learning to obtain the conditional feature vector, which is input into the decoding network to recover the input sample. The similarity between the output of the decoding network and the original input sample is computed: if the similarity is high, the input sample is a known class signal and its category is the one output by the classifier; if the similarity is low, the input sample is an unknown class signal. The reinforcement learning agent generates the attention probability distribution vector from the feature vector of the input signal, and the category of the input sample is judged from the output of the decoding network; if the judgment is correct, the reward fed back to the reinforcement learning agent is +1, otherwise the reward is -1, and the agent dynamically adjusts its parameters according to the obtained reward so that the generated attention probability distribution vector becomes optimal.
Example 2
Corresponding to embodiment 1 of the present invention, embodiment 2 of the present invention further provides an unknown individual identification device for a radar signal, including:
the sample set construction module is used for constructing and storing a known class sample set N and an unknown class sample set UN;
the extraction and classification module is used for inputting each sample to be identified in the known class sample set N and each sample to be identified in the unknown class sample set UN into a coding network, extracting the characteristic vector of the sample to be identified and classifying the sample to be identified;
the vector generation module is used for generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
the class judgment module is used for generating a condition characteristic vector according to the characteristic vector and the attention probability distribution vector of the sample to be identified and inputting the condition characteristic vector into a decoding network to judge the class of the sample to be identified;
and the training module is used for carrying out network training in the coding network, the decoding network and the DDPG algorithm.
Specifically, constructing the known class sample set N and the unknown class sample set UN includes: collecting the signals of N radars, taking the signals of K radars as known classes to form the known class sample set N, and taking the signals of the remaining N-K radars as unknown classes to form the unknown class sample set UN.
Specifically, the extraction and classification module is further configured to: the coding network comprises an input layer, an intermediate layer and an output layer; the sample to be identified is fed to the input layer; the intermediate layer consists of one-dimensional convolution layers and pooling layers, the convolution layers extract features from the sample to be identified and the pooling layers reduce its dimensionality; the output layer outputs the feature vector h of the sample to be identified, h = f(wx + b), where w is the weight of the coding network, b is the bias of the coding network and x is the input one-dimensional sample sequence to be identified; after the feature vector h is obtained, it is input into a classifier to obtain the category of the sample to be identified, where the classifier consists of a fully connected layer and a softmax classifier.
Specifically, the vector generation module is further configured to: using the DDPG algorithm, an actor network u(s; θu) and a critic network Q(s, a; θQ) are constructed on the basis of a convolutional neural network, where s, the input of the actor network and the critic network, equals the feature vector h extracted by the coding network, θu is the weight parameter of the actor network and θQ is the weight parameter of the critic network; an action a is obtained by sampling the action generated by the actor network together with random noise, and the action a is the attention probability distribution vector.
Specifically, the class judgment module is further configured to: obtain the conditional feature vector according to the formula c = h · a (element-wise multiplication of the feature vector h and the attention probability distribution vector a), where c is the conditional feature vector; the obtained conditional feature vector c is input into the decoding network, whose structure is symmetric to that of the coding network; the output of the decoding network is x′ = g(w′c + b′), where w′ is the weight of the decoding network, b′ is the bias of the decoding network and x′ is the output of the decoding network; the category of the sample to be identified is judged from the output of the decoding network.
Specifically, the training module is further configured to: train the coding network and the decoding network with the known class sample set N under a loss function L1 that combines a classification error term between yi and yi′ and a reconstruction error term between xi and xi′ averaged over the m input samples, where yi is the label of the i-th input sample to be identified, yi′ is the class of the i-th sample output by the coding network, xi is the i-th sample to be identified, m is the number of samples to be identified and xi′ is the output of the decoding network for the i-th sample; during training, the loss value L1 is computed for every m input samples and the network parameters are then updated by back-propagation; when the loss value L1 no longer decreases, the training of the coding network and the decoding network is finished;
Training of the actor network and the critic network: a target network is first defined for each of them, namely u(s′; θu′) for the actor and Q(s′, a′; θQ′) for the critic, where s′ is the input of the target actor network and the target critic network, a′ is the action output by the target actor network, θu′ is the weight parameter of the target actor network and θQ′ is the weight parameter of the target critic network; the parameters of the coding network are kept fixed during this training. The training sample set comprises the sample set N and the sample set UN. The actor network generates an action a from the input state s provided by the coding network and the decoding network, the coding network and the decoding network compute a reward r from a and provide the next input state s′, and the transition state (s, a, r, s′) is stored in a buffer; M samples (si, ai, ri, si′), i = 1, 2, …, M, are then drawn from the buffer. The critic network computes a Q value from si in the transition state and the output ai of the actor network, and the actor network computes the policy gradient from the Q value and updates its weight parameters according to
∇θuJ ≈ (1/M) Σi ∇aQ(si, a; θQ)|a=u(si) · ∇θuu(si; θu),
where M is the number of samples taken from the buffer, ∇θuJ is the policy gradient and ∇ is the gradient operator. The target actor network obtains the action ai′ from si′ in the transition state, and the target critic network obtains its output Q′ from si′ and ai′; with Q obtained from the critic network and Q′ from the target critic network, the time-difference error L between Q and Q′ is computed as
L = (1/M) Σi (yi − Q(si, ai; θQ))²,
a gradient is computed from the time-difference error L and the critic network weights are updated according to this gradient, where yi is calculated as
yi = ri + γQ′(si+1, u′(si+1; θu′) | θQ′),
ri is the reward value of the i-th sample to be identified, γ is a preset weight coefficient, Q is the critic network output value and Q′ is the target critic network output value;
the weight parameters of the target actor network and the target critic network are updated by a soft-update algorithm:
θQ′ = τθQ + (1 − τ)θQ′
θu′ = τθu + (1 − τ)θu′
where τ is a weight coefficient with the value 0.001; the training of the networks in the DDPG algorithm is finished when the time-difference error is smaller than a preset value and stable.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of unknown individual identification of radar signals, the method comprising:
step one: constructing and storing a known class sample set N and an unknown class sample set UN;
step two: inputting each sample to be identified in a known class sample set N and each sample to be identified in an unknown class sample set UN into a coding network, extracting a characteristic vector of the sample to be identified and classifying the sample to be identified;
step three: generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
step four: generating a conditional feature vector according to the feature vector and the attention probability distribution vector of the sample to be identified, and inputting the conditional feature vector into a decoding network to judge the category of the sample to be identified;
step five: and carrying out network training in an encoding network, a decoding network and a DDPG algorithm.
2. The method according to claim 1, wherein the constructing a sample set N of known classes and a sample set UN of unknown classes comprises: signals of N radars are collected, signals of K radars are used as known classes to form a known class sample set N, and signals of N-K radars are used as unknown classes to form an unknown class sample set UN.
3. The method of claim 1, wherein the second step comprises: the coding network comprises an input layer, an intermediate layer and an output layer; the sample to be identified is fed to the input layer; the intermediate layer consists of one-dimensional convolution layers and pooling layers, the convolution layers extract features from the sample to be identified and the pooling layers reduce its dimensionality; the output layer outputs the feature vector h of the sample to be identified, h = f(wx + b), where w is the weight of the coding network, b is the bias of the coding network and x is the input one-dimensional sample sequence to be identified; after the feature vector h is obtained, it is input into a classifier to obtain the category of the sample to be identified, where the classifier consists of a fully connected layer and a softmax classifier.
4. The method of claim 3, wherein the third step comprises: using the DDPG algorithm, an actor network u(s; θu) and a critic network Q(s, a; θQ) are constructed on the basis of a convolutional neural network, where s, the input of the actor network and the critic network, equals the feature vector h extracted by the coding network, θu is the weight parameter of the actor network and θQ is the weight parameter of the critic network; an action a is obtained by sampling the action generated by the actor network together with random noise, and the action a is the attention probability distribution vector.
5. The method of claim 4, wherein the fourth step comprises: obtaining the conditional feature vector according to the formula c = h · a (element-wise multiplication of the feature vector h and the attention probability distribution vector a), where c is the conditional feature vector; the obtained conditional feature vector c is input into the decoding network, whose structure is symmetric to that of the coding network; the output of the decoding network is x′ = g(w′c + b′), where w′ is the weight of the decoding network, b′ is the bias of the decoding network and x′ is the output of the decoding network; the category of the sample to be identified is judged from the output of the decoding network.
6. The method of claim 5, wherein step five comprises: training the coding network and the decoding network with the known class sample set N under a loss function L1 that combines a classification error term between yi and yi′ and a reconstruction error term between xi and xi′ averaged over the m input samples, where yi is the label of the i-th input sample to be identified, yi′ is the class of the i-th sample output by the coding network, xi is the i-th sample to be identified, m is the number of samples to be identified and xi′ is the output of the decoding network for the i-th sample; during training, the loss value L1 is computed for every m input samples and the network parameters are then updated by back-propagation; when the loss value L1 no longer decreases, the training of the coding network and the decoding network is finished;
training of the actor network and the critic network: a target network is first defined for each of them, namely u(s′; θu′) for the actor and Q(s′, a′; θQ′) for the critic, where s′ is the input of the target actor network and the target critic network, a′ is the action output by the target actor network, θu′ is the weight parameter of the target actor network and θQ′ is the weight parameter of the target critic network; the parameters of the coding network are kept fixed during this training; the training sample set comprises the sample set N and the sample set UN; the actor network generates an action a from the input state s provided by the coding network and the decoding network, the coding network and the decoding network compute a reward r from a and provide the next input state s′, and the transition state (s, a, r, s′) is stored in a buffer; M samples (si, ai, ri, si′), i = 1, 2, …, M, are then drawn from the buffer; the critic network computes a Q value from si in the transition state and the output ai of the actor network, and the actor network computes the policy gradient from the Q value and updates its weight parameters according to
∇θuJ ≈ (1/M) Σi ∇aQ(si, a; θQ)|a=u(si) · ∇θuu(si; θu),
where M is the number of samples taken from the buffer, ∇θuJ is the policy gradient and ∇ is the gradient operator; the target actor network obtains the action ai′ from si′ in the transition state, and the target critic network obtains its output Q′ from si′ and ai′; with Q obtained from the critic network and Q′ from the target critic network, the time-difference error L between Q and Q′ is computed as
L = (1/M) Σi (yi − Q(si, ai; θQ))²,
a gradient is computed from the time-difference error L and the critic network weights are updated according to this gradient, where yi is calculated as
yi = ri + γQ′(si+1, u′(si+1; θu′) | θQ′),
ri is the reward value of the i-th sample to be identified, γ is a preset weight coefficient, Q is the critic network output value and Q′ is the target critic network output value;
the weight parameters of the target actor network and the target critic network are updated by a soft-update algorithm:
θQ′ = τθQ + (1 − τ)θQ′
θu′ = τθu + (1 − τ)θu′
where τ is a weight coefficient with the value 0.001; the training of the networks in the DDPG algorithm is finished when the time-difference error is smaller than a preset value and stable.
7. An apparatus for unknown individual identification of radar signals, the apparatus comprising:
the sample set construction module is used for constructing and storing a known class sample set N and an unknown class sample set UN;
the extraction and classification module is used for inputting each sample to be identified in the known class sample set N and each sample to be identified in the unknown class sample set UN into a coding network, extracting the characteristic vector of the sample to be identified and classifying the sample to be identified;
the vector generation module is used for generating an attention probability distribution vector according to the feature vector of the sample to be identified by using a DDPG algorithm;
the class judgment module is used for generating a condition characteristic vector according to the characteristic vector and the attention probability distribution vector of the sample to be identified and inputting the condition characteristic vector into a decoding network to judge the class of the sample to be identified;
and the training module is used for carrying out network training in the coding network, the decoding network and the DDPG algorithm.
8. The apparatus according to claim 7, wherein the constructing of the known class sample set N and the unknown class sample set UN comprises: signals of N radars are collected, signals of K radars are used as known classes to form a known class sample set N, and signals of N-K radars are used as unknown classes to form an unknown class sample set UN.
9. The apparatus of claim 7, wherein the extraction and classification module is further configured to: the coding network comprises an input layer, an intermediate layer and an output layer; the sample to be identified is fed to the input layer; the intermediate layer consists of one-dimensional convolution layers and pooling layers, the convolution layers extract features from the sample to be identified and the pooling layers reduce its dimensionality; the output layer outputs the feature vector h of the sample to be identified, h = f(wx + b), where w is the weight of the coding network, b is the bias of the coding network and x is the input one-dimensional sample sequence to be identified; after the feature vector h is obtained, it is input into a classifier to obtain the category of the sample to be identified, where the classifier consists of a fully connected layer and a softmax classifier.
10. The apparatus of claim 9, wherein the vector generation module is further configured to: construct an actor network u(s; Θ^u) and a critic network Q(s, a; Θ^Q) based on a convolutional neural network by utilizing the DDPG algorithm, wherein s is the input of the actor network and the critic network and is equal to the feature vector h extracted by the coding network, Θ^u is a weight parameter of the actor network, and Θ^Q is a weight parameter of the critic network; and obtain the action a by sampling from the action generated by the actor network together with random noise, wherein the action a is the attention probability distribution vector.
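A minimal sketch of the actor side, using fully connected layers as a stand-in for the convolutional actor named in the claim; the Gaussian exploration noise and the renormalisation that keeps the action a valid probability distribution are assumptions.

import torch
import torch.nn as nn

class Actor(nn.Module):
    """u(s; Theta_u): maps the feature vector h to an attention probability distribution."""
    def __init__(self, feature_dim=128, attention_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 128),
            nn.ReLU(),
            nn.Linear(128, attention_dim),
            nn.Softmax(dim=-1),          # output is a probability distribution
        )

    def forward(self, s):                # s = feature vector h from the coding network
        return self.net(s)

def sample_action(actor, h, noise_std=0.05):
    """Action a = actor output perturbed by random noise, renormalised to stay a distribution."""
    with torch.no_grad():
        a = actor(h)
        a = a + noise_std * torch.randn_like(a)   # random exploration noise
        a = torch.clamp(a, min=0.0)
        return a / a.sum(dim=-1, keepdim=True)    # attention probability distribution vector

The critic Q(s, a; Θ^Q) can take the same form as the Critic sketched after claim 6.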
CN201911296607.8A 2019-12-16 2019-12-16 Unknown individual identification method and device for radar signals Active CN111144462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911296607.8A CN111144462B (en) 2019-12-16 2019-12-16 Unknown individual identification method and device for radar signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911296607.8A CN111144462B (en) 2019-12-16 2019-12-16 Unknown individual identification method and device for radar signals

Publications (2)

Publication Number Publication Date
CN111144462A (en) 2020-05-12
CN111144462B (en) 2023-10-20

Family

ID=70518457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911296607.8A Active CN111144462B (en) 2019-12-16 2019-12-16 Unknown individual identification method and device for radar signals

Country Status (1)

Country Link
CN (1) CN111144462B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190331768A1 (en) * 2018-04-26 2019-10-31 Metawave Corporation Reinforcement learning engine for a radar system
US20180357542A1 (en) * 2018-06-08 2018-12-13 University Of Electronic Science And Technology Of China 1D-CNN-Based Distributed Optical Fiber Sensing Signal Feature Learning and Classification Method
CN109934269A (en) * 2019-02-25 2019-06-25 中国电子科技集团公司第三十六研究所 Open-set recognition method and device for electromagnetic signals
CN110109109A (en) * 2019-04-26 2019-08-09 西安电子科技大学 HRRP target identification method based on multiresolution attention convolutional network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
贾亚飞; 朱永利; 高佳程; 袁博: "Identification of unknown-class partial discharge signals based on sample-weighted FCM clustering" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112014821A (en) * 2020-08-27 2020-12-01 电子科技大学 Unknown vehicle target identification method based on radar broadband characteristics
CN112014821B (en) * 2020-08-27 2022-05-17 电子科技大学 Unknown vehicle target identification method based on radar broadband characteristics
CN113807243A (en) * 2021-09-16 2021-12-17 上海交通大学 Water obstacle detection system and method based on attention to unknown target
CN113807243B (en) * 2021-09-16 2023-12-05 上海交通大学 Water obstacle detection system and method based on attention to unknown target
CN113792733A (en) * 2021-09-17 2021-12-14 平安科技(深圳)有限公司 Vehicle component detection method, system, electronic device and storage medium
CN113792733B (en) * 2021-09-17 2023-07-21 平安科技(深圳)有限公司 Vehicle part detection method, system, electronic device and storage medium

Also Published As

Publication number Publication date
CN111144462B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
WO2021134871A1 (en) Forensics method for synthesized face image based on local binary pattern and deep learning
CN108985334B (en) General object detection system and method for improving active learning based on self-supervision process
CN108564129B Trajectory data classification method based on generative adversarial network
CN107609525B (en) Remote sensing image target detection method for constructing convolutional neural network based on pruning strategy
CN110796168A Improved YOLOv3-based vehicle detection method
CN112507996B (en) Face detection method of main sample attention mechanism
CN111079836B (en) Process data fault classification method based on pseudo label method and weak supervised learning
CN108898131A Digital instrument recognition method in complex natural scenes
CN109697469A Self-learning small-sample remote sensing image classification method based on consistency constraint
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
CN111144462B (en) Unknown individual identification method and device for radar signals
CN108171119B (en) SAR image change detection method based on residual error network
CN111401105B (en) Video expression recognition method, device and equipment
CN113096169A (en) Non-rigid multimode medical image registration model establishing method and application thereof
CN104978569A (en) Sparse representation based incremental face recognition method
CN115471712A (en) Learning method for generating zero sample based on visual semantic constraint
CN115690549A (en) Target detection method for realizing multi-dimensional feature fusion based on parallel interaction architecture model
CN113283467B (en) Weak supervision picture classification method based on average loss and category-by-category selection
CN111652264A (en) Negative migration sample screening method based on maximum mean difference
CN115511012B (en) Class soft label identification training method with maximum entropy constraint
CN117516937A (en) Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement
CN116704585A (en) Face recognition method based on quality perception
CN113420833B (en) Visual question answering method and device based on semantic mapping of questions
CN111209860A (en) Video attendance system and method based on deep learning and reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant