CN113364712B - DDPG network-based mixed radiation source signal separation method - Google Patents

Publication number
CN113364712B
Authority
CN
China
Prior art keywords
signal
network
current
matrix
separation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110545722.5A
Other languages
Chinese (zh)
Other versions
CN113364712A (en
Inventor
张怡如
杨远望
邓建华
游长江
朱学勇
潘钰文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110545722.5A priority Critical patent/CN113364712B/en
Publication of CN113364712A publication Critical patent/CN113364712A/en
Application granted granted Critical
Publication of CN113364712B publication Critical patent/CN113364712B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L25/00 Baseband systems
    • H04L25/02 Details; arrangements for supplying electrical power along data transmission lines
    • H04L25/03 Shaping networks in transmitter or receiver, e.g. adaptive shaping networks
    • H04L25/03006 Arrangements for removing intersymbol interference

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Variable-Direction Aerials And Aerial Arrays (AREA)

Abstract

The invention discloses a mixed radiation source signal separation method based on a DDPG network. First, K signal test antennas collect radiation source signal samples from K sample radiation sources, and the samples are processed to obtain mixed signal samples. The separation matrix is regarded as the agent, the addition to and subtraction from matrix elements as the action, and the degree of signal separation as the environment, and a DDPG network is designed on this basis. The DDPG network is then trained with the mixed signal samples. In actual application, each signal test antenna acquires a mixed signal of the K radiation sources, and the mixed signals are input into the trained DDPG network for retraining to obtain the signal separation result. By introducing the DDPG network, the invention effectively improves the accuracy of mixed signal separation.

Description

DDPG network-based mixed radiation source signal separation method
Technical Field
The invention belongs to the technical field of signal separation, and particularly relates to a mixed radiation source signal separation method based on a DDPG network.
Background
Accurately and efficiently extracting a desired signal from a mixed signal is an important research topic in the field of communications and determines the reception capability of a communication system. Blind signal separation refers to separating signals when the source signals and the channel are unknown or only partially known. It has been a hot topic in modern signal processing in recent years, with applications in wireless communication, speech recognition, biomedicine, mechanical engineering and other areas. In wireless communications, blind signal separation is significant for both cooperative and non-cooperative communication. In cooperative communication, interference between signals in MIMO communication systems and satellite communication systems can be suppressed and separated by blind separation. In non-cooperative communication and modern information warfare, signals must be accurately separated from a mixture of friendly and hostile emissions, which helps to detect hostile emitters as early as possible, judge hostile equipment correctly, and take corresponding action. Blind signal analysis faces correspondingly greater difficulty in the communications field: because communication signals are similar to one another and complex, separation methods from other fields are not necessarily well suited.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a mixed radiation source signal separation method based on a DDPG network, and the DDPG network is introduced to effectively improve the accuracy of mixed signal separation.
In order to achieve the above purpose, the mixed radiation source signal separation method based on the DDPG network of the present invention includes the following steps:
S1: Denote by K the number of positions at which radiation sources are placed in the actual application environment. A sample radiation source is configured at each position to transmit a modulated signal of length L, and the modulated signal transmitted by the j-th sample radiation source is recorded as the source signal $F_j$, $j = 1, 2, \ldots, K$. K signal test antennas are configured in the application environment. First, each sample radiation source transmits its modulated signal alone, and each signal test antenna acquires the transmitted signal to obtain a data sample; the data sample acquired by the i-th signal test antenna from the j-th sample radiation source is recorded as $D_{i,j}$, $i = 1, 2, \ldots, K$. Then the K radiation sources transmit their modulated signals simultaneously, and each signal test antenna acquires the received mixed signal to obtain a mixed signal sample; the mixed signal sample acquired by the i-th signal test antenna is recorded as $X_i$;
S2: for the DDPG network, the DDPG action space is designed by adopting the following method:
Set a matrix C of order K whose elements each follow the standard normal distribution, and flatten C in row-major order into a K×K-dimensional vector
$$C' = [c_1, c_2, \ldots, c_{K^2}]$$
where $c_k$ denotes the k-th element of the vector C', $k = 1, 2, \ldots, K^2$, corresponding to the element in row $\lfloor k/K \rfloor$ and column $k \,\%\, K$ of the matrix C; $\lfloor \cdot \rfloor$ denotes rounding down and % denotes taking the remainder. A boundary value bound is then defined, and the vector C' together with the boundary value bound forms the action space of the DDPG network
Figure BDA0003073436770000024
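The row-major flattening and the bounded action can be illustrated with a short NumPy sketch. It is not taken from the patent: the helper names, the additive update of W and the symmetric clipping to [-bound, bound] are assumptions made for illustration. With 0-based indexing, element k of the flattened vector sits at row k // K and column k % K.

```python
import numpy as np

def flatten_row_major(mat):
    """Flatten a K x K matrix into a length K*K vector, row by row."""
    return np.asarray(mat, dtype=float).reshape(-1)

def unflatten_row_major(vec, k):
    """Rebuild the K x K matrix from its row-major vector."""
    return np.asarray(vec, dtype=float).reshape(k, k)

def apply_action(w, action, bound):
    """Clip the action to [-bound, bound] and add it element-wise to W.

    Both the additive update and the symmetric clipping are assumptions.
    """
    delta = np.clip(np.asarray(action, dtype=float), -bound, bound)
    return np.asarray(w, dtype=float) + unflatten_row_major(delta, len(w))

K = 2
rng = np.random.default_rng(0)
C = rng.standard_normal((K, K))      # elements drawn from N(0, 1)
c_vec = flatten_row_major(C)         # plays the role of C' in the text
W_next = apply_action(np.eye(K), [0.5, -3.0, 3.0, 0.1], bound=1.0)
```

Here the second and third action components exceed the bound and are clipped to -1 and 1 before being added to W.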
The DDPG state space is designed by adopting the following method:
Set a separation matrix W of order K, and flatten W in row-major order into a K×K-dimensional separation vector
$$W' = [w_1, w_2, \ldots, w_{K^2}]$$
where $w_k$ denotes the k-th element of the vector W', corresponding to the element in row $\lfloor k/K \rfloor$ and column $k \,\%\, K$ of the matrix W;
Denote the mixed signal of length L received by the i-th signal test antenna as $X_i$. P data points are sampled from the mixed signal $X_i$ at preset data positions, and their ratios to the corresponding P data points of the source signal $F_j$ are calculated; the P ratios form a ratio vector $H_{i,j}$. The K ratio vectors $H_{i,j}$ belonging to the same mixed signal are concatenated, and the K resulting vectors are concatenated to obtain a vector x_state of dimension K × K × P;
The separated signal matrix in the current update step is denoted Y = WX, where X is the K × L mixed signal matrix formed with the mixed signals $X_i$ as row vectors, and the j-th row vector of Y is taken as the source signal separation result $y_j$ of the j-th radiation source. P data points are sampled from each source signal separation result $y_j$ at the preset data positions, and their ratios to the corresponding P data points of the source signal $F_j$ are calculated; the P ratios form a ratio vector $G_j$, and the K ratio vectors $G_j$ are concatenated to obtain a vector y_state of dimension K × P;
A parameter on-coarse is defined to indicate whether the current step has reached the preset target: on-coarse = 1 if so, and on-coarse = 0 otherwise;
The separation vector W', the vector x_state, the vector y_state and the parameter on-coarse form the state space of the DDPG network
Figure BDA0003073436770000028
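As a rough illustration of how such a state vector could be assembled, the sketch below builds the four parts with NumPy and concatenates them. The function name `build_state`, the flag argument `reached_target`, the ratio direction (sampled signal divided by source signal) and the use of real-valued data are assumptions; the source points at the sampled positions are assumed to be non-zero.

```python
import numpy as np

def build_state(W, X, F, positions, reached_target):
    """Return [W', x_state, y_state, flag] as one flat vector.

    W: K x K separation matrix, X: K x L mixed signals (rows),
    F: K x L source signals (rows), positions: P sample indices.
    """
    Y = W @ X                          # separated signals, K x L
    Fp = F[:, positions]               # K x P sampled source points
    Xp = X[:, positions]
    Yp = Y[:, positions]
    # One ratio vector per (antenna i, source j) pair: K * K * P values.
    x_state = (Xp[:, None, :] / Fp[None, :, :]).reshape(-1)
    # One ratio vector per separated signal j: K * P values.
    y_state = (Yp / Fp).reshape(-1)
    flag = [1.0 if reached_target else 0.0]
    return np.concatenate([W.reshape(-1), x_state, y_state, flag])

F = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0]])
state = build_state(np.eye(2), F, F, [0, 2], True)
```

For K = 2 and P = 2 the resulting vector has 4 + 8 + 4 + 1 = 17 entries.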
The DDPG reward function is designed by adopting the following method:
For the K separated signals $y_j$ obtained in the current step, the signal-to-interference ratio $SIR_j$ of each is calculated by the following formula:
Figure BDA0003073436770000029
where $\|\cdot\|_2$ denotes the 2-norm;
It is judged whether the current step has reached the preset target, that is, whether the signal-to-interference ratios $SIR_j$ of all separated signals $y_j$ are greater than a preset threshold; if so, the reward function is
Figure BDA0003073436770000031
where Δ denotes a preset constant; otherwise the reward function is
Figure BDA0003073436770000032
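The formula images for the signal-to-interference ratio and the reward are not reproduced in this text. The sketch below therefore substitutes a commonly used definition, SIR = 10·log10(‖F‖² / ‖y − F‖²), which may differ from the patent's exact expression, and only illustrates the target check (all SIR values above a threshold). The function names are hypothetical, and the reward values themselves are deliberately left out.

```python
import numpy as np

def sir_db(y, f):
    """Assumed SIR definition in dB: source energy over residual energy."""
    num = np.sum(np.abs(f) ** 2)
    den = np.sum(np.abs(y - f) ** 2)   # zero residual is not handled here
    return 10.0 * np.log10(num / den)

def goal_reached(Y, F, threshold=30.0):
    """True when every separated signal exceeds the SIR threshold."""
    return all(sir_db(y, f) > threshold for y, f in zip(Y, F))

F = np.array([[3.0, 4.0]])
coarse = np.array([[3.0, 4.5]])        # residual energy 0.25
fine = np.array([[3.0, 4.05]])         # residual energy about 0.0025
sir_coarse = sir_db(coarse[0], F[0])
```

With these numbers the coarse estimate gives about 20 dB and fails a threshold of 30, while the fine estimate gives about 40 dB and passes.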
S3: Construct a DDPG network according to the action space and the state space designed in step S2. The DDPG network comprises a current policy network, a current value network, a target policy network and a target value network, wherein:
the input of the current policy network is the state s, and its output is the action a;
the input of the current value network is the state s and the action a, and its output is the value Q;
target policy network: its input and output are the same as those of the current policy network, and the current policy network parameters are copied to it periodically;
target value network: its input and output are the same as those of the current value network, and the current value network parameters are copied to it periodically;
S4: Input the K mixed signal samples $X_i$ obtained in step S1 into the DDPG network and train the DDPG network, specifically as follows:
S4.1: randomly initialize the parameters of the four networks in the DDPG network;
S4.2: set the iteration number e = 1;
S4.3: randomly initialize the separation matrix W, then calculate the separated signal matrix Y = WX, take the j-th row vector of Y as the source signal separation result $y_j$ of the j-th radiation source, and determine the current state s from the current source signal separation results $y_j$;
S4.4: initialize the step number within the iteration, t = 1;
S4.5: judge whether the step number t within the iteration is less than T, where T denotes the preset maximum number of steps per iteration; if so, go to step S4.6, otherwise go to step S4.11;
S4.6: the current policy network obtains the action a from the current state s, the separation matrix W is adjusted according to the action a, and the source signal separation result $y'_j$ of each radiation source is recalculated to generate the next state s'; the current value network obtains the current value Q from the current state s and the action a; each source signal separation result $y_j$ is extracted from the current state s, and the reward value r corresponding to the current state s is calculated; the current state s, the action a, the reward value r and the next state s' are then put into the experience pool as one group of experience; if the experience pool is already full, the earliest experience is deleted on a first-in first-out basis before the current experience is put in;
S4.7: judge whether the experience pool is full; if so, go to step S4.8, otherwise go to step S4.10;
S4.8: soft-copy the parameters of the current policy network to the target policy network, and soft-copy the parameters of the current value network to the target value network;
S4.9: for the current policy network, update its parameters by the policy gradient, with the value Q as the loss function;
for the current value network, calculate a loss function and update the parameters of the current value network accordingly, the loss function being calculated as follows:
take M groups of experience from the experience pool, input the next state $s'_m$ of each group into the target policy network to obtain the next action $a'_m$, $m = 1, 2, \ldots, M$, then input the state $s'_m$ and the action $a'_m$ into the target value network to obtain the value $Q_m$, and calculate the target return value $Z_m$ by the following formula:
$$Z_m = \gamma Q_m + R_m$$
where γ denotes the discount factor and $R_m$ denotes the reward value in the m-th group of experience;
the loss function Loss of the current value network is calculated by the following formula:
Figure BDA0003073436770000041
S4.10: set the current state s = s' and t = t + 1, and return to step S4.5;
S4.11: judge whether the iteration end condition has been reached; if so, training ends, otherwise go to step S4.12;
S4.12: set e = e + 1 and return to step S4.3;
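The control flow of steps S4.2 to S4.12 can be sketched as a plain Python loop. This is only a skeleton under stated assumptions: the callbacks `env_reset`, `env_step`, `act` and `update` are hypothetical placeholders for the environment and network operations, and the iteration end condition is simplified to a fixed number of iterations.

```python
from collections import deque

def train(env_reset, env_step, act, update, max_episodes, T, pool_size):
    """Run the outer and inner loops; learning is delegated to callbacks."""
    pool = deque(maxlen=pool_size)         # FIFO experience pool
    updates = 0
    for e in range(1, max_episodes + 1):   # S4.2, S4.11, S4.12
        s = env_reset()                    # S4.3: random W, Y = WX, state s
        t = 1                              # S4.4
        while t < T:                       # S4.5
            a = act(s)                     # S4.6: action from current policy
            s_next, r = env_step(s, a)     # adjust W, get reward and next state
            pool.append((s, a, r, s_next)) # oldest entry dropped when full
            if len(pool) == pool_size:     # S4.7
                update(pool)               # S4.8 and S4.9
                updates += 1
            s = s_next                     # S4.10
            t += 1
    return updates

n_updates = train(
    env_reset=lambda: 0,
    env_step=lambda s, a: (s + 1, 0.0),
    act=lambda s: 0,
    update=lambda pool: None,
    max_episodes=3, T=5, pool_size=6,
)
```

With T = 5 each iteration runs four steps, so three iterations give twelve steps; the pool of size 6 is full from the sixth step onward, which yields seven calls to `update`.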
S5: In actual application, each signal test antenna acquires a mixed signal of length L from the K radiation sources
Figure BDA0003073436770000042
The K mixed signals
Figure BDA0003073436770000043
are input into the DDPG network trained in step S4 for retraining; the DDPG action space at this point is designed as follows:
the matrix C of order K is flattened in row-major order into a K×K-dimensional vector
Figure BDA0003073436770000044
and the vector C' together with the boundary value bound forms the action space of the DDPG network
Figure BDA0003073436770000045
The DDPG state space is designed by adopting the following method:
the separation matrix W of order K is flattened in row-major order into a K×K-dimensional separation vector
Figure BDA0003073436770000046
For the mixed signal
Figure BDA0003073436770000047
of the i-th signal test antenna, P data points are sampled from the mixed signal
Figure BDA0003073436770000048
at the preset data positions and formed into a data vector $H'_{i,j}$; the data vector $H'_{i,j}$ is copied K times and concatenated to obtain a data vector of length K × P
Figure BDA0003073436770000049
and the K data vectors
Figure BDA00030734367700000410
are concatenated to obtain a vector of dimension K × K × P
Figure BDA00030734367700000411
Denote the separated signal matrix of the current update step as
Figure BDA00030734367700000412
where
Figure BDA00030734367700000413
denotes the K × L mixed signal matrix formed with the mixed signals
Figure BDA00030734367700000414
as row vectors; the j-th row vector of the separated signal matrix
Figure BDA0003073436770000051
is taken as the source signal separation result
Figure BDA0003073436770000052
of the j-th radiation source; from each source signal separation result
Figure BDA0003073436770000053
P data points are sampled at the preset data positions to form a data vector $G'_j$, and the K data vectors $G'_j$ are concatenated to obtain a vector of dimension K × P
Figure BDA0003073436770000054
A parameter
Figure BDA0003073436770000055
is defined to indicate whether the current iteration number is less than a preset threshold E; if so,
Figure BDA0003073436770000056
otherwise
Figure BDA0003073436770000057
The separation vector
Figure BDA0003073436770000058
the vector x_state, the vector y_state and the parameter on-good form the state space of the DDPG network
Figure BDA0003073436770000059
The specific steps of retraining the DDPG network comprise:
S5.1: set the iteration number e = 1;
S5.2: randomly initialize the separation matrix W, then calculate the separated signal matrix
Figure BDA00030734367700000510
where
Figure BDA00030734367700000511
denotes the K × L mixed signal matrix formed with the mixed signals
Figure BDA00030734367700000512
as row vectors; take the j-th row vector of the separated signal matrix
Figure BDA00030734367700000513
as the source signal separation result
Figure BDA00030734367700000514
of the j-th radiation source, and determine the current state s from the current source signal separation results
Figure BDA00030734367700000515
S5.3: initialize the step number within the iteration, t = 1;
S5.4: judge whether the step number t within the iteration is less than T, where T denotes the preset maximum number of steps per iteration; if so, go to step S5.5, otherwise go to step S5.7;
S5.5: the current policy network obtains the action a from the current state s, the separation matrix W is adjusted according to the action a, and the source signal separation result $y'_j$ of each radiation source is recalculated to generate the next state s';
S5.6: set the current state s = s' and t = t + 1, and return to step S5.4;
S5.7: judge whether the iteration number e is less than E', where E' denotes the preset number of retraining iterations and E' > E; if so, go to step S5.8, otherwise retraining ends and the method goes to step S5.9;
S5.8: set e = e + 1 and return to step S5.2;
S5.9: determine the separation matrix
Figure BDA00030734367700000516
from the last state, then calculate the separated signal matrix
Figure BDA00030734367700000517
and take the j-th row vector of the separated signal matrix
Figure BDA00030734367700000518
as the final source signal separation result
Figure BDA00030734367700000519
of the j-th radiation source
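Step S5.9 can be sketched as follows, assuming that the flattened separation vector occupies the first K × K entries of the state vector, as the order in which the state space is listed suggests. The function name `separate_from_state` is hypothetical.

```python
import numpy as np

def separate_from_state(state, X):
    """Recover W from the leading K*K state entries and return Y = W X."""
    K = X.shape[0]
    W = np.asarray(state[:K * K], dtype=float).reshape(K, K)  # row-major
    return W @ X                                              # rows are y_j

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
last_state = np.array([0.0, 1.0, 1.0, 0.0, 9.0, 9.0])  # W' followed by other entries
Y = separate_from_state(last_state, X)
```

In this toy case W swaps the two rows, so the first separated signal equals the second mixed signal and vice versa.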
In the mixed radiation source signal separation method based on a DDPG network of the invention, K signal test antennas first collect radiation source signal samples from K sample radiation sources, and the samples are processed to obtain mixed signal samples. The separation matrix is regarded as the agent, the addition to and subtraction from matrix elements as the action, and the degree of signal separation as the environment, and a DDPG network is designed on this basis. The DDPG network is then trained with the mixed signal samples. In actual application, each signal test antenna acquires a mixed signal of the K radiation sources, and the mixed signals are input into the trained DDPG network for retraining to obtain the signal separation results.
The invention has the following beneficial effects:
1) the invention can achieve signal separation from prior knowledge of the sample radiation sources when the mixing channel is unknown;
2) the invention uses the DDPG network to let the separation actions interact with the signal environment, which better matches actual separation scenarios and improves the signal separation effect.
Drawings
FIG. 1 is a block diagram of an embodiment of a DDPG network-based mixed radiation source signal separation method of the present invention;
FIG. 2 is a flow chart of DDPG network training in the present invention;
FIG. 3 is a diagram illustrating updating network parameters in the present embodiment;
FIG. 4 is a flow chart of the retraining of the DDPG network in the present invention;
FIG. 5 is a waveform comparison of the separated handset signal and the source signal in the present embodiment;
FIG. 6 is a waveform comparison of the separated USRP device signal and the source signal in the present embodiment.
Detailed Description
Specific embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the invention.
Examples
Fig. 1 is a block diagram of an embodiment of a mixed radiation source signal separation method based on a DDPG network. As shown in fig. 1, the method for separating mixed radiation source signals based on the DDPG network of the present invention specifically includes the steps of:
S101: Acquire radiation source signal samples:
Denote by K the number of positions at which radiation sources are placed in the actual application environment. A sample radiation source is configured at each position to transmit a modulated signal of length L, and the modulated signal transmitted by the j-th sample radiation source is recorded as the source signal $F_j$, $j = 1, 2, \ldots, K$. K signal test antennas are configured in the application environment. First, each sample radiation source transmits its modulated signal alone, and each signal test antenna acquires the transmitted signal to obtain a data sample; the data sample acquired by the i-th signal test antenna from the j-th sample radiation source is recorded as $D_{i,j}$, $i = 1, 2, \ldots, K$. Then the K radiation sources transmit their modulated signals simultaneously, and each signal test antenna acquires the received mixed signal to obtain a mixed signal sample; the mixed signal sample acquired by the i-th signal test antenna is recorded as $X_i$.
In this embodiment, so that the acquired signal samples represent the channel characteristics more accurately and the separation matrix obtained later is more accurate, the modulated signal sent by each sample radiation source must satisfy the following condition: after the signal samples acquired by each signal test antenna for that modulated signal are converted into I and Q data, the modulus of every data point in the IQ data is larger than a preset threshold.
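A minimal sketch of this check, assuming the I and Q components are available as two real-valued arrays of equal length; the function name `iq_modulus_ok` and the threshold value used in the example are hypothetical.

```python
import numpy as np

def iq_modulus_ok(i_data, q_data, threshold):
    """True when sqrt(I^2 + Q^2) exceeds the threshold at every data point."""
    modulus = np.hypot(i_data, q_data)
    return bool(np.all(modulus > threshold))

ok = iq_modulus_ok(np.array([3.0, 1.0]), np.array([4.0, 1.0]), threshold=1.0)
bad = iq_modulus_ok(np.array([3.0, 0.1]), np.array([4.0, 0.1]), threshold=1.0)
```

The second call fails because its second data point has a modulus of about 0.14, which is below the threshold.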
In addition, so that the trained DDPG network can adapt to small changes in radiation source position, the radiation sources may be moved slightly several times when acquiring the radiation source signal samples in step S101, giving different transmission scenarios. For each transmission scenario, the data samples of the i-th signal test antenna from the j-th sample radiation source and the mixed signal samples are acquired; the K × K data samples $D_{i,j}$ and the K mixed signal samples $X_i$ of one transmission scenario form one group of radiation source signal samples, and the groups of all transmission scenarios form a radiation source signal sample set.
S102: designing a DDPG network:
In the invention, the separation matrix is regarded as the agent, the addition to and subtraction from matrix elements as the action, and the degree of signal separation as the environment; ideal signal separation is achieved through the interaction between the agent and the environment and the feedback from the environment to the agent. Based on this idea, the DDPG network is designed as follows:
1) designing DDPG action space
Set a matrix C of order K whose elements each follow the standard normal distribution, and flatten C in row-major order into a K×K-dimensional vector
$$C' = [c_1, c_2, \ldots, c_{K^2}]$$
where $c_k$ denotes the k-th element of the vector C', $k = 1, 2, \ldots, K^2$, corresponding to the element in row $\lfloor k/K \rfloor$ and column $k \,\%\, K$ of the matrix C; $\lfloor \cdot \rfloor$ denotes rounding down and % denotes taking the remainder. A boundary value bound is then defined, and the vector C' together with the boundary value bound forms the action space of the DDPG network
Figure BDA0003073436770000074
2) Designing DDPG state spaces
Set a separation matrix W of order K, and flatten W in row-major order into a K×K-dimensional separation vector
$$W' = [w_1, w_2, \ldots, w_{K^2}]$$
where $w_k$ denotes the k-th element of the vector W', corresponding to the element in row $\lfloor k/K \rfloor$ and column $k \,\%\, K$ of the matrix W.
Denote the mixed signal of length L received by the i-th signal test antenna as $X_i$. P data points are sampled from the mixed signal $X_i$ at preset data positions, and their ratios to the corresponding P data points of the source signal $F_j$ are calculated; the P ratios form a ratio vector $H_{i,j}$. The K ratio vectors $H_{i,j}$ belonging to the same mixed signal are concatenated, and the K resulting vectors are concatenated to obtain a vector x_state of dimension K × K × P.
The separated signal matrix in the current update step is denoted Y = WX, where X is the K × L mixed signal matrix formed with the mixed signals $X_i$ as row vectors, and the j-th row vector of Y is taken as the source signal separation result $y_j$ of the j-th radiation source. P data points are sampled from each source signal separation result $y_j$ at the preset data positions, and their ratios to the corresponding P data points of the source signal $F_j$ are calculated; the P ratios form a ratio vector $G_j$, and the K ratio vectors $G_j$ are concatenated to obtain a vector y_state of dimension K × P.
A parameter on-coarse is defined to indicate whether the current step has reached the preset target: on-coarse = 1 if so, and on-coarse = 0 otherwise.
The separation vector W', the vector x_state, the vector y_state and the parameter on-coarse form the state space of the DDPG network
Figure BDA0003073436770000082
3) Designing DDPG reward functions
For the K separated signals $y_j$ obtained in the current step, the signal-to-interference ratio $SIR_j$ of each is calculated by the following formula:
Figure BDA0003073436770000083
where $\|\cdot\|_2$ denotes the 2-norm.
It is judged whether the current step has reached the preset target, that is, whether the signal-to-interference ratios $SIR_j$ of all separated signals $y_j$ are greater than a preset threshold; if so, the reward function is
Figure BDA0003073436770000084
where Δ denotes a preset constant; otherwise the reward function is
Figure BDA0003073436770000085
In this embodiment, the signal-to-interference ratio threshold is 30 and Δ is 100. In this way, the reward function both indicates whether the current step has reached the preset target and measures the degree of signal separation.
S103: constructing a DDPG network:
Construct a DDPG network according to the action space and the state space designed in step S102. The DDPG network comprises a current policy network, a current value network, a target policy network and a target value network, wherein:
the input of the current policy network is the state s, and its output is the action a;
the input of the current value network is the state s and the action a, and its output is the value Q;
target policy network: its input and output are the same as those of the current policy network, and the current policy network parameters are copied to it periodically;
target value network: its input and output are the same as those of the current value network, and the current value network parameters are copied to it periodically.
S104: training the DDPG network:
The K mixed signal samples $X_i$ obtained in step S101 are input into the DDPG network to train it.
FIG. 2 is a flow chart of DDPG network training in the present invention. As shown in fig. 2, the DDPG network training of the present invention specifically comprises:
S201: Initialize the networks:
The parameters of the four networks in the DDPG network are randomly initialized.
S202: Initialize the iteration parameter:
Set the iteration number e = 1.
S203: Initialize the separation matrix:
Randomly initialize the separation matrix W and calculate the separated signal matrix Y = WX, where X denotes the K × L mixed signal matrix formed with the mixed signals $X_i$ as row vectors; take the j-th row vector of Y as the source signal separation result $y_j$ of the j-th radiation source, and determine the current state s from the current source signal separation results $y_j$.
If a radiation source signal sample set was acquired in step S101, one group of radiation source signal samples is selected arbitrarily from the set in step S203 to calculate the separated signal matrix and carry out the subsequent operations.
S204: Initialize the step number within the iteration, t = 1.
S205: Judge whether the step number t within the current iteration is less than T, where T denotes the preset maximum number of steps per iteration (T = 500 in this embodiment); if so, go to step S206, otherwise go to step S211.
S206: Generate new experience:
The current policy network obtains the action a from the current state s, the separation matrix W is adjusted according to the action a, and the source signal separation result $y'_j$ of each radiation source is recalculated to generate the next state s'. The current value network obtains the current value Q from the current state s and the action a. Each source signal separation result $y_j$ is extracted from the current state s, and the reward value r corresponding to the current state s is calculated. The current state s, the action a, the reward value r and the next state s' are then put into the experience pool as one group of experience. If the experience pool is already full, the earliest experience is deleted on a first-in first-out basis before the current experience is put in.
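The first-in first-out behaviour of the experience pool, together with drawing M groups of experience from it, can be sketched as a small class. The class and method names are hypothetical, and uniform random sampling is an assumption, since the text only says that M groups are taken from the pool.

```python
import random
from collections import deque

class ExperiencePool:
    """Fixed-capacity pool; the oldest experience is evicted first."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def put(self, s, a, r, s_next):
        # deque(maxlen=...) drops the oldest entry automatically when full.
        self.buf.append((s, a, r, s_next))

    def is_full(self):
        return len(self.buf) == self.buf.maxlen

    def sample(self, m, rng=random):
        # Uniform sampling without replacement (an assumption).
        return rng.sample(list(self.buf), m)

pool = ExperiencePool(capacity=3)
for step in range(4):
    pool.put(step, 0, 0.0, step + 1)
oldest_states = [exp[0] for exp in pool.buf]
batch = pool.sample(2)
```

After four insertions into a pool of capacity 3, the experience with state 0 has been evicted.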
S207: Judge whether the experience pool is full; if so, go to step S208, otherwise go to step S210.
S208: Copy the network parameters:
Soft-copy the parameters of the current policy network to the target policy network, and soft-copy the parameters of the current value network to the target value network.
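A soft copy is commonly implemented as a weighted average of the target and current parameters. The sketch below assumes parameters stored as a dictionary of NumPy arrays and a mixing coefficient `tau`; neither the storage format nor the value of `tau` is given in the source.

```python
import numpy as np

def soft_update(target_params, current_params, tau=0.01):
    """target <- tau * current + (1 - tau) * target, per parameter array."""
    for name, value in current_params.items():
        target_params[name] = tau * value + (1.0 - tau) * target_params[name]
    return target_params

target = {"w": np.zeros(2)}
current = {"w": np.ones(2)}
target = soft_update(target, current, tau=0.1)
```

A small `tau` makes the target networks trail the current networks slowly, which is the usual motivation for soft rather than hard copies.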
S209: Update the network parameters:
FIG. 3 is a schematic diagram of updating the network parameters in the present embodiment. As shown in FIG. 3, for the current policy network, its parameters are updated by the policy gradient, with the value Q as the loss function.
For the current value network, a loss function is calculated and the parameters of the current value network are updated accordingly, the loss function being calculated as follows:
Take M groups of experience from the experience pool, input the next state $s'_m$ of each group into the target policy network to obtain the next action $a'_m$, $m = 1, 2, \ldots, M$, then input the state $s'_m$ and the action $a'_m$ into the target value network to obtain the value $Q_m$, and calculate the target return value $Z_m$ by the following formula:
$$Z_m = \gamma Q_m + R_m$$
where γ denotes the discount factor and $R_m$ denotes the reward value in the m-th group of experience.
The loss function Loss of the current value network is calculated by the following formula:
Figure BDA0003073436770000101
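The target return value and the value-network loss can be sketched as follows. The target follows the formula Z_m = γQ_m + R_m given above. The loss formula itself is not legible in this text, so the mean squared error commonly used for the DDPG value network is substituted here as an assumption. The function names and the default discount factor are hypothetical.

```python
import numpy as np


def target_values(q_next, rewards, gamma=0.99):
    """Z_m = gamma * Q_m + R_m for each of the M sampled experiences."""
    return gamma * np.asarray(q_next, dtype=float) + np.asarray(rewards, dtype=float)


def value_loss(z, q_current):
    """Mean squared error between targets Z_m and current-network values.

    Assumption: standard DDPG critic loss, not confirmed by the source text.
    """
    z = np.asarray(z, dtype=float)
    q_current = np.asarray(q_current, dtype=float)
    return float(np.mean((z - q_current) ** 2))
```

Here `q_next` holds the values Q_m produced by the target value network, and `q_current` holds the values the current value network assigns to the stored state and action of each experience.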
S210: Let s = s' and t = t + 1, then return to step S205.
S211: Judge whether an iteration end condition is reached. If so, training ends; otherwise, go to step S212. DDPG network training generally has two iteration end conditions: either the number of iterations reaches a preset threshold, which is set to 10000 in this embodiment, or the reward value reaches a preset threshold, which is set as required.
S212: Let e = e + 1 and return to step S203.
S105: Mixed signal separation:
In practical application, each signal testing antenna acquires a mixed signal of length L from the K radiation sources
Figure BDA0003073436770000102
The K mixed signals
Figure BDA0003073436770000103
are input into the DDPG network trained in step S104 for retraining. The DDPG action space at this stage is designed as follows:
The K-order matrix C is converted in row-major order into a K×K-dimensional vector
Figure BDA0003073436770000104
The vector C' and the boundary value bound form the action space of the DDPG network
Figure BDA0003073436770000105
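The row-major conversion of the matrix C and the role of the boundary value can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and clipping each element to [-bound, bound] is one plausible reading of how the boundary value limits the action, not a rule stated in the text.

```python
import numpy as np


def flatten_row_major(matrix):
    """Convert a K x K matrix into a K*K vector, row by row."""
    return np.asarray(matrix).reshape(-1)  # NumPy's default order is row-major


def make_action(K, bound, rng):
    """Draw C with standard-normal entries, flatten it, and limit it to [-bound, bound]."""
    C = rng.standard_normal((K, K))
    return np.clip(flatten_row_major(C), -bound, bound)
```

With 0-based indexing, element `k` of the flattened vector comes from row `k // K` and column `k % K` of the matrix.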
The DDPG state space is designed as follows:
The K-order separation matrix W is converted in row-major order into a K×K-dimensional separation vector
Figure BDA0003073436770000111
Denote the mixed signal of the i-th signal testing antenna as
Figure BDA0003073436770000112
According to the preset data positions, P data points are sampled from the mixed signal
Figure BDA0003073436770000113
and formed into a data vector H'_{i,j}. The data vector H'_{i,j} is copied K times and the copies are spliced to obtain a data vector of length K×P
Figure BDA0003073436770000114
The K data vectors
Figure BDA0003073436770000115
are spliced to obtain a vector of dimension K×K×P
Figure BDA0003073436770000116
Denote the separation signal matrix of the current update step as
Figure BDA0003073436770000117
where
Figure BDA0003073436770000118
denotes the K×L mixed signal matrix formed by taking the mixed signals
Figure BDA0003073436770000119
as row vectors. The j-th row vector of the separation signal matrix
Figure BDA00030734367700001110
is taken as the source signal separation result of the j-th radiation source
Figure BDA00030734367700001111
According to the preset data positions, P data points are sampled from each source signal separation result
Figure BDA00030734367700001112
to form a data vector G'_j, and the K data vectors G'_j are spliced to obtain a vector of dimension K×P
Figure BDA00030734367700001113
Define a parameter
Figure BDA00030734367700001114
indicating whether the current iteration number is less than a preset threshold E. If so,
Figure BDA00030734367700001115
otherwise
Figure BDA00030734367700001116
The separation vector
Figure BDA00030734367700001117
together with the vector x_state, the vector y_state, and the parameter on-goal, forms the state space of the DDPG network
Figure BDA00030734367700001118
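The assembly of the application-stage state vector can be sketched as follows. This is a minimal illustration assuming real-valued signals. The function name `build_state` and its argument names are hypothetical, and the value of the flag parameter is passed in by the caller because its two possible values are not legible in this text.

```python
import numpy as np


def build_state(W, X, Y, positions, on_goal):
    """Assemble [W', x_state, y_state, flag] for the application stage.

    W: (K, K) separation matrix; X: (K, L) mixed signals; Y: (K, L) separated
    signals; positions: the P preset sample indices; on_goal: flag value.
    """
    K = W.shape[0]
    w_vec = W.reshape(-1)  # row-major K*K separation vector
    # Each antenna's P samples are copied K times, then all K antennas are spliced.
    x_state = np.concatenate([np.tile(X[i, positions], K) for i in range(K)])
    # The P samples of each separation result are spliced into a K*P vector.
    y_state = np.concatenate([Y[j, positions] for j in range(K)])
    return np.concatenate([w_vec, x_state, y_state, [float(on_goal)]])
```

The resulting vector has K·K + K·K·P + K·P + 1 entries. Copying each sampled vector K times gives x_state the dimension K×K×P stated above.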
During retraining, the parameters of the DDPG network are not updated. Fig. 4 is a flow chart of retraining the DDPG network in the present invention. As shown in Fig. 4, retraining the DDPG network comprises the following specific steps:
S401: Let the iteration number e = 1.
S402: Initialize the separation matrix:
Randomly initialize the separation matrix W, then calculate the separation signal matrix
Figure BDA00030734367700001119
where
Figure BDA00030734367700001120
denotes the K×L mixed signal matrix formed by taking the mixed signals
Figure BDA00030734367700001121
as row vectors. The j-th row vector of the separation signal matrix
Figure BDA00030734367700001122
is taken as the source signal separation result of the j-th radiation source
Figure BDA00030734367700001123
According to the current source signal separation results
Figure BDA00030734367700001124
the current state s is determined.
S403: Initialize the step number within the iteration, t = 1.
S404: Judge whether the step number t within the iteration is less than T, where T denotes the preset maximum number of steps per iteration. If so, go to step S405; otherwise, go to step S407.
S405: Generate the next state:
The current policy network obtains action a from the current state s, adjusts the separation matrix W according to action a, recalculates the source signal separation result y'_j of each radiation source, and generates the next state s'.
S406: Let s = s' and t = t + 1, then return to step S404.
S407: Judge whether the iteration number e is less than E', where E' denotes the preset number of retraining iterations and E' > E. If so, go to step S408; otherwise, retraining ends and the process goes to step S409.
S408: Let e = e + 1 and return to step S402.
S409: Obtain the signal separation result:
Determine the separation matrix from the last state
Figure BDA0003073436770000121
then calculate the separation signal matrix
Figure BDA0003073436770000122
The j-th row vector of the separation signal matrix
Figure BDA0003073436770000123
is taken as the final source signal separation result of the j-th radiation source
Figure BDA0003073436770000124
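The control flow of steps S401 to S409 can be sketched as follows. This is a schematic under stated assumptions, not the implementation of the invention: `policy` and `make_state` are hypothetical callables standing in for the trained current policy network and the state construction, and the additive update W ← W + a is an assumed form of "adjusting the separation matrix according to the action". No network parameters are updated, and the separation matrix is read back from the first K·K entries of the last state.

```python
import numpy as np


def separate(X, policy, make_state, T, E_prime, rng):
    """Run the trained policy without parameter updates (steps S401 to S409).

    policy(state) -> action of length K*K; make_state(W, Y, e) -> state vector
    whose first K*K entries are the row-major separation matrix.
    """
    K = X.shape[0]
    state = None
    for e in range(1, E_prime + 1):             # S401, S407, S408
        W = rng.standard_normal((K, K))         # S402: random initialization
        state = make_state(W, W @ X, e)
        t = 1                                   # S403
        while t < T:                            # S404
            action = np.asarray(policy(state))  # S405
            W = W + action.reshape(K, K)        # assumed additive adjustment
            state = make_state(W, W @ X, e)
            t += 1                              # S406
    W_final = np.asarray(state[:K * K]).reshape(K, K)  # S409
    return W_final, W_final @ X
```

Row j of the second return value is the final separation result for the j-th radiation source.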
To better illustrate the technical effects of the present invention, the invention is verified by simulation in a specific embodiment.
In this embodiment, 3 radiation sources are provided: 1 mobile phone and 2 USRP (Universal Software Radio Peripheral) devices. An antenna of an AD9361 software radio platform is used as the signal testing antenna to collect the modulation signals sent by the radiation sources, where the sampled frequency band is 430-440 MHz and the sampling frequency is 20 MHz. Data samples are collected to train the DDPG network, and the actual mixed signals are then separated offline.
Fig. 5 compares the waveforms of the separated mobile phone signal and the source signal in this embodiment. Fig. 6 compares the waveforms of the separated USRP device signal and the source signal in this embodiment. As shown in Fig. 5 and Fig. 6, the signals separated by the present invention are very close to the source signals. Statistically, the signal-to-interference ratio of the separated signals exceeds 30 and the correlation coefficient exceeds 0.99, which fully meets the requirements of engineering applications.
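The two figures of merit quoted here can be computed as in the following sketch. It is illustrative only: the function names are hypothetical, real-valued waveforms are assumed, and the SIR definition used (source energy over residual energy, expressed in dB) is a common convention assumed for illustration, since the SIR formula of the invention is not reproduced in this text.

```python
import numpy as np


def sir_db(source, separated):
    """SIR in dB: source energy over residual energy (assumed definition)."""
    source = np.asarray(source, dtype=float)
    separated = np.asarray(separated, dtype=float)
    residual = separated - source
    return 10.0 * np.log10(np.sum(source ** 2) / np.sum(residual ** 2))


def correlation(source, separated):
    """Pearson correlation coefficient between two real-valued waveforms."""
    return float(np.corrcoef(source, separated)[0, 1])
```

Both functions compare one separated waveform against its known source signal, so they apply only where the source signal is available, as in the training samples.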
Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they remain within the spirit and scope of the invention as defined by the appended claims, and all inventions that make use of the inventive concept are protected.

Claims (3)

1. A mixed radiation source signal separation method based on a DDPG network is characterized by comprising the following steps:
S1: Denote the number of positions at which radiation sources are placed in the practical application environment as K. Configure a sample radiation source at each position to transmit a modulation signal of length L, and denote the modulation signal transmitted by the j-th sample radiation source as the source signal F_j, j = 1, 2, …, K. Configure K signal testing antennas in the application environment. First, let each sample radiation source transmit its modulation signal alone, with each signal testing antenna acquiring the signal transmitted by that sample radiation source to obtain a data sample; denote the data sample acquired by the i-th signal testing antenna for the j-th sample radiation source as D_{i,j}, i = 1, 2, …, K. Then let the K radiation sources transmit modulation signals simultaneously, with each signal testing antenna acquiring the received mixed signal to obtain a mixed signal sample; denote the mixed signal sample acquired by the i-th signal testing antenna as X_i;
S2: for the DDPG network, the DDPG action space is designed by adopting the following method:
set a K-order matrix C, each element of which follows the standard normal distribution, and convert the K-order matrix C in row-major order into a K×K-dimensional vector
Figure FDA0003073436760000014
where c_k denotes the k-th element of the vector C', k = 1, 2, …, K², corresponding to the element of the matrix C in row
Figure FDA0003073436760000011
and column k%K, where
Figure FDA0003073436760000012
denotes rounding down and % denotes the remainder operation; then define a boundary value bound, and the vector C' and the boundary value bound form the action space of the DDPG network
Figure FDA0003073436760000015
The DDPG state space is designed as follows:
set a K-order separation matrix W, and convert it in row-major order into a K×K-dimensional separation vector
Figure FDA0003073436760000016
where w_k denotes the k-th element of the vector W', corresponding to the element of the matrix W in row
Figure FDA0003073436760000013
and column k%K;
denote the mixed signal of length L received by the i-th signal testing antenna as X_i; according to the preset data positions, sample P data points from the mixed signal X_i and calculate their ratios to the corresponding P data points of the source signal F_j, the P ratios forming a ratio vector; splice the K ratio vectors corresponding to the same mixed signal to obtain the ratio vector H_{i,j}, and splice the K ratio vectors H_{i,j} to obtain the vector x_state of dimension K×P;
denote the separation signal matrix of the current update step as Y = WX, where X denotes the K×L mixed signal matrix formed by taking the mixed signals X_i as row vectors, and take the j-th row vector of the separation signal matrix Y as the source signal separation result y_j of the j-th radiation source; according to the preset data positions, sample P data points from each source signal separation result y_j and calculate their ratios to the corresponding P data points of the source signal F_j, the P ratios forming a ratio vector G_j; splice the K ratio vectors G_j to obtain the vector y_state of dimension K×P;
define a parameter on-goal indicating whether the current step reaches the preset target: if so, on-goal = 1; otherwise, on-goal = 0;
the separation vector
Figure FDA0003073436760000021
together with the vector x_state, the vector y_state, and the parameter on-goal, forms the state space of the DDPG network
Figure FDA0003073436760000022
The DDPG reward function is designed as follows:
for each of the K separated signals y_j obtained in the current step, calculate the signal-to-interference ratio SIR_j using the following formula:
Figure FDA0003073436760000023
where ‖·‖₂ denotes the 2-norm;
judge whether the current step reaches the preset target, that is, whether the signal-to-interference ratio SIR_j of every separated signal y_j is greater than the preset threshold; if so, the reward function is
Figure FDA0003073436760000024
where Δ denotes a preset constant; otherwise, the reward function is
Figure FDA0003073436760000025
S3: Construct a DDPG network according to the action space and state space designed in step S2, the DDPG network comprising a current policy network, a current value network, a target policy network, and a target value network, wherein:
the input of the current policy network is the state s, and its output is the action a;
the input of the current value network is the state s and the action a, and its output is the value Q;
the target policy network has the same input and output as the current policy network, and periodically copies the parameters of the current policy network;
the target value network has the same input and output as the current value network, and periodically copies the parameters of the current value network;
S4: Input the K mixed signal samples X_i obtained in step S1 into the DDPG network and train the DDPG network, specifically comprising the following steps:
S4.1: Randomly initialize the parameters of the four networks in the DDPG network;
S4.2: Let the iteration number e = 1;
S4.3: Randomly initialize the separation matrix W, calculate the separation signal matrix Y = WX, take the j-th row vector of the separation signal matrix Y as the source signal separation result y_j of the j-th radiation source, and determine the current state s from the current source signal separation results y_j;
S4.4: Initialize the step number within the iteration, t = 1;
S4.5: Judge whether the step number t within the iteration is less than T, where T denotes the preset maximum number of steps per iteration; if so, go to step S4.6; otherwise, go to step S4.11;
S4.6: The current policy network obtains action a from the current state s, adjusts the separation matrix W according to action a, recalculates the source signal separation result y'_j of each radiation source, and generates the next state s'; the current value network obtains the current value Q from the current state s and action a; each source signal separation result y_j is extracted from the current state s, and the reward value r corresponding to the current state s is calculated; the current state s, action a, reward value r, and next state s' are then put into the experience pool as one group of experience; if the experience pool is already full, the earliest experience is deleted on a first-in, first-out basis before the current experience is stored;
S4.7: Judge whether the experience pool is full; if so, go to step S4.8; otherwise, go to step S4.10;
S4.8: Soft-copy the parameters of the current policy network to the target policy network, and soft-copy the parameters of the current value network to the target value network;
S4.9: For the current policy network, take the value Q as the loss function and update the parameters of the current policy network by the policy gradient method;
for the current value network, calculate a loss function and update the parameters of the current value network according to it, the loss function being calculated as follows:
take M groups of experience from the experience pool; for each group, input the next state s'_m into the target policy network to obtain the next action a'_m, m = 1, 2, …, M; then input state s'_m and action a'_m into the target value network to obtain the value Q_m, and calculate the target return value Z_m using the following formula:
Z_m = γQ_m + R_m
where γ denotes the discount factor and R_m denotes the reward value in the m-th group of experience;
calculate the loss function Loss of the current value network using the following formula:
Figure FDA0003073436760000031
S4.10: Let the current state s = s' and t = t + 1, then return to step S4.5;
S4.11: Judge whether an iteration end condition is reached; if so, training ends; otherwise, go to step S4.12;
S4.12: Let e = e + 1 and return to step S4.3;
S5: In practical application, each signal testing antenna acquires a mixed signal of length L from the K radiation sources
Figure FDA0003073436760000032
The K mixed signals
Figure FDA0003073436760000033
are input into the DDPG network trained in step S4 for retraining; the DDPG action space at this stage is designed as follows:
the K-order matrix C is converted in row-major order into a K×K-dimensional vector
Figure FDA0003073436760000034
and the vector C' and the boundary value bound form the action space of the DDPG network
Figure FDA0003073436760000041
The DDPG state space is designed as follows:
the K-order separation matrix W is converted in row-major order into a K×K-dimensional separation vector
Figure FDA0003073436760000042
Denote the mixed signal of the i-th signal testing antenna as
Figure FDA0003073436760000043
According to the preset data positions, P data points are sampled from the mixed signal
Figure FDA0003073436760000044
and formed into a data vector H'_{i,j}; the data vector H'_{i,j} is copied K times and the copies are spliced to obtain a data vector of length K×P
Figure FDA0003073436760000045
The K data vectors
Figure FDA0003073436760000046
are spliced to obtain a vector of dimension K×K×P
Figure FDA0003073436760000047
Denote the separation signal matrix of the current update step as
Figure FDA0003073436760000048
where
Figure FDA0003073436760000049
denotes the K×L mixed signal matrix formed by taking the mixed signals
Figure FDA00030734367600000410
as row vectors; the j-th row vector of the separation signal matrix
Figure FDA00030734367600000411
is taken as the source signal separation result of the j-th radiation source
Figure FDA00030734367600000412
According to the preset data positions, P data points are sampled from each source signal separation result
Figure FDA00030734367600000413
to form a data vector G'_j, and the K data vectors G'_j are spliced to obtain a vector of dimension K×P
Figure FDA00030734367600000414
Define a parameter
Figure FDA00030734367600000415
indicating whether the current iteration number is less than a preset threshold E; if so,
Figure FDA00030734367600000416
otherwise
Figure FDA00030734367600000417
The separation vector
Figure FDA00030734367600000418
together with the vector x_state, the vector y_state, and the parameter on-goal, forms the state space of the DDPG network
Figure FDA00030734367600000419
Retraining the DDPG network comprises the following specific steps:
S5.1: Let the iteration number e = 1;
S5.2: Randomly initialize the separation matrix W, then calculate the separation signal matrix
Figure FDA00030734367600000420
where
Figure FDA00030734367600000421
denotes the K×L mixed signal matrix formed by taking the mixed signals
Figure FDA00030734367600000422
as row vectors; the j-th row vector of the separation signal matrix
Figure FDA00030734367600000423
is taken as the source signal separation result of the j-th radiation source
Figure FDA00030734367600000424
According to the current source signal separation results
Figure FDA00030734367600000425
the current state s is determined;
S5.3: Initialize the step number within the iteration, t = 1;
S5.4: Judge whether the step number t within the iteration is less than T, where T denotes the preset maximum number of steps per iteration; if so, go to step S5.5; otherwise, go to step S5.7;
S5.5: The current policy network obtains action a from the current state s, adjusts the separation matrix W according to action a, recalculates the source signal separation result y'_j of each radiation source, and generates the next state s';
S5.6: Let the current state s = s' and t = t + 1, then return to step S5.4;
S5.7: Judge whether the iteration number e is less than E', where E' denotes the preset number of retraining iterations and E' > E; if so, go to step S5.8; otherwise, retraining ends and the process goes to step S5.9;
S5.8: Let e = e + 1 and return to step S5.2;
S5.9: Determine the separation matrix from the last state
Figure FDA0003073436760000051
then calculate the separation signal matrix
Figure FDA0003073436760000052
The j-th row vector of the separation signal matrix
Figure FDA0003073436760000053
is taken as the final source signal separation result of the j-th radiation source
Figure FDA0003073436760000054
2. The mixed radiation source signal separation method according to claim 1, wherein the modulation signal transmitted by each sample radiation source in step S1 must satisfy the following condition: after the signal sample acquired by each signal testing antenna for the modulation signal is converted into two-channel IQ data, the modulus of every data point in the IQ data is greater than a preset threshold.
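The condition on the IQ data can be checked as in the following sketch. It is a minimal illustration in which the function name and argument names are hypothetical, and the modulus of a data point is taken as sqrt(I² + Q²).

```python
import numpy as np


def iq_modulus_ok(i_data, q_data, threshold):
    """True if every IQ data point has modulus sqrt(I^2 + Q^2) above threshold."""
    modulus = np.hypot(np.asarray(i_data, dtype=float),
                       np.asarray(q_data, dtype=float))
    return bool(np.all(modulus > threshold))
```

A signal sample would be accepted only when this check holds for the data acquired by every signal testing antenna.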
3. The mixed radiation source signal separation method according to claim 1, wherein, when the radiation source signal samples are acquired in step S1, the radiation sources are moved slightly several times to obtain different transmission scenarios; for each transmission scenario, the data samples of the i-th signal testing antenna for the j-th sample radiation source and the mixed signal samples are acquired, and the K×K data samples D_{i,j} and the K mixed signal samples of each transmission scenario form one group of radiation source signal samples; the radiation source signal samples of all transmission scenarios form a radiation source signal sample set;
in step S4.3, one group of radiation source signal samples is selected arbitrarily from the radiation source signal sample set to calculate the separation signal matrix and perform the subsequent operations.
CN202110545722.5A 2021-05-19 2021-05-19 DDPG network-based mixed radiation source signal separation method Active CN113364712B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110545722.5A CN113364712B (en) 2021-05-19 2021-05-19 DDPG network-based mixed radiation source signal separation method

Publications (2)

Publication Number Publication Date
CN113364712A CN113364712A (en) 2021-09-07
CN113364712B true CN113364712B (en) 2022-06-14

Family

ID=77526547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110545722.5A Active CN113364712B (en) 2021-05-19 2021-05-19 DDPG network-based mixed radiation source signal separation method

Country Status (1)

Country Link
CN (1) CN113364712B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant