CN113364712B - DDPG network-based mixed radiation source signal separation method - Google Patents
- Publication number
- CN113364712B CN202110545722.5A
- Authority
- CN
- China
- Prior art keywords
- signal
- network
- current
- matrix
- separation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L25/00—Baseband systems
- H04L25/02—Details ; arrangements for supplying electrical power along data transmission lines
- H04L25/03—Shaping networks in transmitter or receiver, e.g. adaptive shaping networks
- H04L25/03006—Arrangements for removing intersymbol interference
Landscapes
- Engineering & Computer Science (AREA)
- Power Engineering (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Variable-Direction Aerials And Aerial Arrays (AREA)
Abstract
The invention discloses a mixed radiation source signal separation method based on a DDPG network. First, K signal test antennas collect radiation source signal samples from K sample radiation sources, and the samples are processed to obtain mixed signal samples. The separation matrix is regarded as the agent, the addition and subtraction of matrix elements as the actions, and the degree of signal separation as the environment, and a DDPG network is designed on this basis. The DDPG network is then trained with the mixed signal samples. In actual application, each signal test antenna obtains a mixed signal of the K radiation sources, and the mixed signals are input into the trained DDPG network for retraining to obtain the signal separation result. By introducing the DDPG network, the invention effectively improves the accuracy of mixed signal separation.
Description
Technical Field
The invention belongs to the technical field of signal separation, and particularly relates to a mixed radiation source signal separation method based on a DDPG network.
Background
Accurately and efficiently extracting a desired signal from a mixed signal is an important research topic in the field of communications and determines the reception capability of a communication system. Blind signal separation refers to separating signals when the source signals and the channel are unknown or only partially known. It has been a focus of modern signal processing in recent years, with applications in wireless communication, speech recognition, biomedicine, mechanical engineering and other fields. For wireless communications, blind signal separation is of great significance in both cooperative and non-cooperative communication. In cooperative communication, interference between signals in MIMO communication systems and satellite communication systems can be suppressed and separated by blind separation. In non-cooperative communication and modern information warfare, signals need to be accurately separated from a mixture of friendly and enemy signals, which helps to detect the enemy as early as possible, judge enemy equipment correctly and take corresponding action. Blind signal separation accordingly faces greater difficulty in the field of communications: because communication signals are similar and complex, separation methods from other fields are not necessarily well suited.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a mixed radiation source signal separation method based on a DDPG network, in which the DDPG network is introduced to effectively improve the accuracy of mixed signal separation.
In order to achieve the above purpose, the mixed radiation source signal separation method based on the DDPG network of the present invention includes the following steps:
S1: Let K denote the number of positions at which radiation sources are placed in the actual application environment. Place a sample radiation source at each position to transmit a modulated signal of length L, and denote the modulated signal transmitted by the j-th sample radiation source as the source signal $F_j$, $j = 1, 2, \ldots, K$. Deploy K signal test antennas in the application environment. First, let each sample radiation source transmit its modulated signal alone while every signal test antenna collects it, and denote the data sample collected by the i-th signal test antenna from the j-th sample radiation source as $D_{i,j}$, $i = 1, 2, \ldots, K$. Then let the K radiation sources transmit modulated signals simultaneously; each signal test antenna collects the received mixed signal to obtain a mixed signal sample, and the mixed signal sample collected by the i-th signal test antenna is denoted $X_i$;
S2: for the DDPG network, the DDPG action space is designed by adopting the following method:
Set a K-order matrix C whose elements each follow the standard normal distribution, and convert C in row-major order into a K × K-dimensional vector $C' = [c_1, c_2, \ldots, c_{K^2}]$, where $c_k$ denotes the k-th element of the vector $C'$, $k = 1, 2, \ldots, K^2$, and corresponds to the element in row $\lfloor k/K \rfloor$ and column $k \% K$ of the matrix C; $\lfloor \cdot \rfloor$ denotes rounding down and % denotes the remainder operation. Then define a boundary value bound; the vector $C'$ and the boundary value bound constitute the action space of the DDPG network.
The DDPG state space is designed by adopting the following method:
Set a K-order separation matrix W, and convert W in row-major order into a K × K-dimensional separation vector $W' = [w_1, w_2, \ldots, w_{K^2}]$, where $w_k$ denotes the k-th element of the vector $W'$ and corresponds to the element in row $\lfloor k/K \rfloor$ and column $k \% K$ of the matrix W;
Denote the mixed signal of length L received by the i-th signal test antenna as $X_i$. Sample P data points from $X_i$ at preset data positions and compute the ratio between each sampled point and the corresponding data point of the source signal $F_j$; the P ratios form a ratio vector $H_{i,j}$. Concatenate the K ratio vectors $H_{i,j}$ belonging to the same mixed signal, and then concatenate the K resulting vectors to obtain the vector x_state of dimension K × K × P;
Denote the separated signal matrix of the current update step as $Y = WX$, where X is the K × L mixed signal matrix whose row vectors are the mixed signals $X_i$, and take the j-th row vector of Y as the source signal separation result $y_j$ of the j-th radiation source. Sample P data points from each source signal separation result $y_j$ at the preset data positions and compute the ratio between each sampled point and the corresponding data point of the source signal $F_j$; the P ratios form a ratio vector $G_j$. Concatenate the K ratio vectors $G_j$ to obtain the vector y_state of dimension K × P;
Define a parameter on-good indicating whether the current step reaches the preset target: on-good = 1 if it does, and on-good = 0 otherwise;
The separation vector $W'$, the vector x_state, the vector y_state and the parameter on-good constitute the state space of the DDPG network.
The DDPG reward function is designed by adopting the following method:
For each of the K separated signals $y_j$ obtained in the current step, calculate the signal-to-interference ratio $\mathrm{SIR}_j$; the calculation formula is as follows:
where $\|\cdot\|_2$ denotes the 2-norm;
Judge whether the current step reaches the preset target, that is, whether the signal-to-interference ratio $\mathrm{SIR}_j$ of every separated signal $y_j$ is greater than the preset threshold. The reward function is defined separately for the two cases; in the case where the target is reached, its expression contains a preset constant Δ.
S3: Construct a DDPG network according to the action space and the state space designed in step S2. The DDPG network comprises a current policy network, a current value network, a target policy network and a target value network, in which:
the current policy network takes the state s as input and outputs the action a;
the current value network takes the state s and the action a as input and outputs the value Q;
the target policy network has the same input and output as the current policy network and periodically copies the current policy network parameters;
the target value network has the same input and output as the current value network and periodically copies the current value network parameters;
S4: Input the K mixed signal samples $X_i$ obtained in step S1 into the DDPG network and train the DDPG network, specifically as follows:
S4.1: Randomly initialize the parameters of the four networks in the DDPG network;
S4.2: Set the iteration number e = 1;
S4.3: Randomly initialize the separation matrix W, then compute the separated signal matrix $Y = WX$, take the j-th row vector of Y as the source signal separation result $y_j$ of the j-th radiation source, and determine the current state s from the current source signal separation results $y_j$;
S4.4: Initialize the step number t = 1 for this iteration;
S4.5: Judge whether the step number t of this iteration is less than T, where T denotes the preset maximum number of steps per iteration; if so, go to step S4.6, otherwise go to step S4.11;
S4.6: The current policy network obtains the action a from the current state s, the separation matrix W is adjusted according to the action a, and the source signal separation result $y'_j$ of each radiation source is recalculated to generate the next state $s'$. The current value network obtains the current value Q from the current state s and the action a. Each source signal separation result $y_j$ is extracted from the current state s, and the reward value r corresponding to the current state s is calculated. The current state s, the action a, the reward value r and the next state $s'$ are then put into the experience pool as one group of experience; if the experience pool is already full, the earliest experience is deleted on a first-in, first-out basis before the current experience is put in;
S4.7: Judge whether the experience pool is full; if so, go to step S4.8, otherwise go to step S4.10;
S4.8: Soft-copy the parameters of the current policy network to the target policy network, and soft-copy the parameters of the current value network to the target value network;
S4.9: For the current policy network, update its parameters by the policy gradient method, using the value Q as the loss function;
For the current value network, calculate a loss function and update the parameters of the current value network according to it. The loss function is calculated as follows:
Take M groups of experience from the experience pool. Input the next state $s'_m$ of each group into the target policy network to obtain the next action $a'_m$, $m = 1, 2, \ldots, M$; then input the state $s'_m$ and the action $a'_m$ into the target value network to obtain the value $Q_m$, and calculate the target return value $Z_m$ by the following formula:
$Z_m = \gamma Q_m + R_m$
where γ denotes the discount factor and $R_m$ denotes the reward value in the m-th group of experience;
The loss function Loss of the current value network is calculated by the following formula:
S4.10: Set the current state $s = s'$ and t = t + 1, and return to step S4.5;
S4.11: Judge whether the iteration end condition is reached; if so, training is finished, otherwise go to step S4.12;
S4.12: Set e = e + 1 and return to step S4.3;
S5: In actual application, each signal test antenna obtains a mixed signal of length L from the K radiation sources, and the K mixed signals are input into the DDPG network trained in step S4 for retraining. The DDPG action space at this stage is designed as follows:
Convert the K-order matrix C in row-major order into a K × K-dimensional vector $C'$; the vector $C'$ and the boundary value bound constitute the action space of the DDPG network.
The DDPG state space is designed as follows:
Convert the K-order separation matrix W in row-major order into a K × K-dimensional separation vector $W'$.
For the mixed signal of the i-th signal test antenna, sample P data points from it at the preset data positions to form a data vector $H'_{i,j}$; copy the data vector $H'_{i,j}$ K times and concatenate the copies to obtain a data vector of length K × P; concatenate the K data vectors so obtained to get the vector x_state of dimension K × K × P.
Compute the separated signal matrix of the current update step as the product of the separation matrix W and the K × L mixed signal matrix whose row vectors are the K mixed signals, and take its j-th row vector as the source signal separation result of the j-th radiation source. Sample P data points from each source signal separation result at the preset data positions to form a data vector $G'_j$, and concatenate the K data vectors $G'_j$ to obtain the vector y_state of dimension K × P.
Define the parameter on-good to indicate whether the current iteration number is less than a preset threshold E; it takes one value if so and another value otherwise.
The separation vector $W'$, the vector x_state, the vector y_state and the parameter on-good constitute the state space of the DDPG network.
The specific steps of retraining the DDPG network are:
S5.1: Set the iteration number e = 1;
S5.2: Randomly initialize the separation matrix W, then compute the separated signal matrix as the product of W and the K × L mixed signal matrix whose row vectors are the K mixed signals; take the j-th row vector of the separated signal matrix as the source signal separation result of the j-th radiation source, and determine the current state s from the current source signal separation results;
S5.3: Initialize the step number t = 1 for this iteration;
S5.4: Judge whether the step number t of this iteration is less than T, where T denotes the preset maximum number of steps per iteration; if so, go to step S5.5, otherwise go to step S5.7;
S5.5: The current policy network obtains the action a from the current state s, the separation matrix W is adjusted according to the action a, and the source signal separation result $y'_j$ of each radiation source is recalculated to generate the next state $s'$;
S5.6: Set the current state $s = s'$ and t = t + 1, and return to step S5.4;
S5.7: Judge whether the iteration number e is less than $E'$, where $E'$ denotes the preset number of retraining iterations and $E' > E$; if so, go to step S5.8, otherwise retraining is finished and go to step S5.9;
S5.8: Set e = e + 1 and return to step S5.2;
S5.9: Determine the separation matrix from the last state, then compute the separated signal matrix from it, and take the j-th row vector of the separated signal matrix as the final source signal separation result of the j-th radiation source.
The invention relates to a mixed radiation source signal separation method based on a DDPG network. First, K signal test antennas collect radiation source signal samples from K sample radiation sources, and the samples are processed to obtain mixed signal samples. The separation matrix is regarded as the agent, the addition and subtraction of matrix elements as the actions, and the degree of signal separation as the environment, and a DDPG network is designed on this basis. The DDPG network is then trained with the mixed signal samples. In actual application, each signal test antenna obtains a mixed signal of the K radiation sources, and the mixed signals are input into the trained DDPG network for retraining to obtain the signal separation result.
The invention has the following beneficial effects:
1) the invention can separate signals by using prior knowledge of the sample radiation sources even when the mixing channel is unknown;
2) the invention uses the DDPG network to let the separation actions interact with the signal environment, which better matches actual separation scenarios and improves the signal separation effect.
Drawings
FIG. 1 is a block diagram of an embodiment of a DDPG network-based mixed radiation source signal separation method of the present invention;
FIG. 2 is a flow chart of DDPG network training in the present invention;
FIG. 3 is a diagram illustrating updating network parameters in the present embodiment;
FIG. 4 is a flow chart of the retraining of the DDPG network in the present invention;
FIG. 5 is a waveform comparison of the separated mobile phone signal and the source signal in the present embodiment;
FIG. 6 is a waveform comparison of the separated USRP device signal and the source signal in the present embodiment.
Detailed Description
Specific embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that detailed descriptions of known functions and designs are omitted below where they might obscure the subject matter of the present invention.
Examples
Fig. 1 is a block diagram of an embodiment of a mixed radiation source signal separation method based on a DDPG network. As shown in fig. 1, the method for separating mixed radiation source signals based on the DDPG network of the present invention specifically includes the steps of:
S101: Acquire radiation source signal samples:
Let K denote the number of positions at which radiation sources are placed in the actual application environment. Place a sample radiation source at each position to transmit a modulated signal of length L, and denote the modulated signal transmitted by the j-th sample radiation source as the source signal $F_j$, $j = 1, 2, \ldots, K$. Deploy K signal test antennas in the application environment. First, let each sample radiation source transmit its modulated signal alone while every signal test antenna collects it, and denote the data sample collected by the i-th signal test antenna from the j-th sample radiation source as $D_{i,j}$, $i = 1, 2, \ldots, K$. Then let the K radiation sources transmit modulated signals simultaneously; each signal test antenna collects the received mixed signal to obtain a mixed signal sample, and the mixed signal sample collected by the i-th signal test antenna is denoted $X_i$.
In this embodiment, so that the collected signal samples represent the channel characteristics more accurately and the separation matrix obtained later is more accurate, the modulated signal transmitted by each sample radiation source must satisfy the following condition: after the signal samples collected by each signal test antenna are converted into I/Q two-path data, the modulus of every data point in the I/Q data is greater than a preset threshold.
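The modulus condition can be checked directly on complex baseband samples. The following is a minimal sketch, assuming the I/Q data are held as a complex NumPy array (I as the real part, Q as the imaginary part); the function name, the sample values and the threshold are illustrative and do not come from the patent.

```python
import numpy as np

def meets_modulus_condition(iq_samples, threshold):
    """Return True if every I/Q data point has modulus above `threshold`."""
    return bool(np.all(np.abs(iq_samples) > threshold))

# Hypothetical samples: I + jQ for four data points.
iq = np.array([0.6 + 0.8j, -1.0 + 0.2j, 0.3 - 0.9j, 0.7 + 0.7j])
ok = meets_modulus_condition(iq, threshold=0.5)
```

A signal that fails the check would be replaced by another modulated signal before samples are recorded.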
In addition, to make the trained DDPG network adapt to small changes in radiation source position, the radiation sources may be moved slightly several times when the radiation source signal samples are acquired in step S101, giving different transmission scenarios. For each transmission scenario, the data samples of the i-th signal test antenna for the j-th sample radiation source and the mixed signal samples are acquired; the K × K data samples $D_{i,j}$ and the K mixed signal samples $X_i$ of one transmission scenario form one group of radiation source signal samples, and the groups of all transmission scenarios form the radiation source signal sample set.
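To make the data layout of one group of radiation source signal samples concrete, the sketch below simulates it. It assumes an instantaneous linear mixing model with a mixing matrix `A`, which the patent does not state (the real channel is unknown and measured over the air); the random source signals are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
K, L = 3, 1000                      # number of sources/antennas, signal length

F = rng.standard_normal((K, L))     # row j: source signal F_j (placeholder data)
A = rng.standard_normal((K, K))     # assumed mixing channel, unknown in practice

# D[i, j]: sample received by antenna i when only source j transmits.
D = A[:, :, None] * F[None, :, :]   # shape (K, K, L)

# X[i]: mixed sample received by antenna i when all sources transmit.
X = D.sum(axis=1)                   # shape (K, L)

sample_group = {"D": D, "X": X}     # one group of radiation source signal samples
```

Repeating this for each transmission scenario and collecting the groups in a list would give the radiation source signal sample set.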
S102: designing a DDPG network:
In the invention, the separation matrix is regarded as the agent, the addition and subtraction of matrix elements as the actions, and the degree of signal separation as the environment; ideal signal separation is achieved through the interaction between the agent and the environment and the feedback of the environment to the agent. The DDPG network is designed on this basis, specifically as follows:
1) Designing the DDPG action space
Set a K-order matrix C whose elements each follow the standard normal distribution, and convert C in row-major order into a K × K-dimensional vector $C' = [c_1, c_2, \ldots, c_{K^2}]$, where $c_k$ denotes the k-th element of the vector $C'$, $k = 1, 2, \ldots, K^2$, and corresponds to the element in row $\lfloor k/K \rfloor$ and column $k \% K$ of the matrix C; $\lfloor \cdot \rfloor$ denotes rounding down and % denotes the remainder operation. Then define a boundary value bound; the vector $C'$ and the boundary value bound constitute the action space of the DDPG network.
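The row-major flattening and the index mapping can be illustrated as follows. This is a sketch under two assumptions that are not stated in the text: indices are 0-based (so that floor division and remainder give the row and column directly), and the boundary value limits each action component to the interval [-bound, bound].

```python
import numpy as np

K = 3
bound = 1.0                                   # assumed boundary value

rng = np.random.default_rng(0)
C = rng.standard_normal((K, K))               # elements drawn from N(0, 1)
C_vec = C.reshape(-1)                         # row-major flattening, length K*K

# With 0-based k, element k sits at row k // K and column k % K.
k = 5
row, col = divmod(k, K)

# Assumption: bound clips each action component to [-bound, bound].
action = np.clip(C_vec, -bound, bound)
```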
2) Designing the DDPG state space
Set a K-order separation matrix W, and convert W in row-major order into a K × K-dimensional separation vector $W' = [w_1, w_2, \ldots, w_{K^2}]$, where $w_k$ denotes the k-th element of the vector $W'$ and corresponds to the element in row $\lfloor k/K \rfloor$ and column $k \% K$ of the matrix W.
Denote the mixed signal of length L received by the i-th signal test antenna as $X_i$. Sample P data points from $X_i$ at preset data positions and compute the ratio between each sampled point and the corresponding data point of the source signal $F_j$; the P ratios form a ratio vector $H_{i,j}$. Concatenate the K ratio vectors $H_{i,j}$ belonging to the same mixed signal, and then concatenate the K resulting vectors to obtain the vector x_state of dimension K × K × P.
Denote the separated signal matrix of the current update step as $Y = WX$, where X is the K × L mixed signal matrix whose row vectors are the mixed signals $X_i$, and take the j-th row vector of Y as the source signal separation result $y_j$ of the j-th radiation source. Sample P data points from each source signal separation result $y_j$ at the preset data positions and compute the ratio between each sampled point and the corresponding data point of the source signal $F_j$; the P ratios form a ratio vector $G_j$. Concatenate the K ratio vectors $G_j$ to obtain the vector y_state of dimension K × P.
Define a parameter on-good indicating whether the current step reaches the preset target: on-good = 1 if it does, and on-good = 0 otherwise.
The separation vector $W'$, the vector x_state, the vector y_state and the parameter on-good constitute the state space of the DDPG network.
3) Designing the DDPG reward function
For each of the K separated signals $y_j$ obtained in the current step, calculate the signal-to-interference ratio $\mathrm{SIR}_j$; the calculation formula is as follows:
where $\|\cdot\|_2$ denotes the 2-norm.
Judge whether the current step reaches the preset target, that is, whether the signal-to-interference ratio $\mathrm{SIR}_j$ of every separated signal $y_j$ is greater than the preset threshold. The reward function is defined separately for the two cases; in the case where the target is reached, its expression contains a preset constant Δ.
In this embodiment, the signal-to-interference ratio threshold is 30 and Δ is 100. In this way, the reward function both indicates whether the current step reaches the preset target and measures the degree of signal separation.
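The exact SIR and reward expressions are not reproduced in this text, so the sketch below substitutes assumptions: a commonly used SIR definition in decibels, $10\log_{10}(\|F_j\|_2^2 / \|y_j - F_j\|_2^2)$, and a reward equal to the mean SIR plus Δ when every SIR exceeds the threshold. Only the threshold 30 and Δ = 100 come from the embodiment; the function names are hypothetical.

```python
import numpy as np

def sir_db(y, f):
    """Assumed SIR definition: 10*log10(||f||^2 / ||y - f||^2)."""
    return 10.0 * np.log10(np.sum(np.abs(f) ** 2) / np.sum(np.abs(y - f) ** 2))

def reward(Y, F, threshold=30.0, delta=100.0):
    """Assumed reward: mean SIR, plus delta when every SIR exceeds threshold."""
    sirs = np.array([sir_db(y, f) for y, f in zip(Y, F)])
    on_good = bool(np.all(sirs > threshold))
    return float(sirs.mean()) + (delta if on_good else 0.0), on_good

F = np.ones((2, 4))                    # placeholder source signals
r, on_good = reward(1.001 * F, F)      # near-perfect separation
```

With this shape of reward, the SIR term measures the degree of separation at every step, while the constant marks the steps at which the preset target is reached.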
S103: constructing a DDPG network:
Construct a DDPG network according to the action space and the state space designed in step S102. The DDPG network comprises a current policy network, a current value network, a target policy network and a target value network, in which:
the current policy network takes the state s as input and outputs the action a;
the current value network takes the state s and the action a as input and outputs the value Q;
the target policy network has the same input and output as the current policy network and periodically copies the current policy network parameters;
the target value network has the same input and output as the current value network and periodically copies the current value network parameters.
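The input and output sizes of the four networks follow from the state and action spaces. The sketch below uses small NumPy multilayer perceptrons purely to show those shapes; the hidden width, the tanh activations, the initialization and the state size (with x_state taken as K × K × P values) are assumptions, not details from the patent.

```python
import numpy as np

K, P, hidden = 3, 4, 64                     # hidden width is an assumption
state_dim = K * K + K * K * P + K * P + 1   # W', x_state, y_state, on-good
action_dim = K * K
rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights and zero biases for consecutive layer sizes."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """tanh hidden layers followed by a linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

actor = init_mlp([state_dim, hidden, action_dim])        # s -> a
critic = init_mlp([state_dim + action_dim, hidden, 1])   # (s, a) -> Q
target_actor = [(W.copy(), b.copy()) for W, b in actor]
target_critic = [(W.copy(), b.copy()) for W, b in critic]

s = rng.standard_normal(state_dim)
a = np.tanh(forward(actor, s))              # bounded action in [-1, 1]
Q = forward(critic, np.concatenate([s, a]))[0]
```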
S104: training the DDPG network:
Input the K mixed signal samples $X_i$ obtained in step S101 into the DDPG network and train the DDPG network.
FIG. 2 is a flow chart of DDPG network training in the present invention. As shown in fig. 2, the DDPG network training of the present invention specifically comprises:
S201: Initialize the networks:
Randomly initialize the parameters of the four networks in the DDPG network.
S202: Initialize the iteration parameter:
Set the iteration number e = 1.
S203: Initialize the separation matrix:
Randomly initialize the separation matrix W and compute the separated signal matrix $Y = WX$, where X is the K × L mixed signal matrix whose row vectors are the mixed signals $X_i$; take the j-th row vector of Y as the source signal separation result $y_j$ of the j-th radiation source, and determine the current state s from the current source signal separation results $y_j$.
If a radiation source signal sample set was acquired in step S101, one group of radiation source signal samples is selected arbitrarily from the set in step S203 to compute the separated signal matrix and carry out the subsequent operations.
S204: Initialize the step number t = 1 for this iteration.
S205: Judge whether the step number t of the current iteration is less than T, where T denotes the preset maximum number of steps per iteration (T = 500 in this embodiment); if so, go to step S206, otherwise go to step S211.
S206: Generate a new experience:
The current policy network obtains the action a from the current state s, the separation matrix W is adjusted according to the action a, and the source signal separation result $y'_j$ of each radiation source is recalculated to generate the next state $s'$. The current value network obtains the current value Q from the current state s and the action a. Each source signal separation result $y_j$ is extracted from the current state s, and the reward value r corresponding to the current state s is calculated. The current state s, the action a, the reward value r and the next state $s'$ are then put into the experience pool as one group of experience. If the experience pool is already full, the earliest experience is deleted on a first-in, first-out basis before the current experience is put in.
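The first-in, first-out experience pool can be sketched with a bounded deque from the Python standard library. The class name `ExperiencePool`, its methods and the capacity are illustrative assumptions.

```python
from collections import deque
import random

class ExperiencePool:
    """Fixed-capacity FIFO pool of (s, a, r, s_next) experiences."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest entry dropped when full

    def put(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def is_full(self):
        return len(self.buffer) == self.buffer.maxlen

    def sample(self, m):
        """Draw m experiences uniformly at random without replacement."""
        return random.sample(list(self.buffer), m)

pool = ExperiencePool(capacity=3)
for step in range(5):                          # five puts into a pool of size three
    pool.put(step, 0.0, 0.0, step + 1)
```

After the loop only the three most recent experiences remain, which matches the deletion rule described for a full pool.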
S207: Judge whether the experience pool is full; if so, go to step S208, otherwise go to step S210.
S208: Copy the network parameters:
Soft-copy the parameters of the current policy network to the target policy network, and soft-copy the parameters of the current value network to the target value network.
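In DDPG a soft copy usually means blending a small fraction of the current parameters into the target parameters. The sketch below shows that rule on plain lists of floats; the mixing coefficient `tau` and its value are assumptions, since the text does not give one.

```python
def soft_copy(current, target, tau=0.01):
    """Blend parameters: target <- tau * current + (1 - tau) * target."""
    return [tau * c + (1.0 - tau) * t for c, t in zip(current, target)]

current_params = [1.0, 2.0]
target_params = [0.0, 0.0]
target_params = soft_copy(current_params, target_params, tau=0.1)
```

A small `tau` makes the target networks change slowly, which tends to stabilize the target return values used to train the value network.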
S209: Update the network parameters:
FIG. 3 is a schematic diagram of updating the network parameters in the present embodiment. As shown in FIG. 3, for the current policy network, the parameters are updated by the policy gradient method, using the value Q as the loss function.
For the current value network, calculate a loss function and update the parameters of the current value network according to it. The loss function is calculated as follows:
Take M groups of experience from the experience pool. Input the next state $s'_m$ of each group into the target policy network to obtain the next action $a'_m$, $m = 1, 2, \ldots, M$; then input the state $s'_m$ and the action $a'_m$ into the target value network to obtain the value $Q_m$, and calculate the target return value $Z_m$ by the following formula:
$Z_m = \gamma Q_m + R_m$
where γ denotes the discount factor and $R_m$ denotes the reward value in the m-th group of experience.
The loss function Loss of the current value network is calculated by the following formula:
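The loss expression itself is not reproduced in this text. The sketch below computes the target return values $Z_m = \gamma Q_m + R_m$ as stated and then, as an assumption based on the usual DDPG formulation, takes the mean squared difference between $Z_m$ and the current value network's estimate for the stored state and action. The function name and the numbers are illustrative.

```python
import numpy as np

def critic_loss(q_current, q_target_next, rewards, gamma=0.99):
    """Z_m = gamma * Q_m + R_m, then an assumed mean-squared-error loss."""
    z = gamma * q_target_next + rewards
    return float(np.mean((z - q_current) ** 2)), z

# Toy batch of M = 2 experiences.
loss, z = critic_loss(
    q_current=np.array([1.0, 3.0]),       # current value network on (s_m, a_m)
    q_target_next=np.array([1.0, 2.0]),   # target value network on (s'_m, a'_m)
    rewards=np.array([0.5, 1.0]),
    gamma=0.9,
)
```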
S210: Set the current state $s = s'$ and t = t + 1, and return to step S205.
S211: Judge whether the iteration end condition is reached; if so, training is finished, otherwise go to step S212. There are generally two iteration end conditions for DDPG network training: either the number of iterations reaches a preset threshold, set to 10000 in this embodiment, or the reward value reaches a preset threshold, which is set as required.
S212: Set e = e + 1 and return to step S203.
S105: mixed signal separation:
In actual application, each signal test antenna obtains a mixed signal of length L from the K radiation sources, and the K mixed signals are input into the DDPG network trained in step S104 for retraining. The DDPG action space at this stage is designed as follows:
Convert the K-order matrix C in row-major order into a K × K-dimensional vector $C'$; the vector $C'$ and the boundary value bound constitute the action space of the DDPG network.
The DDPG state space is designed as follows:
Convert the K-order separation matrix W in row-major order into a K × K-dimensional separation vector $W'$.
For the mixed signal of the i-th signal test antenna, sample P data points from it at the preset data positions to form a data vector $H'_{i,j}$; copy the data vector $H'_{i,j}$ K times and concatenate the copies to obtain a data vector of length K × P; concatenate the K data vectors so obtained to get the vector x_state of dimension K × K × P.
Compute the separated signal matrix of the current update step as the product of the separation matrix W and the K × L mixed signal matrix whose row vectors are the K mixed signals, and take its j-th row vector as the source signal separation result of the j-th radiation source. Sample P data points from each source signal separation result at the preset data positions to form a data vector $G'_j$, and concatenate the K data vectors $G'_j$ to obtain the vector y_state of dimension K × P.
Define the parameter on-good to indicate whether the current iteration number is less than a preset threshold E; it takes one value if so and another value otherwise.
The separation vector $W'$, the vector x_state, the vector y_state and the parameter on-good constitute the state space of the DDPG network.
When the training is carried out again, the parameters of the DDPG network do not need to be updated. Figure 4 is a flow chart of the retraining of the DDPG network of the present invention. As shown in fig. 4, the specific steps of retraining the DDPG network in the present invention include:
S401: Set the iteration number e = 1.
S402: Initialize the separation matrix:
Randomly initialize the separation matrix W, then compute the separated signal matrix as the product of W and the K × L mixed signal matrix whose row vectors are the K mixed signals; take the j-th row vector of the separated signal matrix as the source signal separation result of the j-th radiation source, and determine the current state s from the current source signal separation results.
S403: Initialize the step number t = 1 for this iteration.
S404: Judge whether the step number t of this iteration is less than T, where T denotes the preset maximum number of steps per iteration; if so, go to step S405, otherwise go to step S407.
S405: Generate the next state:
The current policy network obtains the action a from the current state s, the separation matrix W is adjusted according to the action a, and the source signal separation result $y'_j$ of each radiation source is recalculated to generate the next state $s'$.
S406: Set the current state $s = s'$ and t = t + 1, and return to step S404.
S407: Judge whether the iteration number e is less than $E'$, where $E'$ denotes the preset number of retraining iterations and $E' > E$; if so, go to step S408, otherwise retraining is finished and go to step S409.
S408: Set e = e + 1 and return to step S402.
S409: obtaining a signal separation result:
determining a separation matrix from the last stateThen calculating a matrix of separated signalsSeparating signal matrixAs a final source signal separation result of the jth radiation source
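The control flow of steps S401 to S409 can be sketched as follows. This is a minimal illustration, not the patented implementation: `policy` and `make_state` are hypothetical callables standing in for the trained current strategy network and the state construction, and the sketch assumes that an action is added element by element to W in row-major order.

```python
import random


def matmul(W, X):
    """Multiply a K x K matrix W by a K x L matrix X (both lists of rows)."""
    return [[sum(w * x[n] for w, x in zip(row, X)) for n in range(len(X[0]))]
            for row in W]


def retrain_separation(X, policy, make_state, E_prime, T, seed=0):
    """Roll out a frozen policy for E_prime iterations of fewer than T steps.

    X          : K x L mixed-signal matrix, one row per signal test antenna
    policy     : hypothetical callable, state -> flat action of length K*K
    make_state : hypothetical callable, (W, Y, e) -> state
    Returns the last separation matrix W and the separated signals Y = W X.
    """
    rng = random.Random(seed)
    K = len(X)
    W = Y = None
    for e in range(1, E_prime + 1):               # S401, S407, S408
        # S402: random separation matrix, separated signals, current state
        W = [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(K)]
        Y = matmul(W, X)
        s = make_state(W, Y, e)
        t = 1                                     # S403
        while t < T:                              # S404
            a = policy(s)                         # S405: no parameter update
            # Assumption: the action is an additive adjustment of W.
            W = [[W[i][j] + a[i * K + j] for j in range(K)] for i in range(K)]
            Y = matmul(W, X)
            s = make_state(W, Y, e)               # next state s'
            t += 1                                # S406
    return W, Y                                   # S409
```

Row j of the returned Y is the final separation result for the j-th radiation source. No gradient step appears anywhere in the loop, which matches the statement that network parameters are not updated during retraining.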
In order to better illustrate the technical effects of the present invention, a specific embodiment is adopted to perform simulation verification on the present invention.
In this embodiment, there are 3 radiation sources: 1 mobile phone and 2 USRP (Universal Software Radio Peripheral) devices. An antenna of an AD9361 software radio platform is used as the signal test antenna to collect the modulation signals sent by the radiation sources, with an acquisition frequency range of 430-440 MHz and a sampling frequency of 20 MHz. Data samples are collected to train the DDPG network, and the actual mixed signals are then separated offline.
FIG. 5 compares the waveforms of the separated mobile phone signal and the source signal in this embodiment. FIG. 6 compares the waveforms of the separated USRP device signal and the source signal in this embodiment. As shown in FIG. 5 and FIG. 6, the signals separated by the present invention are very close to the source signals; statistically, the signal-to-interference ratio of the separated signals exceeds 30 and the correlation coefficient exceeds 0.99, which meets the requirements of engineering applications.
Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions that make use of the inventive concept are protected.
Claims (3)
1. A mixed radiation source signal separation method based on a DDPG network is characterized by comprising the following steps:
S1: Denote the number of positions at which radiation sources are placed in the actual application environment as K. Configure a sample radiation source at each position to transmit a modulation signal of length L, and denote the modulation signal transmitted by the j-th sample radiation source as source signal F_j, j = 1, 2, …, K. Configure K signal test antennas in the application environment. First let each sample radiation source transmit its modulation signal alone, with each signal test antenna acquiring the signal sent by that sample radiation source to obtain a data sample; denote the data sample acquired by the i-th signal test antenna from the j-th sample radiation source as D_{i,j}, i = 1, 2, …, K. Then let the K radiation sources transmit modulation signals simultaneously, with each signal test antenna acquiring the received mixed signal to obtain a mixed signal sample; denote the mixed signal sample acquired by the i-th signal test antenna as X_i;
S2: for the DDPG network, the DDPG action space is designed by adopting the following method:
Set a K-order matrix C, each element of which follows the standard normal distribution, and convert the K-order matrix C in row-major order into a K×K-dimensional vector C', where c_k denotes the k-th element of the vector C', k = 1, 2, …, K², corresponding to the element in row ⌊k/K⌋ and column k % K of the matrix C, ⌊·⌋ denotes rounding down and % denotes the remainder operation; then define a boundary value bound; the vector C' and the boundary value bound form the action space of the DDPG network.
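The row-major conversion and the index mapping can be illustrated with a short Python sketch. The helper names are hypothetical, indices are 0-based here (unlike the 1-based k in the text), and clipping each element to [-bound, bound] is an assumption about how the boundary value is used.

```python
import random


def flatten_row_major(M):
    """Flatten a K x K matrix (list of rows) into a K*K vector, row by row."""
    return [v for row in M for v in row]


def index_to_row_col(k, K):
    """Map a 0-based flat index k to its 0-based (row, column) in the matrix."""
    return k // K, k % K


def sample_action(K, bound, rng):
    """Draw a K x K standard-normal matrix C and return the vector C'.

    Assumption: each element is clipped to [-bound, bound].
    """
    C = [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(K)]
    return [max(-bound, min(bound, c)) for c in flatten_row_major(C)]
```

For K = 3, flat index 5 maps to row 1, column 2, which is the last element of the second row.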
The DDPG state space is designed by adopting the following method:
Set a K-order separation matrix W and convert it in row-major order into a K×K-dimensional separation vector W', where w_k denotes the k-th element of the vector W', corresponding to the element in row ⌊k/K⌋ and column k % K of the matrix W;
Denote the mixed signal of length L received by the i-th signal test antenna as X_i. Sample P data points from the mixed signal X_i at the preset data positions and calculate their ratios to the corresponding P data points of the source signal F_j; the P ratios form a ratio vector H_{i,j}. Splice the K ratio vectors H_{i,j} corresponding to the same mixed signal into one ratio vector, and splice the K resulting ratio vectors to obtain the vector x_state of dimension K×K×P;
Denote the separation signal matrix in the current update step as Y = WX, where X is the K×L mixed signal matrix whose row vectors are the mixed signals X_i, and take the j-th row vector of the separation signal matrix Y as the source signal separation result y_j of the j-th radiation source. Sample P data points from each source signal separation result y_j at the preset data positions and calculate their ratios to the corresponding P data points of the source signal F_j; the P ratios form a ratio vector G_j. Splice the K ratio vectors G_j to obtain the vector y_state of dimension K×P;
Define a parameter on-good to indicate whether the current step reaches the preset target; if so, set on-good to 1, otherwise set on-good to 0;
The separation vector W', the vector x_state, the vector y_state and the parameter on-good form the state space of the DDPG network.
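A compact Python sketch of this state construction follows. It is illustrative only: `build_state` and its arguments are hypothetical names, signals are treated as real-valued lists, and each ratio is taken as sampled value divided by source value. Since there is one ratio vector of length P for every (antenna, source) pair, x_state here has K×K×P entries, and y_state has K×P entries.

```python
def ratio_vector(signal, source, positions):
    """Ratios of signal to source at the preset positions (assumed direction)."""
    return [signal[p] / source[p] for p in positions]


def build_state(W, X, F, positions, on_good):
    """Assemble the training state [W', x_state, y_state, on_good].

    W : K x K separation matrix, X : K x L mixed signals, F : K x L sources.
    The source values at the sampled positions are assumed to be nonzero.
    """
    K = len(W)
    L = len(X[0])
    w_vec = [w for row in W for w in row]                 # row-major W'
    Y = [[sum(W[j][i] * X[i][n] for i in range(K)) for n in range(L)]
         for j in range(K)]                               # Y = W X
    # One ratio vector H_{i,j} per (antenna i, source j) pair.
    x_state = [r for Xi in X for Fj in F
               for r in ratio_vector(Xi, Fj, positions)]
    # One ratio vector G_j per separated signal y_j and its own source F_j.
    y_state = [r for j in range(K)
               for r in ratio_vector(Y[j], F[j], positions)]
    return w_vec + x_state + y_state + [1.0 if on_good else 0.0]
```

When a row of Y matches its source exactly, the corresponding entries of y_state are all 1, which gives the network a direct view of how close each separated signal is to its source.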
The DDPG reward function is designed by adopting the following method:
For the K separated signals y_j obtained in the current step, calculate the signal-to-interference ratio SIR_j of each, using the following formula:
where ‖·‖₂ denotes the 2-norm;
Judge whether the current step reaches the preset target, i.e. whether the signal-to-interference ratios SIR_j of all separated signals y_j are greater than the preset threshold; the reward function takes one expression, involving a preset constant δ, if the target is reached, and another expression otherwise.
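The target test can be sketched in Python as below. The SIR expression used here, 10·log10(‖F_j‖₂² / ‖y_j − F_j‖₂²), is a commonly used definition and is an assumption of this sketch, not a formula taken from this text; the function names are hypothetical. The sketch stops at the target test and does not model the reward expressions.

```python
import math


def sir_db(y, f):
    """Signal-to-interference ratio in dB.

    Assumed definition: 10 * log10(||f||_2^2 / ||y - f||_2^2), where f is the
    source signal and y the separated signal.
    """
    signal = sum(v * v for v in f)
    interference = sum((a - b) ** 2 for a, b in zip(y, f))
    if interference == 0.0:
        return math.inf                      # perfect separation
    return 10.0 * math.log10(signal / interference)


def target_reached(Y, F, threshold):
    """True when every separated signal exceeds the SIR threshold."""
    return all(sir_db(y, f) > threshold for y, f in zip(Y, F))
```

The boolean returned by `target_reached` is the natural source for the on-good flag in the state and for choosing between the two reward branches.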
S3: Construct a DDPG network according to the action space and the state space designed in step S2, wherein the DDPG network comprises a current strategy network, a current value network, a target strategy network and a target value network:
current strategy network: the input is state s and the output is action a;
current value network: the input is state s and action a, and the output is value Q;
target strategy network: the input and output are the same as those of the current strategy network, and the current strategy network parameters are copied periodically;
target value network: the input and output are the same as those of the current value network, and the current value network parameters are copied periodically;
S4: Input the K mixed signal samples X_i obtained in step S1 into the DDPG network and train the DDPG network, specifically as follows:
S4.1: Randomly initialize the parameters of the four networks in the DDPG network;
S4.2: Let the iteration number e = 1;
S4.3: Randomly initialize the separation matrix W, calculate the separation signal matrix Y = WX, take the j-th row vector of the separation signal matrix Y as the source signal separation result y_j of the j-th radiation source, and determine the current state s from the current source signal separation results y_j;
S4.4: Initialize the step number t within the iteration to 1;
S4.5: Judge whether the step number t within the iteration is less than T, where T denotes the preset maximum number of steps in each iteration; if so, go to step S4.6, otherwise go to step S4.11;
S4.6: The current strategy network obtains action a from the current state s; the separation matrix W is adjusted according to action a, the source signal separation result y'_j of each radiation source is recalculated, and the next state s' is generated. The current value network obtains the current value Q from the current state s and action a. Each source signal separation result y_j is extracted from the current state s, and the reward value r corresponding to the current state s is calculated. The current state s, action a, reward value r and next state s' are then put into the experience pool as one group of experience; if the experience pool is already full, the earliest experience is deleted on a first-in first-out basis before the current experience is put in;
S4.7: Judge whether the experience pool is full; if so, go to step S4.8, otherwise go to step S4.10;
S4.8: Soft-copy the parameters of the current strategy network to the target strategy network, and soft-copy the parameters of the current value network to the target value network;
S4.9: For the current strategy network, update its parameters by the policy gradient, taking the value Q as the loss function;
For the current value network, calculate a loss function and update the parameters of the current value network according to it, the loss function being calculated as follows:
Take M groups of experience from the experience pool; input the next state s'_m of each group into the target strategy network to obtain the next action a'_m, m = 1, 2, …, M; then input state s'_m and action a'_m into the target value network to obtain the value Q_m; calculate the target return value Z_m using the following formula:
Z_m = γQ_m + R_m
where γ denotes the discount factor and R_m denotes the reward value in the m-th group of experience;
Calculate the loss function Loss of the current value network using the following formula:
S4.10: Let the current state s = s' and t = t + 1, and return to step S4.5;
S4.11: Judge whether the iteration end condition is reached; if so, training is finished, otherwise go to step S4.12;
S4.12: Let e = e + 1 and return to step S4.3;
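The bookkeeping in steps S4.6 to S4.9 can be sketched in plain Python. This is a hedged illustration: the class and function names are hypothetical, networks are represented by ordinary callables and flat parameter lists, the mean squared error loss is the standard DDPG choice and is assumed here, and the soft-copy rate `tau` is a hypothetical parameter.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity first-in first-out pool of (s, a, r, s') tuples."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest item when a new one arrives.
        self.buffer = deque(maxlen=capacity)

    def put(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def is_full(self):
        return len(self.buffer) == self.buffer.maxlen

    def sample(self, M, rng=random):
        return rng.sample(list(self.buffer), M)


def target_returns(batch, target_policy, target_value, gamma):
    """Z_m = gamma * Q_m + R_m, with Q_m taken from the target networks."""
    return [gamma * target_value(s_next, target_policy(s_next)) + r
            for (_, _, r, s_next) in batch]


def critic_loss(batch, Z, value):
    """Mean squared error between Z_m and Q(s_m, a_m) (assumed loss form)."""
    return sum((z - value(s, a)) ** 2
               for z, (s, a, _, _) in zip(Z, batch)) / len(batch)


def soft_copy(target_params, current_params, tau):
    """Return tau * current + (1 - tau) * target, element by element."""
    return [tau * c + (1.0 - tau) * t
            for t, c in zip(target_params, current_params)]
```

A small `tau` makes the target networks trail the current networks slowly, which is the usual reason for preferring a soft copy over a hard copy.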
S5: In actual application, each signal test antenna obtains a mixed signal of length L from the K radiation sources; the K mixed signals are input into the DDPG network trained in step S4 for retraining. The DDPG action space at this time is designed as follows:
Convert the K-order matrix C in row-major order into a K×K-dimensional vector C'; the vector C' and the boundary value bound form the action space of the DDPG network.
The DDPG state space is designed as follows:
Convert the K-order separation matrix W into a K×K-dimensional separation vector W' in row-major order.
For the mixed signal of the i-th signal test antenna, sample P data points from the mixed signal at the preset data positions and form them into a data vector H'_{i,j}; copy the data vector H'_{i,j} K times and splice the copies to obtain a data vector of length K×P; splice the K such data vectors to obtain the vector x_state of dimension K×K×P.
The separation signal matrix of the current update step is the product of the separation matrix W and the K×L mixed signal matrix whose row vectors are the mixed signals. Take the j-th row vector of the separation signal matrix as the source signal separation result y'_j of the j-th radiation source, sample P data points from each source signal separation result y'_j at the preset data positions to form a data vector G'_j, and splice the K data vectors G'_j to obtain the vector y_state of dimension K×P.
Define the parameter on-good to indicate whether the current iteration number is less than a preset threshold E; it takes one preset value if so and the other preset value otherwise.
The separation vector W', the vector x_state, the vector y_state and the parameter on-good form the state space of the DDPG network.
The specific steps of retraining the DDPG network comprise:
S5.1: Let the iteration number e = 1;
S5.2: Randomly initialize a separation matrix W, then calculate the separation signal matrix as the product of W and the K×L mixed signal matrix whose row vectors are the mixed signals; take the j-th row vector of the separation signal matrix as the source signal separation result y'_j of the j-th radiation source, and determine the current state s from the current source signal separation results;
S5.3: Initialize the step number t within the iteration to 1;
S5.4: Judge whether the step number t within the iteration is less than T, where T denotes the preset maximum number of steps in each iteration; if so, go to step S5.5, otherwise go to step S5.7;
S5.5: The current strategy network obtains action a from the current state s; the separation matrix W is adjusted according to action a, the source signal separation result y'_j of each radiation source is recalculated, and the next state s' is generated;
S5.6: Let the current state s = s' and t = t + 1, and return to step S5.4;
S5.7: Judge whether the iteration number e is less than E', where E' denotes the preset number of retraining iterations and E' > E; if so, go to step S5.8; otherwise retraining is finished and the method goes to step S5.9;
S5.8: Let e = e + 1 and return to step S5.2;
2. The mixed radiation source signal separation method according to claim 1, wherein the modulation signal transmitted by each sample radiation source in step S1 satisfies the following condition: after the signal samples of the modulation signal acquired by each signal test antenna are converted into two-channel I/Q data, the modulus of every data point in the I/Q data is greater than a preset threshold.
3. The mixed radiation source signal separation method according to claim 1, wherein, when the radiation source signal samples are obtained in step S1, the radiation sources are moved slightly several times to obtain different transmission scenarios; for each transmission scenario, the data samples of the i-th signal test antenna for the j-th sample radiation source and the mixed signal samples under that scenario are acquired; the K×K data samples D_{i,j} and the K mixed signal samples of each transmission scenario form one group of radiation source signal samples, and the groups of all transmission scenarios form a radiation source signal sample set;
in said step S4.3, a set of radiation source signal samples is arbitrarily selected from the set of radiation source signal samples to calculate a separation signal matrix and to perform the subsequent operations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110545722.5A CN113364712B (en) | 2021-05-19 | 2021-05-19 | DDPG network-based mixed radiation source signal separation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113364712A (en) | 2021-09-07 |
CN113364712B (en) | 2022-06-14 |
Family
ID=77526547
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110545722.5A Active CN113364712B (en) | 2021-05-19 | 2021-05-19 | DDPG network-based mixed radiation source signal separation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113364712B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104392252A (en) * | 2014-12-08 | 2015-03-04 | 中国铁路总公司 | Method and device of identifying radiation sources |
CN109212476A (en) * | 2018-09-18 | 2019-01-15 | 广西大学 | A kind of RFID indoor positioning algorithms based on DDPG |
CN109358318A (en) * | 2018-11-20 | 2019-02-19 | 南京理工大学 | A kind of method that external illuminators-based radar blind source separating extracts target echo and direct wave |
TWI651927B (en) * | 2018-02-14 | 2019-02-21 | National Central University | Signal source separation method and signal source separation device |
CN109548044A (en) * | 2018-11-02 | 2019-03-29 | 电子科技大学 | A kind of energy based on DDPG collects the bit rate optimization algorithm of communication |
WO2020166997A1 (en) * | 2019-02-13 | 2020-08-20 | Samsung Electronics Co., Ltd. | Improvements in and relating to random access in a telecommunication system |
CN112668235A (en) * | 2020-12-07 | 2021-04-16 | 中原工学院 | Robot control method of DDPG algorithm based on offline model pre-training learning |
Non-Patent Citations (1)
Title |
---|
An Applicable Scheme Employing Bispectrum and Convolutional Neural Network for Individual RF Fingerprint Identification;ZHANG Yi-Ru;《中国科技论文在线》;20210330;全文 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||