CN113364712B

CN113364712B - DDPG network-based mixed radiation source signal separation method

Info

Publication number: CN113364712B
Application number: CN202110545722.5A
Authority: CN
Inventors: 张怡如; 杨远望; 邓建华; 游长江; 朱学勇; 潘钰文
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2021-05-19
Filing date: 2021-05-19
Publication date: 2022-06-14
Anticipated expiration: 2041-05-19
Also published as: CN113364712A

Abstract

The invention discloses a mixed radiation source signal separation method based on a DDPG network. First, K signal test antennas collect radiation source signal samples of K sample radiation sources, and these samples are processed to obtain mixed signal samples. The separation matrix is regarded as the agent, addition and subtraction of matrix elements as actions, and the degree of signal separation as the environment, and a DDPG network is designed accordingly. The DDPG network is then trained with the mixed signal samples. In actual application, each signal test antenna acquires mixed signals of the K radiation sources, which are input into the trained DDPG network for retraining to obtain the signal separation result. By introducing the DDPG network, the invention effectively improves the accuracy of mixed signal separation.

Description

DDPG network-based mixed radiation source signal separation method

Technical Field

The invention belongs to the technical field of signal separation, and particularly relates to a mixed radiation source signal separation method based on a DDPG network.

Background

Accurately and efficiently acquiring a desired signal from a mixed signal is an important research subject in the field of communications, and it determines the reception capability of a communication system. Blind signal separation refers to separating signals when the source signals and the channel are unknown or only partially known. It has been a hot topic in modern signal processing in recent years, with applications in wireless communication, speech recognition, biomedicine, mechanical engineering and other fields. In wireless communications, blind signal separation is of great significance for both cooperative and non-cooperative communications. In cooperative communications, blind separation can suppress and separate interference between signals in MIMO and satellite communication systems. In non-cooperative communications and modern information warfare, signals must be accurately separated from mixtures containing both friendly and hostile transmissions, which helps to detect hostile parties as early as possible, correctly identify hostile equipment and take appropriate action. Blind signal separation therefore faces greater difficulty in communications, and because communication signals are similar to one another and complex, separation methods from other fields are not necessarily well suited.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a mixed radiation source signal separation method based on a DDPG network, and the DDPG network is introduced to effectively improve the accuracy of mixed signal separation.

In order to achieve the above purpose, the mixed radiation source signal separation method based on the DDPG network of the present invention includes the following steps:

S1: recording the number of positions at which radiation sources are placed in the actual application environment as K, configuring a sample radiation source at each position to transmit a modulation signal of length L, and recording the modulation signal transmitted by the jth sample radiation source as the source signal F_j, j = 1, 2, …, K; configuring K signal test antennas in the application environment; first letting each sample radiation source transmit its modulation signal alone, with each signal test antenna acquiring the transmitted signal to obtain a data sample, and recording the data sample acquired by the ith signal test antenna from the jth sample radiation source as D_{i,j}, i = 1, 2, …, K; then letting the K radiation sources transmit their modulation signals simultaneously, with each signal test antenna acquiring the received mixed signal to obtain a mixed signal sample, and recording the mixed signal sample acquired by the ith signal test antenna as X_i;

S2: for the DDPG network, the DDPG action space is designed by adopting the following method:

setting a K-order matrix C, each element of which follows the standard normal distribution, and converting C row-first into a K×K-dimensional vector C' = (c_1, c_2, …, c_{K²}), where c_k denotes the kth element of C', k = 1, 2, …, K², corresponding to the element in row ⌊k/K⌋, column k%K of the matrix C, ⌊·⌋ denoting rounding down and % denoting the remainder operation; then defining a boundary value bound, the vector C' and the boundary value bound forming the action space of the DDPG network;
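A minimal NumPy sketch of the row-first flattening of C and of one way an action could be applied to W. It counts indices from 0, for which the mapping k → (⌊k/K⌋, k % K) is exact (with k counted from 1, an offset of one is needed). It also assumes, since the text does not spell this out, that an action is a K²-component vector clipped to [−bound, bound] and added element-wise to W; `flat_index_to_row_col` and `apply_action` are hypothetical helper names.

```python
import numpy as np


def flat_index_to_row_col(k: int, K: int) -> tuple:
    """Row-first (row-major) mapping with 0-based indices:
    element k of the flattened vector sits at row k // K, column k % K."""
    return k // K, k % K


def apply_action(W: np.ndarray, action: np.ndarray, bound: float) -> np.ndarray:
    """Hypothetical update rule: clip every component of the K*K action to
    [-bound, bound], reshape it row-first to K x K and add it to W."""
    K = W.shape[0]
    delta = np.clip(action, -bound, bound).reshape(K, K)
    return W + delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = 3
    C = rng.standard_normal((K, K))   # K-order matrix with standard normal entries
    C_flat = C.reshape(-1)            # row-first flattening C' (length K*K)
    for k in range(K * K):
        r, c = flat_index_to_row_col(k, K)
        assert C_flat[k] == C[r, c]
```

NumPy's default `reshape` order is row-major, so `reshape(-1)` and `reshape(K, K)` implement exactly the row-first conversion described for C' and W'.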

The DDPG state space is designed by adopting the following method:

setting a K-order separation matrix W, and converting it row-first into a K×K-dimensional separation vector W' = (w_1, w_2, …, w_{K²}), where w_k denotes the kth element of W', corresponding to the element in row ⌊k/K⌋, column k%K of the matrix W;

recording the mixed signal of length L received by the ith signal test antenna as X_i, sampling P data points from X_i at the preset data positions and calculating their ratios to the corresponding P data points of the source signal F_j, the P ratios forming a ratio vector H_{i,j}; splicing the K ratio vectors H_{i,1}, …, H_{i,K} corresponding to the same mixed signal X_i into a vector of length K×P, and splicing the K vectors so obtained (i = 1, …, K) into a vector x_state of dimension K×K×P;
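The ratio computation behind x_state can be sketched with NumPy broadcasting. The sketch assumes the mixed signals are stacked into a K × L array `X` and the source signals into a K × L array `F`, that `positions` holds the P preset sample indices, and that real-valued samples are used; `ratio_x_state` is a hypothetical name. Splicing over j inside i yields K·K·P entries.

```python
import numpy as np


def ratio_x_state(X: np.ndarray, F: np.ndarray, positions: np.ndarray) -> np.ndarray:
    """Ratios X_i[p] / F_j[p] at the preset positions, spliced over j (inner)
    and then over i (outer) into one vector of length K * K * P."""
    Xs = X[:, positions]                      # (K, P) samples of the mixed signals
    Fs = F[:, positions]                      # (K, P) samples of the source signals
    ratios = Xs[:, None, :] / Fs[None, :, :]  # (K, K, P), indexed [i, j, p]
    return ratios.reshape(-1)                 # row-major: i outer, then j, then p
```

The preset positions should avoid near-zero source samples, otherwise the ratios become very large or undefined.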

denoting the separation signal matrix in the current update step as Y = WX, where X denotes the K×L mixed signal matrix formed with the mixed signals X_i as row vectors; taking the jth row vector of Y as the source signal separation result y_j of the jth radiation source, sampling P data points from each y_j at the preset data positions and calculating their ratios to the corresponding P data points of the source signal F_j, the P ratios forming a ratio vector G_j, and splicing the K ratio vectors G_j into a vector y_state of dimension K×P;
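A corresponding sketch for y_state, under the same assumptions about `X`, `F` and `positions` (the helper name is hypothetical). It suggests why these ratios are informative: if W exactly undoes the mixing, for example W equal to the inverse of a known mixing matrix, every G_j is a vector of ones.

```python
import numpy as np


def separation_y_state(W: np.ndarray, X: np.ndarray, F: np.ndarray,
                       positions: np.ndarray):
    """Return (Y, y_state): Y = W X, and y_state splices the K ratio vectors
    G_j = y_j[p] / F_j[p] taken at the preset positions (length K * P)."""
    Y = W @ X                                 # separation signal matrix, (K, L)
    G = Y[:, positions] / F[:, positions]     # row j is the ratio vector G_j
    return Y, G.reshape(-1)
```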

defining a parameter on-goal to indicate whether the current step reaches the preset target: on-goal = 1 if it does, and on-goal = 0 otherwise;

the separation vector W', the vector x_state, the vector y_state and the parameter on-goal form the state space of the DDPG network;
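Assembling the state is a plain concatenation. The order below (W', x_state, y_state, goal flag) follows the order in which the text lists the components but is otherwise an assumption, as is the helper name. With the splicing rules above, the state has K² + K²P + KP + 1 entries.

```python
import numpy as np


def build_state(W: np.ndarray, x_state: np.ndarray, y_state: np.ndarray,
                on_goal: bool) -> np.ndarray:
    """Concatenate W' (row-first), x_state, y_state and the 0/1 goal flag."""
    return np.concatenate([W.reshape(-1), x_state, y_state, [float(on_goal)]])
```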

The DDPG reward function is designed by adopting the following method:

for the K separated signals y_j obtained in the current step, calculating the signal-to-interference ratio SIR_j of each with the following formula:

where ||·||₂ denotes the 2-norm;

judging whether the current step reaches the preset target, i.e., whether the signal-to-interference ratio SIR_j of every separated signal y_j exceeds the preset threshold; if so, the reward function takes its target-reached form, in which Δ represents a preset constant; otherwise the reward function takes its ordinary form;
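The SIR and reward expressions are not legible in this text, so the sketch below assumes a common SIR definition, SIR_j = 10·log10(‖F_j‖₂² / ‖y_j − F_j‖₂²), and a hypothetical reward equal to the mean SIR plus Δ when every SIR_j exceeds the threshold. The defaults 30 and 100 are the threshold and Δ used in the embodiment; `sir_db` and `reward` are illustrative names, not the patent's formulas.

```python
import numpy as np


def sir_db(y: np.ndarray, f: np.ndarray, eps: float = 1e-12) -> float:
    """Assumed SIR definition: 10 * log10(||f||_2^2 / ||y - f||_2^2).
    eps avoids division by zero when y equals f exactly."""
    err = np.linalg.norm(y - f) ** 2 + eps
    return float(10.0 * np.log10(np.linalg.norm(f) ** 2 / err))


def reward(Y: np.ndarray, F: np.ndarray, threshold: float = 30.0,
           delta: float = 100.0):
    """Hypothetical reward: mean SIR over the K separated signals, plus the
    constant delta when every SIR_j exceeds the threshold (target reached).
    Returns (reward value, goal flag)."""
    sirs = np.array([sir_db(Y[j], F[j]) for j in range(F.shape[0])])
    on_goal = bool(np.all(sirs > threshold))
    r = float(sirs.mean()) + (delta if on_goal else 0.0)
    return r, on_goal
```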

S3: constructing a DDPG network according to the action space and state space designed in step S2, the DDPG network comprising a current strategy network, a current value network, a target strategy network and a target value network, wherein:

the input information of the current strategy network is state s, and the output information is action a;

the input information of the current value network is state s and action a, and the output information is value Q;

target policy network: the input and output are the same as the current strategy network, and the current strategy network parameters are copied periodically;

target value network: the input and output are the same as for the current value network, and the current value network parameters are copied to it periodically;

S4: inputting the K mixed signal samples X_i obtained in step S1 into the DDPG network and training the DDPG network, which specifically comprises the following steps:

S4.1: randomly initializing the parameters of the four networks in the DDPG network;

S4.2: setting the iteration number e = 1;

S4.3: randomly initializing a separation matrix W, calculating the separation signal matrix Y = WX, taking the jth row vector of Y as the source signal separation result y_j of the jth radiation source, and determining the current state s from the current source signal separation results y_j;

S4.4: initializing the step number t in the iteration to 1;

S4.5: judging whether the step number t in the iteration is less than T, where T denotes the preset maximum number of steps per iteration; if so, proceeding to step S4.6, otherwise proceeding to step S4.11;

S4.6: the current strategy network obtains action a from the current state s, adjusts the separation matrix W according to action a, and recalculates the source signal separation result y'_j of each radiation source to generate the next state s'; the current value network obtains the current value Q from the current state s and action a; the source signal separation results y_j are extracted from the current state s to calculate the reward value r corresponding to s; the current state s, action a, reward value r and next state s' are then put into the experience pool as one group of experience; if the experience pool is full when the experience is put in, the earliest experience is deleted on a first-in, first-out basis before the current experience is put in;
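The first-in, first-out experience pool maps naturally onto Python's `collections.deque` with `maxlen`, which discards the oldest entry when a new one is appended to a full deque. Uniform random sampling of the M experiences used later in S4.9 is an assumption; the class name is hypothetical.

```python
import random
from collections import deque


class ExperiencePool:
    """FIFO experience pool: once full, appending drops the oldest experience."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)

    def put(self, s, a, r, s_next):
        # deque(maxlen=...) evicts the oldest entry automatically when full
        self.buffer.append((s, a, r, s_next))

    def is_full(self) -> bool:
        return len(self.buffer) == self.buffer.maxlen

    def sample(self, M: int):
        """Draw M distinct experiences uniformly at random."""
        return random.sample(list(self.buffer), M)
```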

S4.7: judging whether the experience pool is full; if so, proceeding to step S4.8, otherwise proceeding to step S4.10;

S4.8: soft-copying the parameters of the current strategy network to the target strategy network, and soft-copying the parameters of the current value network to the target value network;
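"Soft copying" is read here as the usual DDPG soft (Polyak) update, θ' ← τθ + (1 − τ)θ'. The text gives no value for τ, and representing network parameters as a dictionary of NumPy arrays is a simplification for illustration; `soft_update` is a hypothetical name.

```python
import numpy as np


def soft_update(target: dict, current: dict, tau: float) -> dict:
    """Soft copy: theta_target <- tau * theta_current + (1 - tau) * theta_target,
    applied to every named parameter array."""
    return {name: tau * current[name] + (1.0 - tau) * target[name] for name in target}
```

A small τ makes the target networks track the current networks slowly, which stabilizes the target returns used by the value network.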

S4.9: for the current strategy network, updating its parameters by the policy gradient method, with the value Q serving as the loss function;

for the current value network, calculating a loss function and updating the parameters of the current value network according to it, the loss function being calculated as follows:

taking M groups of experiences from the experience pool, inputting the next state s'_m of each group into the target strategy network to obtain the next action a'_m, m = 1, 2, …, M, then inputting s'_m and a'_m into the target value network to obtain the value Q_m, and calculating the target return value Z_m with the following formula:

Z_m = γQ_m + R_m

where γ denotes the discount factor and R_m denotes the reward value in the mth group of experiences;
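The target return Z_m can be computed for the whole minibatch at once. For the loss of the current value network, the sketch assumes the mean squared error between Z_m and the current value network's output Q(s_m, a_m) on the stored state and action, which is the standard DDPG choice; this is an assumption rather than the formula given in the text, and the function names are hypothetical.

```python
import numpy as np


def target_returns(rewards: np.ndarray, q_next: np.ndarray, gamma: float) -> np.ndarray:
    """Z_m = gamma * Q_m + R_m for each of the M sampled experiences."""
    return gamma * q_next + rewards


def critic_loss(q_current: np.ndarray, z: np.ndarray) -> float:
    """Assumed critic loss: mean squared error between Z_m and Q(s_m, a_m)."""
    return float(np.mean((z - q_current) ** 2))
```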

calculating the loss function Loss of the current value network with the following formula:

S4.10: setting s = s' and t = t + 1, and returning to step S4.5;

S4.11: judging whether the iteration end condition is reached; if so, ending training, otherwise proceeding to step S4.12;

S4.12: setting e = e + 1, and returning to step S4.3;
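The control flow of steps S4.2 to S4.12 can be outlined as below. `policy`, `reset`, `step` and `update` are hypothetical callables standing in for the current strategy network, the random initialization of W in S4.3, the W adjustment plus reward calculation in S4.6, and the soft copy plus parameter updates in S4.8 and S4.9; the reward-based end condition of S4.11 is omitted for brevity.

```python
import random
from collections import deque


def train_ddpg(policy, reset, step, update, T, max_iters, capacity, M):
    """Outline of S4.2 to S4.12. policy(s) -> a, reset() -> s,
    step(s, a) -> (r, s_next) and update(batch) are hypothetical callables.
    Returns the number of network updates performed."""
    pool = deque(maxlen=capacity)          # FIFO experience pool
    n_updates = 0
    for e in range(1, max_iters + 1):      # S4.2, S4.11, S4.12
        s = reset()                        # S4.3: random W, Y = WX, state s
        t = 1                              # S4.4
        while t < T:                       # S4.5
            a = policy(s)                  # S4.6: action from the current policy
            r, s_next = step(s, a)         # adjust W, recompute y_j, reward
            pool.append((s, a, r, s_next))
            if len(pool) == capacity:      # S4.7: experience pool full?
                update(random.sample(list(pool), M))  # S4.8 and S4.9
                n_updates += 1
            s, t = s_next, t + 1           # S4.10
    return n_updates
```

Because the loop condition is t < T, each iteration performs T − 1 steps, and network updates only begin once the pool holds `capacity` experiences.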

S5: in practical application, each signal test antenna acquires a mixed signal of length L from the K radiation sources, and the K mixed signals are input into the DDPG network trained in step S4 for retraining; the DDPG action space at this time is designed as follows:

converting the K-order matrix C row-first into the K×K-dimensional vector C', the vector C' and the boundary value bound forming the action space of the DDPG network;

The DDPG state space is designed by adopting the following method:

converting the K-order separation matrix W row-first into the K×K-dimensional separation vector W';

for the mixed signal of the ith signal test antenna, sampling P data points at the preset data positions to form a data vector H'_{i,j}, copying the data vector H'_{i,j} K times and splicing the copies into a data vector of length K×P, and splicing the K data vectors so obtained into a vector of dimension K×K×P;

denoting the separation signal matrix of the current update step as the product of the separation matrix W and the K×L mixed signal matrix formed with the K mixed signals as row vectors, taking the jth row vector of the separation signal matrix as the source signal separation result of the jth radiation source, sampling P data points from each source signal separation result at the preset data positions to form a data vector G'_j, and splicing the K data vectors G'_j into a vector of dimension K×P;

defining a parameter that indicates whether the current iteration number is less than a preset threshold E, taking one value if so and another otherwise;

the separation vector W', the two spliced vectors above and this parameter form the state space of the DDPG network.
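For the application stage, the state uses raw samples instead of ratios, presumably because the source signals are not available during application. A minimal sketch, assuming `X_app` is the K × L matrix of mixed signals and `positions` holds the P preset indices (hypothetical names); the iteration flag is left out because its values are not given here.

```python
import numpy as np


def application_state(W: np.ndarray, X_app: np.ndarray,
                      positions: np.ndarray) -> np.ndarray:
    """State for retraining without source signals: W' (row-first), the raw
    samples of each mixed signal copied K times (K*K*P entries), and the raw
    samples of each separated signal (K*P entries)."""
    K = X_app.shape[0]
    H = X_app[:, positions]                  # (K, P): one data vector per antenna
    x_part = np.tile(H, (1, K)).reshape(-1)  # each data vector copied K times, spliced
    Y = W @ X_app                            # separation signal matrix
    y_part = Y[:, positions].reshape(-1)     # data vectors G'_j spliced
    return np.concatenate([W.reshape(-1), x_part, y_part])
```

Copying each data vector K times keeps this part of the state the same length as x_state in the training stage, so the trained networks can accept it unchanged.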

The specific steps of retraining the DDPG network comprise:

S5.1: setting the iteration number e = 1;

S5.2: randomly initializing a separation matrix W, calculating the separation signal matrix as the product of W and the K×L mixed signal matrix formed with the K mixed signals as row vectors, taking the jth row vector of the separation signal matrix as the source signal separation result of the jth radiation source, and determining the current state s from the current source signal separation results;

S5.3: initializing the step number t in the iteration to 1;

S5.4: judging whether the step number t in the iteration is less than T, where T denotes the preset maximum number of steps per iteration; if so, proceeding to step S5.5, otherwise proceeding to step S5.7;

S5.5: the current strategy network obtains action a from the current state s, adjusts the separation matrix W according to action a, and recalculates the source signal separation result y'_j of each radiation source to generate the next state s';

S5.6: setting s = s' and t = t + 1, and returning to step S5.4;

S5.7: judging whether the iteration number e is less than E', where E' denotes the preset number of retraining iterations and E' > E; if so, proceeding to step S5.8, otherwise ending retraining and proceeding to step S5.9;

S5.8: setting e = e + 1, and returning to step S5.2;

S5.9: determining the separation matrix from the last state, then calculating the separation signal matrix, and taking the jth row vector of the separation signal matrix as the final source signal separation result of the jth radiation source.
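The retraining stage runs the trained strategy network without updating its parameters. The sketch below follows the steps literally, including the random re-initialization of W in every iteration and reading W back from the last state. `policy` and `make_state` are hypothetical stand-ins, and the additive use of the action on W is an assumption.

```python
import numpy as np


def separate_with_frozen_policy(policy, make_state, X_app, T, E_prime, rng):
    """Steps S5.1 to S5.9 with the trained policy held fixed (no parameter updates).
    policy(s) -> K*K action and make_state(W, X_app, e) -> state are hypothetical;
    the state is assumed to start with the row-first separation vector W'.
    Requires E_prime >= 1."""
    K = X_app.shape[0]
    for e in range(1, E_prime + 1):            # S5.1, S5.7, S5.8
        W = rng.standard_normal((K, K))        # S5.2: random separation matrix
        s = make_state(W, X_app, e)
        for t in range(1, T):                  # S5.3, S5.4: t = 1, ..., T - 1
            W = W + policy(s).reshape(K, K)    # S5.5: assumed additive action
            s = make_state(W, X_app, e)        # next state s'
    W_final = s[:K * K].reshape(K, K)          # S5.9: W recovered from the last state
    return W_final @ X_app                     # row j: result for radiation source j
```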

The invention provides a mixed radiation source signal separation method based on a DDPG network. First, K signal test antennas collect radiation source signal samples of K sample radiation sources, and these samples are processed to obtain mixed signal samples. The separation matrix is regarded as the agent, addition and subtraction of matrix elements as actions, and the degree of signal separation as the environment; a DDPG network is designed accordingly and trained with the mixed signal samples. In actual application, each signal test antenna acquires mixed signals of the K radiation sources, which are input into the trained DDPG network for retraining to obtain the signal separation results.

The invention has the following beneficial effects:

1) using prior knowledge from the sample radiation sources, the invention can separate signals even when the mixing channel is unknown;

2) the invention uses a DDPG network to let the separation actions interact with the signal environment, which better matches actual separation scenarios and improves the separation performance.

Drawings

FIG. 1 is a block diagram of an embodiment of a DDPG network-based mixed radiation source signal separation method of the present invention;

FIG. 2 is a flow chart of DDPG network training in the present invention;

FIG. 3 is a diagram illustrating updating network parameters in the present embodiment;

FIG. 4 is a flow chart of the retraining of the DDPG network in the present invention;

Fig. 5 compares the waveforms of the separated mobile phone signal and its source signal in this embodiment;

Fig. 6 compares the waveforms of the separated USRP device signal and its source signal in this embodiment.

Detailed Description

Specific embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. It should be noted that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the invention.

Examples

Fig. 1 is a block diagram of an embodiment of a mixed radiation source signal separation method based on a DDPG network. As shown in fig. 1, the method for separating mixed radiation source signals based on the DDPG network of the present invention specifically includes the steps of:

S101: Acquiring radiation source signal samples:

The number of positions at which radiation sources are placed in the actual application environment is denoted K; a sample radiation source is configured at each position to transmit a modulation signal of length L, and the modulation signal transmitted by the jth sample radiation source is denoted as the source signal F_j, j = 1, 2, …, K. K signal test antennas are configured in the application environment. First, each sample radiation source transmits its modulation signal alone, and each signal test antenna acquires the transmitted signal to obtain a data sample; the data sample acquired by the ith signal test antenna from the jth sample radiation source is denoted D_{i,j}, i = 1, 2, …, K. Then the K radiation sources transmit their modulation signals simultaneously, each signal test antenna acquires the received mixed signal to obtain a mixed signal sample, and the mixed signal sample acquired by the ith signal test antenna is denoted X_i.

In this embodiment, so that the acquired signal samples represent the channel characteristics more accurately and the separation matrix obtained later is more accurate, the modulation signal transmitted by each sample radiation source must satisfy the following condition: after the signal samples acquired by each signal test antenna are converted into two-channel IQ data, the modulus of every data point in the IQ data is greater than a preset threshold.
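The amplitude condition on the IQ data can be checked as below; the helper name and the split into separate I and Q arrays are assumptions. One plausible motivation, not stated in the text, is that the state is built from point-wise ratios, which become unstable when sample magnitudes approach zero.

```python
import numpy as np


def meets_modulus_condition(i_samples, q_samples, threshold: float) -> bool:
    """True if every IQ point has modulus sqrt(I^2 + Q^2) above the threshold."""
    modulus = np.hypot(np.asarray(i_samples), np.asarray(q_samples))
    return bool(np.all(modulus > threshold))
```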

In addition, so that the trained DDPG network adapts to small changes in radiation source position, the radiation sources may be moved slightly several times while acquiring radiation source signal samples in step S101, yielding different transmission scenarios. For each transmission scenario, the data samples of each ith signal test antenna for each jth sample radiation source and the mixed signal samples are acquired; the K×K data samples D_{i,j} and the K mixed signal samples X_i of one transmission scenario form one group of radiation source signal samples, and the groups from all transmission scenarios form the radiation source signal sample set.

S102: designing a DDPG network:

In the invention, the separation matrix is regarded as the agent, addition and subtraction of matrix elements as actions, and the degree of signal separation as the environment; ideal signal separation is achieved through the interaction between the agent and the environment and the environment's feedback to the agent. Based on this idea, the DDPG network is designed as follows:

1) designing DDPG action space

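A K-order matrix C, each element of which follows the standard normal distribution, is converted row-first into a K×K-dimensional vector C' = (c_1, c_2, …, c_{K²}), where c_k, the kth element of C', corresponds to the element in row ⌊k/K⌋, column k%K of C; ⌊·⌋ denotes rounding down and % denotes the remainder operation. Then a boundary value bound is defined, and the vector C' and the boundary value bound form the action space of the DDPG network.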

2) Designing DDPG state spaces

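A K-order separation matrix W is converted row-first into a K×K-dimensional separation vector W' = (w_1, w_2, …, w_{K²}), where w_k, the kth element of W', corresponds to the element in row ⌊k/K⌋, column k%K of W.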

The mixed signal of length L received by the ith signal test antenna is denoted X_i. P data points are sampled from X_i at the preset data positions, and their ratios to the corresponding P data points of the source signal F_j are calculated; the P ratios form a ratio vector H_{i,j}. The K ratio vectors H_{i,1}, …, H_{i,K} corresponding to the same mixed signal X_i are spliced into a vector of length K×P, and the K vectors so obtained (i = 1, …, K) are spliced into a vector x_state of dimension K×K×P.

The separation signal matrix in the current update step is denoted Y = WX, where X denotes the K×L mixed signal matrix formed with the mixed signals X_i as row vectors, and the jth row vector of Y is taken as the source signal separation result y_j of the jth radiation source. P data points are sampled from each y_j at the preset data positions, and their ratios to the corresponding P data points of the source signal F_j are calculated; the P ratios form a ratio vector G_j, and the K ratio vectors G_j are spliced into a vector y_state of dimension K×P.

A parameter on-goal is defined to indicate whether the current step reaches the preset target: on-goal = 1 if it does, and on-goal = 0 otherwise.

The separation vector W', the vector x_state, the vector y_state and the parameter on-goal form the state space of the DDPG network.

3) Designing DDPG reward functions

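For the K separated signals y_j obtained in the current step, the signal-to-interference ratio SIR_j of each is calculated, where ||·||₂ denotes the 2-norm. If the SIR_j of every separated signal y_j exceeds the preset threshold, the current step reaches the preset target and the reward function takes its target-reached form, in which Δ represents a preset constant; otherwise the reward function takes its ordinary form.

In this embodiment, the signal-to-interference ratio threshold is 30 and Δ is 100. Designed in this way, the reward function both indicates whether the current step reaches the preset target and measures the degree of signal separation.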

S103: constructing a DDPG network:

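A DDPG network is constructed according to the action space and state space designed in step S102. It comprises a current policy network, a current value network, a target policy network and a target value network, wherein:

current policy network: the input is state s and the output is action a;

current value network: the inputs are state s and action a, and the output is value Q;

target policy network: the input and output are the same as for the current policy network, and the current policy network parameters are copied to it periodically;

target value network: the input and output are the same as for the current value network, and the current value network parameters are copied to it periodically.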

S104: training the DDPG network:

The K mixed signal samples X_i obtained in step S101 are input into the DDPG network to train it.

FIG. 2 is a flow chart of DDPG network training in the present invention. As shown in fig. 2, the DDPG network training of the present invention specifically comprises:

S201: Initializing the networks:

The parameters of the four networks in the DDPG network are randomly initialized.

S202: initializing iteration parameters:

Set the iteration number e = 1.

S203: initializing a separation matrix:

A separation matrix W is randomly initialized and the separation signal matrix Y = WX is calculated, where X denotes the K×L mixed signal matrix formed with the mixed signals X_i as row vectors. The jth row vector of Y is taken as the source signal separation result y_j of the jth radiation source, and the current state s is determined from the current source signal separation results y_j.

If a radiation source signal sample set covering several transmission scenarios has been acquired in step S101, step S203 arbitrarily selects one group of radiation source signal samples from the set, calculates the separation signal matrix from it and performs the subsequent operations.

S204: Initialize the step number t in the current iteration to 1.

S205: Judge whether the step number t in the current iteration is less than T, where T denotes the preset maximum number of steps per iteration (T = 500 in this embodiment). If so, go to step S206; otherwise, go to step S211.

S206: generating new experience:

The current policy network obtains action a from the current state s, adjusts the separation matrix W according to action a, and recalculates the source signal separation result y'_j of each radiation source to generate the next state s'. The current value network obtains the current value Q from the current state s and action a. The source signal separation results y_j are extracted from the current state s to calculate the reward value r corresponding to s. The current state s, action a, reward value r and next state s' are then put into the experience pool as one group of experience. If the experience pool is full at that moment, the earliest experience is deleted on a first-in, first-out basis before the current experience is put in.

S207: Judge whether the experience pool is full; if so, go to step S208, otherwise go to step S210.

S208: copying network parameters:

The parameters of the current policy network are soft-copied to the target policy network, and the parameters of the current value network are soft-copied to the target value network.

S209: Updating the network parameters:

Fig. 3 is a schematic diagram of updating the network parameters in this embodiment. As shown in Fig. 3, the parameters of the current policy network are updated by the policy gradient method, with the value Q serving as the loss function.

For the current value network, a loss function is calculated and used to update its parameters. M groups of experiences are taken from the experience pool; the next state s'_m of each group is input into the target policy network to obtain the next action a'_m, m = 1, 2, …, M; then s'_m and a'_m are input into the target value network to obtain the value Q_m, and the target return value Z_m is calculated as:

Z_m = γQ_m + R_m

where γ denotes the discount factor and R_m denotes the reward value in the mth group of experiences.

S210: Set s = s' and t = t + 1, and return to step S205.

S211: Judge whether an iteration end condition is reached; if so, training ends, otherwise go to step S212. There are generally two iteration end conditions for DDPG training: the number of iterations reaches a preset threshold (10000 in this embodiment), or the reward value reaches a preset threshold, which is set as required.

S212: Set e = e + 1, and return to step S203.

S105: mixed signal separation:

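In practical application, each signal test antenna acquires a mixed signal of length L from the K radiation sources, and the K mixed signals are input into the DDPG network trained in step S104 for retraining. The DDPG action space is designed in the same way as in step S102: the K-order matrix C is converted row-first into the K×K-dimensional vector C', and the vector C' and the boundary value bound form the action space of the DDPG network.

The DDPG state space is designed as follows:

The K-order separation matrix W is converted row-first into the K×K-dimensional separation vector W'. For the mixed signal of the ith signal test antenna, P data points are sampled at the preset data positions to form a data vector H'_{i,j}, which is copied K times and spliced into a data vector of length K×P; the K data vectors so obtained are spliced into a vector of dimension K×K×P.

The separation signal matrix of the current update step is the product of the separation matrix W and the K×L mixed signal matrix formed with the K mixed signals as row vectors, and its jth row vector is taken as the source signal separation result of the jth radiation source. From each source signal separation result, P data points are sampled at the preset data positions to form a data vector G'_j, and the K data vectors G'_j are spliced into a vector of dimension K×P.

A parameter is defined to indicate whether the current iteration number is less than a preset threshold E, taking one value if so and another otherwise.

The separation vector W', the two spliced vectors above and this parameter form the state space of the DDPG network.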

During retraining, the parameters of the DDPG network are not updated. Fig. 4 is a flow chart of the retraining of the DDPG network in the present invention. As shown in Fig. 4, the retraining specifically comprises the following steps:

S401: Set the iteration number e = 1.

S402: initializing a separation matrix:

A separation matrix W is randomly initialized, and the separation signal matrix is calculated by multiplying W by the K×L mixed signal matrix formed with the K mixed signals as row vectors. The jth row vector of the separation signal matrix is taken as the source signal separation result of the jth radiation source, and the current state s is determined from the current source signal separation results.

S403: Initialize the step number t in the iteration to 1.

S404: Judge whether the step number t in the iteration is less than T, where T denotes the preset maximum number of steps per iteration; if so, go to step S405, otherwise go to step S407;

S405: Generating the next state:

The current policy network obtains action a from the current state s, adjusts the separation matrix W according to action a, and recalculates the source signal separation result y'_j of each radiation source to generate the next state s'.

S406: Set s = s' and t = t + 1, and return to step S404;

S407: Judge whether the iteration number e is less than E', where E' denotes the preset number of retraining iterations and E' > E; if so, go to step S408, otherwise retraining ends and the method proceeds to step S409;

S408: Set e = e + 1, and return to step S402.

S409: obtaining a signal separation result:

The separation matrix is determined from the last state, the separation signal matrix is then calculated from it and the mixed signal matrix, and the jth row vector of the separation signal matrix is taken as the final source signal separation result of the jth radiation source.

In order to better illustrate the technical effects of the present invention, a specific embodiment is adopted to perform simulation verification on the present invention.

In this embodiment, there are 3 radiation sources: 1 mobile phone and 2 USRP (Universal Software Radio Peripheral) devices. An antenna of an AD9361 software radio platform is used as the signal test antenna to collect the modulation signals transmitted by the radiation sources; the frequency range is 430-440 MHz and the sampling rate is 20 MHz. Data samples are collected to train the DDPG network, and the actual mixed signals are then separated offline.

Fig. 5 compares the waveforms of the separated mobile phone signal and its source signal in this embodiment, and Fig. 6 compares the waveforms of the separated USRP device signal and its source signal. As shown in Fig. 5 and Fig. 6, the separated signals closely match the source signals: statistically, the signal-to-interference ratio of the separated signals exceeds 30 and the correlation coefficient exceeds 0.99, which fully meets the requirements of engineering applications.
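The text does not say how the correlation coefficient is computed; a natural reading is the Pearson correlation between a separated waveform and its source, sketched below for real-valued samples. Taking the magnitude ignores a possible sign flip introduced by the separation; the helper name is hypothetical.

```python
import numpy as np


def correlation_coefficient(y, f) -> float:
    """Magnitude of the Pearson correlation between a separated signal y and
    its source f (np.corrcoef returns the 2 x 2 correlation matrix)."""
    return float(abs(np.corrcoef(y, f)[0, 1]))
```

Unlike the SIR, this measure is insensitive to a constant gain or offset on the separated signal, so it complements the SIR rather than replacing it.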

Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are obvious as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.

Claims

1. A mixed radiation source signal separation method based on a DDPG network is characterized by comprising the following steps:

S1: recording the number of positions at which radiation sources are placed in the actual application environment as K, configuring a sample radiation source at each position to transmit a modulation signal of length L, and recording the modulation signal transmitted by the jth sample radiation source as the source signal F_j, j = 1, 2, …, K; configuring K signal test antennas in the application environment; first letting each sample radiation source transmit its modulation signal alone, with each signal test antenna acquiring the transmitted signal to obtain a data sample, and recording the data sample acquired by the ith signal test antenna from the jth sample radiation source as D_{i,j}, i = 1, 2, …, K; then letting the K radiation sources transmit their modulation signals simultaneously, with each signal test antenna acquiring the received mixed signal to obtain a mixed signal sample, and recording the mixed signal sample acquired by the ith signal test antenna as X_i;

the element of W whose row index is obtained by integer division by K and whose column index is obtained modulo K (% K);
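A minimal sketch of how an action could update the separation matrix, assuming (hypothetically) that the DDPG actor outputs a real vector of length K × K in row-major order, whose positive entries add to and negative entries subtract from the corresponding elements of W. Entry n then addresses row n // K and column n % K (0-based). The function names are illustrative only, not the patent's definition.

```python
import numpy as np

def apply_action(W, action):
    """Add a K*K action vector to the separation matrix W (hypothetical scheme).

    Positive entries add to, negative entries subtract from, the matching
    element of W; the action is read in row-major (C) order.
    """
    W = np.asarray(W, dtype=float)
    K = W.shape[0]
    action = np.asarray(action, dtype=float)
    if action.shape != (K * K,):
        raise ValueError("action must have length K*K")
    return W + action.reshape(K, K)  # numpy reshape is row-major by default

def flat_index_to_element(n, K):
    """Row and column of W addressed by entry n of the row-major vector."""
    return n // K, n % K
```

Reading the vector in row-major order keeps the action layout identical to the row-priority separation vector used in the state.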

The DDPG state space is designed as follows:

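Set a K-order separation matrix W and convert it, in row-major order, into a separation vector of dimension K × K, in which each entry corresponds to the element of W in the row given by integer division of the entry's index by K and in the column given by the index % K;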

Denote the separation signal matrix of the current update step as Y = WX, where X is the K × L mixed signal matrix whose row vectors are the mixed signals X_i, and take the j-th row vector of Y as the source signal separation result y_j of the j-th radiation source. At preset data positions, sample P data points from each source signal separation result y_j and compute their ratios to the corresponding P data points of the source signal F_j; the P ratios form a ratio vector G_j. Concatenate the K ratio vectors G_j to obtain a vector y_state of dimension K × P;

The separation vector
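The state construction described above can be sketched as follows. This assumes real-valued signals, 0-based sample positions, and that the state is the row-major separation vector followed by y_state; the exact way the two are combined is not reproduced in this text, and all names are hypothetical.

```python
import numpy as np

def build_state(W, X, F, positions):
    """Sketch of the training state: separation vector plus ratio vector y_state.

    W: K x K separation matrix; X: K x L mixed signal matrix (row i = X_i);
    F: K x L source matrix (row j = F_j); positions: the P preset indices.
    Assumed layout: state = [row-major W, G_1, G_2, ..., G_K].
    """
    W = np.asarray(W, dtype=float)
    X = np.asarray(X, dtype=float)
    F = np.asarray(F, dtype=float)
    Y = W @ X                          # separation signal matrix Y = WX
    idx = np.asarray(positions)
    G = Y[:, idx] / F[:, idx]          # row j is the ratio vector G_j
    y_state = G.reshape(-1)            # K*P vector: G_1, then G_2, ...
    w_vec = W.reshape(-1)              # K*K separation vector (row-major)
    return np.concatenate([w_vec, y_state])
```

When W exactly inverts the mixing, every ratio in y_state equals 1, which gives the agent a direct measure of how far each y_j is from F_j.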

The DDPG reward function is designed as follows:

For each of the K separated signals y_j obtained in the current step, calculate the signal-to-interference ratio SIR_j with the following formula:

where ‖·‖₂ denotes the 2-norm;

Judge whether the current step reaches the preset target, i.e., whether the signal-to-interference ratios SIR_j of all separated signals y_j are greater than a preset threshold. If so, the reward function is

where δ represents a preset constant; otherwise, the reward function is
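Because the SIR formula and the reward expressions are not reproduced above, the sketch below assumes the common definition SIR_j = 10·log10(‖F_j‖₂² / ‖y_j − F_j‖₂²) in dB and leaves the "otherwise" branch to a caller-supplied function. Only the all-above-threshold test and the constant δ come from the text; the function names and the `otherwise_reward` callback are hypothetical.

```python
import numpy as np

def sir_db(y, f):
    """SIR of separated signal y with respect to source f, in dB.

    Assumed form: 10*log10(||f||_2^2 / ||y - f||_2^2).
    """
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    return 10.0 * np.log10(np.linalg.norm(f) ** 2 / np.linalg.norm(y - f) ** 2)

def reward(Y, F, threshold, delta, otherwise_reward):
    """Return (reward, sirs): delta if every SIR_j > threshold.

    otherwise_reward(sirs) stands in for the 'otherwise' branch, whose
    formula is not reproduced here.
    """
    sirs = np.array([sir_db(y, f) for y, f in zip(Y, F)])
    if np.all(sirs > threshold):
        return delta, sirs
    return otherwise_reward(sirs), sirs
```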

S4.1: Randomly initialize the parameters of the four networks in the DDPG network (the strategy network, the value network, the target strategy network and the target value network);

S4.2: Set the iteration number e = 1;

S4.3: Randomly initialize the separation matrix W, calculate the separation signal matrix Y = WX, take the j-th row vector of Y as the source signal separation result y_j of the j-th radiation source, and determine the current state s from the current source signal separation results y_j;

S4.4: Initialize the step number t within the iteration to 1;

Take M groups of experiences from the experience pool. For each group, input the next state s'_m into the target strategy network to obtain the next action a'_m, m = 1, 2, …, M; then input the state s'_m and the action a'_m into the target value network to obtain the value Q_m, and calculate the target return value Z_m with the following formula:

$$Z_m = \gamma Q_m + R_m$$
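The target computation of this step can be sketched as follows, with the target networks replaced by hypothetical callables. The experience layout (s, a, R, s') and the mean-squared critic loss follow standard DDPG practice and are assumptions about the steps not reproduced here.

```python
import numpy as np

def target_returns(batch, target_policy, target_value, gamma):
    """Compute Z_m = gamma * Q_m + R_m for a sampled minibatch.

    batch: list of (s, a, R, s_next) experiences (assumed layout).
    target_policy(s') -> a'; target_value(s', a') -> Q (hypothetical callables).
    """
    Z = []
    for _s, _a, R, s_next in batch:
        a_next = target_policy(s_next)     # a'_m from the target strategy network
        Q = target_value(s_next, a_next)   # Q_m from the target value network
        Z.append(gamma * Q + R)            # Z_m = gamma * Q_m + R_m
    return np.array(Z)

def critic_loss(batch, value, Z):
    """Standard DDPG critic loss: mean squared error between Z_m and Q(s_m, a_m)."""
    Q = np.array([value(s, a) for s, a, _R, _s_next in batch])
    return float(np.mean((Z - Q) ** 2))
```

Computing Z_m with the target networks rather than the online networks is what keeps the regression target from shifting at every update.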

S4.10: Let the current state s = s′ and t = t + 1, and return to step S4.5;

S4.12: Let e = e + 1 and return to step S4.3;
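The loop structure of steps S4.2 to S4.12 can be summarized in the sketch below. The work of the steps not reproduced here (action selection, storing experience, network updates) is folded into a hypothetical `step` callable, and the iteration limit `num_iterations` is an assumed stopping rule. The inner condition t < T mirrors step S5.4.

```python
def run_iterations(num_iterations, T, reset, step):
    """Control flow of steps S4.2-S4.12 (sketch with hypothetical callables).

    reset() -> initial state s (S4.3: random W, Y = WX, state from y_j).
    step(s) -> next state s' (stands in for the omitted per-step work).
    Returns the last state and the total number of steps taken.
    """
    e = 1                           # S4.2
    total_steps = 0
    s = None
    while e <= num_iterations:      # assumed stopping rule
        s = reset()                 # S4.3
        t = 1                       # S4.4
        while t < T:                # same test as S5.4
            s = step(s)             # S4.10: s = s'
            t += 1                  # S4.10: t = t + 1
            total_steps += 1
        e += 1                      # S4.12, then back to S4.3
    return s, total_steps
```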

The K mixed signals

are input into the DDPG network trained in step S4 for retraining; at this stage the DDPG action space is designed as follows:

The DDPG state space is designed as follows:

Convert the K-order separation matrix W, in row-major order, into a K × K-dimensional separation vector;

For the mixed signal of the i-th signal testing antenna,

data points are taken from the mixed signal at the preset data positions;

the K data vectors

are concatenated into a vector of dimension K × K × P;

Denote the separation signal matrix of the current update step as

where the matrix represents the mixed signals,

and take its j-th row vector as the source signal separation result of the j-th radiation source;

Define the parameters

Otherwise,

The separation vector

The specific steps of retraining the DDPG network are as follows:

S5.1: Set the iteration number e = 1;

where the matrix represents the mixed signals,

and take its j-th row vector as the source signal separation result of the j-th radiation source;

from the current source signal separation results,

determine the current state s;

S5.3: Initialize the step number t within the iteration to 1;

S5.4: Judge whether the step number t within the iteration is less than T, where T represents the preset maximum number of steps per iteration; if so, go to step S5.5, otherwise go to step S5.7;

S5.8: Let e = e + 1 and return to step S5.2;

S5.9: Determine the separation matrix from the last state,

then calculate the separation signal matrix,

and take the j-th row vector of the separation signal matrix

as the final source signal separation result of the j-th radiation source.
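Step S5.9 can be sketched as follows, assuming (hypothetically) that the first K × K entries of the state vector are the row-major separation vector; `X_mixed` stands for the K × L matrix of the mixed signals actually received by the K antennas. The function name is illustrative.

```python
import numpy as np

def final_separation(last_state, X_mixed, K):
    """S5.9 sketch: recover W from the last state and compute Y = W X.

    Assumes the state begins with the row-major K*K separation vector.
    Row j of the result is the final separation result of radiation source j.
    """
    W = np.asarray(last_state[: K * K], dtype=float).reshape(K, K)
    return W @ np.asarray(X_mixed, dtype=float)
```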

2. The mixed radiation source signal separation method according to claim 1, wherein the modulation signal transmitted by each sample radiation source in step S1 satisfies the following condition: after the signal samples of the modulation signal acquired by each signal testing antenna are converted into two-channel IQ data, the modulus of each data point in the IQ data is greater than a preset threshold.
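The condition of claim 2 amounts to a per-point check on the IQ data. A minimal sketch, with the function name and the separate I and Q array inputs as illustrative assumptions:

```python
import numpy as np

def satisfies_modulus_condition(i_data, q_data, threshold):
    """Claim 2 check: every IQ point's modulus sqrt(I^2 + Q^2) exceeds threshold."""
    modulus = np.hypot(np.asarray(i_data, dtype=float),
                       np.asarray(q_data, dtype=float))
    return bool(np.all(modulus > threshold))
```

A signal that fails the check for any single data point is rejected as a whole, since the condition applies to each data point.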

3. The mixed radiation source signal separation method according to claim 1, wherein, when acquiring the radiation source signal samples in step S1, the sample radiation sources are moved slightly several times to obtain different transmission scenarios; for each transmission scenario, the data samples of the i-th signal testing antenna for the j-th sample radiation source and the mixed signal samples under that scenario are acquired; the K × K data samples D_{i,j} and the K mixed signal samples of each transmission scenario form one group of radiation source signal samples, and the groups of all transmission scenarios form the radiation source signal sample set;

in step S4.3, one group of radiation source signal samples is randomly selected from the radiation source signal sample set to calculate the separation signal matrix and perform the subsequent operations.
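The random choice made in step S4.3 under claim 3 can be sketched as below. Representing each group as a dict holding the K × K data samples ("D") and the K mixed signal samples ("X") is a hypothetical layout, as is the function name.

```python
import random

def pick_sample_group(sample_set, rng=random):
    """Step S4.3 under claim 3: pick one transmission scenario's group at random.

    Each group is assumed to be a dict such as {"D": K x K data samples,
    "X": K mixed signal samples}; the layout is hypothetical.
    """
    return rng.choice(sample_set)
```

Passing a seeded `random.Random` instance as `rng` makes the scenario sequence reproducible across training runs.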