CN113051070A

CN113051070A - Electromagnetic signal sorting method

Info

Publication number: CN113051070A
Application number: CN202110229049.4A
Authority: CN
Inventors: 宋正鑫; 刘燕; 郭亮; 李斌; 郭建明; 田明宏; 申娟; 周红; 汪小平; 郭俊
Original assignee: Strategic Early Warning Research Institute Of People's Liberation Army Air Force Research Institute
Current assignee: Strategic Early Warning Research Institute Of People's Liberation Army Air Force Research Institute
Priority date: 2021-02-26
Filing date: 2021-02-26
Publication date: 2021-06-29

Abstract

The invention provides an electromagnetic signal sorting method comprising the following implementation steps: designing a software and hardware architecture based on an embedded GPU; building a computing framework for parallel processing of radar pulse stream data; running typical parameter measurement algorithms on the embedded GPU; and designing and completing a migration flow that takes a parallel data algorithm from mathematical model to software code. The invention performs ADC sampling on the received radar pulse stream signal, applies the algorithm migration flow according to the data volume of the digitized radar pulse stream and the available hardware resources, and carries out algorithm modeling and integrated debugging of the corresponding parameter measurement algorithm in the real software and hardware environment, thereby ensuring the accuracy of the parameter measurement results for the radar pulse stream signal. This solves the problems of low computational efficiency in the core steps of radar pulse stream sorting and insufficient utilization of hardware resources.

Description

Electromagnetic signal sorting method

Technical Field

The invention belongs to the technical field of radar signal processing, and relates to an electromagnetic signal sorting method.

Background

In recent years, electronic information technology has developed rapidly in the military field, and electromagnetic space has gradually become a fifth battlefield dimension alongside the four dimensions of sea, land, air and space, forming a new battlefield environment. In this new space many signals overlap. At any given point in the battlefield airspace, radiation signals from military, civil and natural sources can be received, and interference signals from both hostile and friendly forces may be present. Different radiation sources may emit signals that are sometimes dense and sometimes silent and that differ in timing; some are continuous waves and some are pulses, and many of their parameters, such as carrier frequency (RF), pulse width (PW) and pulse repetition frequency (PRF), may differ. An adversary can also direct electromagnetic energy into any specific region of space by steering its antenna, which makes the power characteristics of the electromagnetic environment richer still. The superposition of these factors produces a more complex dynamic environment in the time domain. Meanwhile, with the rapid development of radar technology, radars of various new types have been developed and deployed, so the density of radar pulse streams in electronic countermeasures has increased sharply, pulse interleaving has become more and more severe, and the stability of pulse parameters can less and less be guaranteed.

The traditional radar signal sorting method is based on statistical analysis of pulse description word (PDW) parameters: targets are sorted according to the correlation or lack of correlation among pulses, and noise points are eliminated during sorting. This method is strongly affected by parameter variation and cannot separate radars with similar PDW parameters. In addition, during fine sorting by pulse repetition interval (PRI), each calculation depends on the result of the previous one, so the data cannot be processed in parallel. The method can therefore only run on a single core; hardware utilization is low, sorting is slow, large and complex data sets are difficult to process, and the data must meet certain requirements for completeness and purity. With the advance of machine learning and artificial intelligence, sorting is no longer limited to the five PDW parameters: more stable intra-pulse features, which are statistically independent, can be extracted. The research focus is not a complete description of the features of a given pulse but the extraction of the information that distinguishes different signal patterns; that is, the pulse feature data must be compressively sensed so that the extracted information represents the differences between signal patterns as fully as possible.

Radar signal sorting is a core part of the whole electronic countermeasure system. Sorting speed and sorting quality largely reflect a country's level of radar technology, so the subject receives attention in many countries. The modern information warfare environment described above places higher demands on reconnaissance equipment (radar receiver, parameter measurer and signal processor) for real-time performance and efficiency. Therefore, in addition to advancing research on new radar signal sorting algorithms, a suitable computing platform must be selected to parallelize the part of the algorithm that accounts for most of the execution time, in accordance with Amdahl's law of speedup.
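As a brief illustration of why Amdahl's law directs attention to the dominant part of the execution time, the following sketch computes the ideal speedup for a given parallelizable fraction. The function name and the example numbers are illustrative assumptions, not values from the source.

```python
def amdahl_speedup(parallel_fraction, n_units):
    """Ideal speedup when `parallel_fraction` of the run time is spread over `n_units`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


# Hypothetical figures: 90 % of the run time parallelizable, 256 processing units.
example = amdahl_speedup(0.9, 256)
```

However many units are added, the speedup in this example stays below 1 / (1 - 0.9) = 10, which is why the serial remainder has to be kept small.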

A review of the prior art shows that, in terms of research direction, existing radar signal sorting techniques focus on analyzing either inter-pulse or intra-pulse feature parameters. The mature inter-pulse approach is multi-parameter inter-pulse matching; the mature intra-pulse approaches are independent component analysis (ICA) and various clustering algorithms for blind sources. In terms of implementation, the main flow consists of preprocessing and main processing. Preprocessing mainly reduces the volume of data to be processed, and there are two main methods. The first is compressed sensing, which represents the differences between signals with as little data as possible and so reduces processing time. The second partitions the data in advance using some of the parameters and then processes each block separately: the direction of arrival (DOA) or time of arrival (TOA) of the pulses is generally used for the first partition, the modulation form of the pulse (frequency modulation rate K) for the second, and the carrier frequency (RF) for the third, thereby reducing the sorting time.

In the prior art, blind-source radar pulse stream interference sorting is carried out by matrix operations and iterative solution on a serial platform. This approach suffers from low operation speed and low utilization of the hardware computing units, and cannot meet the requirement of real-time sorting in a complex electromagnetic environment.

Disclosure of Invention

In order to solve the above technical problem, the present invention provides an electromagnetic signal sorting method, where the signal processing system of the electromagnetic signal includes a radar signal preprocessor and a radar signal host processor, the radar signal host processor adopts a CPU + GPU structure, and the acceleration method includes the following steps:

step 1, the radar signal preprocessor samples the original radar data received by the radar signal receiver, and the CPU (central processing unit) of the radar signal main processor allocates video memory on the GPU (graphics processing unit) side according to the size of the sampled data and copies the sampled data into the video memory;

step 2, performing zero-mean (centering) and decorrelation processing on the radar data received at the GPU side, the processed radar signals forming a signal matrix;

step 3, separating the signal matrix processed in step 2 at the GPU side using fast independent component analysis (Fast-ICA), wherein the separation calculation is an iterative process, and after each loop iteration is finished the control condition flag bit is transmitted back to the CPU side;

step 4, after obtaining the control flag bit, the CPU judges whether the control flag bit meets the control condition of iterative computation; if the control condition is not met, continuing loop iteration at the GPU end;

step 5, the CPU obtains a control flag bit, and when the control flag bit meets the control condition of iterative computation, the CPU analyzes and separates an initial source signal at a GPU end, eliminates noise, obtains the correct radar number K, and then transmits the number K to the next step for signal clustering and sorting;

step 6, preprocessing the pulse description word (PDW) data obtained by the radar signal sorter, comprising: a first partition of the data according to the direction of arrival (DOA) of the pulses, a second partition according to the frequency modulation parameter K, and a third partition according to the carrier frequency (RF); assigning initial class labels to the resulting blocks according to the K value obtained; and normalizing each item of the radar pulse description word at the GPU side;

step 7, performing iterative clustering and sorting on the PDWs of the signals at the GPU side: calculating the Euclidean distance from each normalized sample point to each initial point and clustering after comparison; after each loop is finished, transmitting a loop control flag to the CPU side for judgment; entering step 8 if the loop condition is met, and continuing to iterate on the GPU if it is not;

step 8, copying the result data obtained at the GPU side back to host memory under the control of the CPU, storing it by class, and then releasing the video memory in preparation for the next round of signal sorting.
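The division of labor in steps 3 to 5 and in step 7, where the GPU runs one iteration and the CPU inspects a returned flag, can be sketched as a host-side control loop. This is a minimal sketch under stated assumptions: `gpu_step` is a hypothetical stand-in for one device iteration, the iteration cap is not in the source, and the toy `halve` function exists only to exercise the loop.

```python
def run_until_converged(gpu_step, state, max_iters=100):
    """Host-side control loop: run one device iteration, then check the returned flag.

    gpu_step is a hypothetical stand-in for one GPU iteration; it returns the
    updated state and a boolean control flag (True means stop iterating).
    """
    for iteration in range(1, max_iters + 1):
        state, done = gpu_step(state)   # GPU side: one loop iteration
        if done:                        # CPU side: flag meets the control condition
            return state, iteration
    return state, max_iters             # safety cap, an assumption not in the source


def halve(x):
    """Toy iteration used only to exercise the loop: halve x until it drops below 1."""
    x = x / 2.0
    return x, x < 1.0


final_state, n_iters = run_until_converged(halve, 8.0)
```

Only the small flag crosses from device to host in each iteration; the bulk data stays in video memory until the final copy-back described in step 8.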

Further, the sampled data in step 1 are obtained by analog-to-digital (A/D) conversion of the various types of pulse signal data received by the radar signal receiver.

Further, sampling the original radar data received by the radar signal receiver in step 1 comprises sampling the data by angle and by time.

Furthermore, sampling by angle means that the received original radar data are divided into 15-degree sectors according to the direction of arrival (DOA).
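The 15-degree division by direction of arrival can be illustrated with a short sketch. The dictionary layout of a pulse (a `"doa"` key in degrees) and both function names are assumptions made for this example only.

```python
def doa_sector(doa_deg, width_deg=15.0):
    """Index of the angular sector containing doa_deg (sector 0 covers [0, 15) degrees)."""
    return int((doa_deg % 360.0) // width_deg)


def split_by_doa(pulses, width_deg=15.0):
    """Group pulses (dicts with a hypothetical "doa" key, in degrees) by DOA sector."""
    sectors = {}
    for pulse in pulses:
        sectors.setdefault(doa_sector(pulse["doa"], width_deg), []).append(pulse)
    return sectors


pulses = [{"doa": 3.0}, {"doa": 14.9}, {"doa": 15.0}, {"doa": 200.0}]
blocks = split_by_doa(pulses)
```

With a 15-degree width there are 24 sectors, and each block can then be handed to the later processing stages independently of the others.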

Further, the preprocessing in step 6 comprises: for the signal pulses obtained by the frequency-measurement receiver and the direction-finding receiver, the radar preprocessor extracts intra-pulse features using fast independent component analysis, obtains the number of signal sources and eliminates the influence of noise.

Further, the radar signal preprocessor receives a pulse stream signal PDW output by the radar receiver, performs pulse parameter matching analysis on the randomly overlapped radar pulse signal stream, and separates a known radar pulse train or subtracts adjacent and square radar pulses from the known radar pulse train.

Further, the step 2 comprises the following substeps:

step 2.1, performing zero-mean and decorrelation processing on the radar pulse signals in the data preprocessing stage to remove the correlation among the radar data;

step 2.2, implementing data whitening as a matrix multiplication, the library function cublasDgemm() being called to perform the multiplication;

step 2.3, zero-meaning the data by launching n × p threads on the GPU, each thread processing one data element.
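The two operations in step 2 can be sketched serially in plain Python. Each element of the centered matrix is independent of the others once the row means are known, which is what allows one GPU thread per element; the matrix product stands in for the role cublasDgemm() plays on the device. The function names are illustrative, and the whitening matrix itself is assumed to be supplied by the caller.

```python
def center_rows(X):
    """Subtract each row's mean so that every observed signal has zero mean."""
    centered = []
    for row in X:
        mean = sum(row) / len(row)
        # On the GPU each of the n x p subtractions would be done by its own thread.
        centered.append([value - mean for value in row])
    return centered


def matmul(A, B):
    """Plain matrix product A @ B (the operation delegated to cublasDgemm() on the GPU)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


X = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
Xc = center_rows(X)
# V is a placeholder whitening matrix chosen for illustration, not derived from X.
V = [[1.0, 0.0], [0.0, 0.1]]
Z = matmul(V, Xc)
```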

Further, the loop iteration algorithm in step 3 includes the following sub-steps:

step 3.1, Z₁←W*Z

Step 3.2, Z₂←g(Z₁)；Z₂'←diag[g'(Z₁)*1_p]*W

Step 3.3, W₂←Z₂Z^T

Step 3.4, W₁←W₂-Z₂'

Step 3.5, performing singular value decomposition on W₁ and judging whether the iteration is finished;

wherein Z is an n × p matrix, Z^T is the transpose of Z, Z₁ is the whitened matrix of Z, Z₂ is the objective function matrix, n is the number of radar radiation sources, p is the number of radar pulse signals, and p > n; W is an n × n square matrix, g(·) is a nonlinear function, g'(Z₁) is the derivative of g(Z₁), 1_p is a p × 1 all-ones vector, and diag[·] is the diagonalization operation.
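Steps 3.1 to 3.4 can be sketched in plain Python as follows. This is a sketch under stated assumptions: the nonlinearity g is taken to be tanh (the source does not name it), 1_p is read as an all-ones column vector so that g'(Z₁)·1_p is a vector of row sums, and step 3.5 (decomposition of W₁ and the convergence test) is omitted.

```python
import math


def matmul(A, B):
    """Plain matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def fastica_update(W, Z):
    """One pass of steps 3.1 to 3.4 with g = tanh (an assumed choice of nonlinearity)."""
    n = len(W)
    Z1 = matmul(W, Z)                                             # step 3.1
    Z2 = [[math.tanh(v) for v in row] for row in Z1]              # step 3.2: g(Z1)
    # g'(Z1) * 1_p: sum of g'(.) = 1 - tanh(.)^2 along each row
    row_sums = [sum(1.0 - math.tanh(v) ** 2 for v in row) for row in Z1]
    Z2_prime = [[row_sums[i] * W[i][j] for j in range(n)] for i in range(n)]  # diag[...] * W
    W2 = matmul(Z2, transpose(Z))                                 # step 3.3
    return [[W2[i][j] - Z2_prime[i][j] for j in range(n)] for i in range(n)]  # step 3.4


W = [[1.0, 0.0], [0.0, 1.0]]
Z = [[0.5, -0.5, 1.0], [-1.0, 0.25, 0.75]]
W1 = fastica_update(W, Z)
```

Every step is a matrix product or an elementwise map, so none of them carries a dependency between pulses; that property is what makes the iteration suitable for a GPU.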

Further, removing noise in step 5 comprises: distinguishing signal from noise on the basis that the expected value of noise is 0; when the expected value of a separated signal is smaller than a threshold, the signal is judged to be noise and filtered out.
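A minimal sketch of this thresholding follows. It assumes that the expected value is estimated by the sample mean and compared in absolute value; the function name and the threshold value are illustrative and not taken from the source.

```python
def drop_noise_components(components, threshold):
    """Keep separated components whose |sample mean| reaches the threshold.

    Returns the kept components and their count, which plays the role of the
    radar number K passed on to clustering.
    """
    kept = [c for c in components if abs(sum(c) / len(c)) >= threshold]
    return kept, len(kept)


separated = [
    [1.0, 1.2, 0.8],      # mean near 1.0: treated as a source
    [0.01, -0.01, 0.0],   # mean 0.0: treated as noise
    [2.0, 2.0, 2.0],      # mean 2.0: treated as a source
]
sources, K = drop_noise_components(separated, threshold=0.1)
```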

Further, the clustering and sorting iteration of step 7 comprises the following sub-steps:

step 7.1, summing the samples of each cluster;

step 7.2, counting the number of samples in each cluster;

step 7.3, calculating the mean of all samples in each cluster and taking it as the cluster center: sum of samples / number of samples;

step 7.4, calculating the Euclidean distance between each sample and each cluster center.
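The four sub-steps above can be sketched as one serial K-Means iteration. The reassignment of each sample to its nearest center and the returned "changed" flag are added here to show how the loop control flag of step 7 could be derived; the treatment of empty clusters is an assumption of this sketch.

```python
import math


def kmeans_iteration(samples, labels, k):
    """One iteration: per-cluster sums and counts, new centers, distances, relabeling."""
    dim = len(samples[0])
    sums = [[0.0] * dim for _ in range(k)]          # step 7.1: sum of samples per cluster
    counts = [0] * k                                # step 7.2: number of samples per cluster
    for sample, label in zip(samples, labels):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += sample[d]
    centers = [                                     # step 7.3: center = sum / count
        [v / counts[c] for v in sums[c]] if counts[c] else None
        for c in range(k)
    ]
    new_labels = []
    for sample in samples:                          # step 7.4: Euclidean distances
        dists = [math.dist(sample, c) if c is not None else math.inf for c in centers]
        new_labels.append(dists.index(min(dists)))
    return centers, new_labels, new_labels != labels


samples = [[0, 0], [0, 1], [10, 10], [10, 11]]
centers, labels, changed = kmeans_iteration(samples, [0, 0, 0, 1], k=2)
centers2, labels2, changed2 = kmeans_iteration(samples, labels, k=2)
```

The distance computation for one sample does not depend on any other sample, so step 7.4 maps naturally onto one thread per sample, while steps 7.1 and 7.2 are reductions.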

Compared with the prior art, the method has the following advantages:

1. Reduced algorithm complexity: the invention introduces a new passive radar pulse data processing method that parallelizes the processing of radar pulse streams. Using independent component analysis of blind source signals, it converts the traditional sorting algorithm, in which each calculation depends on the previous result and which can therefore only be computed serially, into a method based on parallel matrix operations, greatly reducing the complexity of the algorithm.

2. Improved operation speed: the invention is a matrix operation and iteration method based on a parallel computing platform. In particular, the redundancy-reduction operations on the radar data and the subsequent clustering iterations are parallelized both within and across thread blocks, which increases the speed of the clustering operation.

3. Improved hardware resource utilization: the invention uses thousands of small, efficient hardware cores, sets a reasonable thread configuration according to the initial data volume and the hardware resource capacity, and distributes the blind-source radar data effectively among the thread blocks and the threads within each block, so that matrix operations and clustering iterations are parallelized to the greatest possible extent.

Drawings

FIG. 1 is a flow chart of the general solution of the present invention;

FIG. 2 is a schematic diagram of the signal processor of the present invention;

FIG. 3 is a flow chart of a serial Fast-ICA based K-Means clustering algorithm program;

FIG. 4 is a flowchart of a parallel Fast-ICA algorithm procedure;

FIG. 5 is a flowchart of a parallel K-Means clustering algorithm process;

FIG. 6 is a flow chart of a parallel Fast-ICA based K-Means clustering algorithm process;

FIG. 7 is a flowchart of a serial Fast-ICA based algorithm procedure.

Detailed Description

Aiming at the problems in the prior art that blind-source radar pulse stream interference sorting is slow and the utilization of the hardware computing units is low, the invention discloses an electromagnetic signal sorting method with a marked acceleration effect and high hardware resource utilization. The electromagnetic signal processing system comprises a radar signal preprocessor and a radar signal main processor, the main processor adopting a CPU + GPU structure. To obtain the best performance from the radar signal sorting method, the parameters of the host's GPU device are queried through the deviceQuery() function, and the algorithm's performance is optimized by matching the number of threads and allocating the device's storage space sensibly. The data processing method comprises the following steps:

step 1, sampling the original radar data received by the radar signal receiver by angle and by time; at the CPU control side, allocating video memory on the GPU (graphics processing unit) side for the data (pulses of different types) according to the size of the sampled data, and copying the data;

step 2, centering (zero-meaning) and whitening (decorrelating) the sampled data at the GPU side;

step 3, separating the signal matrix processed in the previous step at the GPU side using fast independent component analysis (Fast-ICA), and transmitting the control condition flag bit back to the CPU side after each loop iteration is completed;

step 4, after the CPU obtains the control flag bit, if the control flag bit does not meet the condition, the CPU continues loop iteration at the GPU end;

step 5, the CPU obtains a control flag bit, and when the control flag bit meets the conditions, the CPU analyzes and separates an initial source signal at a GPU end, eliminates noise, obtains the number K of correct radars, and transmits the number K to the next step for signal clustering and sorting;

step 6, performing a first partition of the PDW (pulse description word) data obtained by the radar signal sorter according to the direction of arrival (DOA) of the pulses, a second partition according to the signal pattern (specifically the frequency modulation parameter K), and a third partition according to the carrier frequency (RF); assigning initial class labels to the resulting blocks according to the K value obtained; and proceeding to the next step after normalizing each item of the radar pulse description word at the GPU side;

step 7, performing iterative clustering and sorting on the PDWs of the signals at the GPU side: calculating the Euclidean distance from each normalized sample point to each initial point and clustering after comparison; after each loop is completed, transmitting a loop control flag to the CPU side for judgment; entering the next step if the loop condition is met, and continuing to iterate on the GPU if it is not;
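Normalization before the Euclidean distance calculation matters because PDW fields have very different numeric ranges, so an unscaled carrier frequency would dominate the distance. The sketch below uses min-max scaling, which is an assumption: the source says only that the items are normalized. The field order and the sample values are illustrative.

```python
def min_max_normalize(pdws):
    """Scale every PDW field to [0, 1] so that no single field dominates the distance."""
    dims = len(pdws[0])
    lows = [min(p[d] for p in pdws) for d in range(dims)]
    highs = [max(p[d] for p in pdws) for d in range(dims)]
    normalized = []
    for p in pdws:
        normalized.append([
            (p[d] - lows[d]) / (highs[d] - lows[d]) if highs[d] > lows[d] else 0.0
            for d in range(dims)
        ])
    return normalized


# Hypothetical PDW rows: [carrier frequency in MHz, pulse width in microseconds]
pdws = [[9000.0, 1.0], [9500.0, 3.0], [10000.0, 2.0]]
scaled = min_max_normalize(pdws)
```

Each output element depends only on its own input and the per-field minimum and maximum, so after two reductions the scaling itself is an elementwise operation.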

The invention lays a solid foundation for realizing the multifunctional radar by designing a new data processing method for parallelizing the radar pulse stream.

The following detailed description of embodiments of the invention refers to the accompanying drawings.

The invention provides an electromagnetic signal sorting method that develops an embedded-GPU-based reconnaissance data processing algorithm model for radar pulse streams, addressing the design requirements for intelligent processing of signal pulse streams. The specific requirements are as follows:

1) the signal patterns of simple pulse, positive linear frequency modulation and negative linear frequency modulation can be distinguished;

2) designing a computing frame and a development environment based on an embedded GPU;

3) designing a transplanting flow of a machine learning algorithm;

4) integrated debugging and programming of the model.

The technical indexes are as follows:

data rate: 2.5Gsps sampling rate;

pulse stream density: 10,000 pulses/s (when 4 signals with different parameters are input simultaneously);

signal pattern: simple pulse, positive linear frequency modulation and negative linear frequency modulation;

signal bandwidth: can reach 100 MHz;

In signal sorting, the traditional classical algorithms run serially on a CPU. When the data volume is large, or a simple operation must be repeated many times, this imposes a heavy burden: the running time of the program grows and the real-time requirements of signal processing can no longer be met. The traditional signal sorting algorithm is therefore improved by migrating it to a new computing platform and, as far as possible, turning the processing of data that can be handled in parallel into a program that runs in parallel.

After a classical signal sorting algorithm is transplanted to a new parallel operation platform, the operation speed can be greatly improved.

The invention uses the NVIDIA Jetson TX2 as its basic parallel processing hardware platform. The Jetson TX2 is NVIDIA's artificial-intelligence-oriented upgrade introduced after the Jetson TK1 and TX1. Both the GPU and the CPU of the TX2 are upgraded, the memory is increased to 8 GB and the storage to 32 GB, Wi-Fi and Bluetooth are supported, H.265 encoding and decoding are supported, and the module is small. According to NVIDIA's official introduction, the Jetson TX2 provides two operating modes: Max-Q, which gives the highest energy efficiency, twice that of the previous-generation TX1, at a power consumption below 7.5 W; and Max-P, which gives the highest performance, with an energy efficiency that can also reach twice that of the previous generation, at a power consumption below 15 W.

A C program for signal sorting is written first; the matrix operations and clustering iterations that can be parallelized are then converted into parallel form to produce a CUDA-C program, which is finally ported to the TX2 parallel processing environment.

The specific process is as follows. The direction of arrival (DOA), frequency modulation rate K and carrier frequency (RF) of the pulses, obtained by the frequency-measurement and direction-finding receivers and the radar signal preprocessor, serve as coarse sorting conditions: the radar data are partitioned into blocks so that less data has to be processed in each block. The complexity of the partitioned blind-source radar signal data is further reduced by parallelized centering and whitening, and the number of radar sources is obtained with the Fast-ICA algorithm. Using this K value, initial iteration points can then be selected on the data blocks that were partitioned and cleared of noise points in the preprocessing stage, which solves the problem that K-means clustering requires the number of radar sources and a choice of initial points. By comparing the sorting result with the radars in the library, it is determined whether a sorted signal is already in the library or belongs to a new radar. Signals processed by the Fuzzy_Matching method serve as an illustration: the first item of data is the sorted signal, the second is the library signal matched to the sorted signal, the third is the matching confidence, and the fourth is the new signal. The result is exactly the same as that of the earlier MATLAB program simulated on Windows.

The invention is described in detail below with reference to the drawings and the detailed description.

As shown in fig. 1, the general technical scheme flow of the electromagnetic signal sorting method provided by the invention comprises the following steps:

1) designing a calculation framework under software and hardware: designing a computing architecture according to the software and hardware environment of the parallel processing to be used;

2) from model to code for typical methods: the code of each radar signal sorting algorithm is implemented from the mathematical model of the corresponding classical algorithm; a classical serial flow is first derived from the basic principle of the algorithm, the flow is then optimized along parallel lines, and the flows are finally combined;

3) modeling and porting of the parameter measurement algorithms: the collected radar data (obtained by simulation) are debugged and run on the parallel processing platform that has been built;

4) verification of correctness and validity: the correctness and validity of the processing results are verified.

The pulse stream output from the radar receiver system to the signal sorting processing system is a densely overlapped radar pulse parameter stream and a radar pulse time domain signal, and the signal sorting is the processing process of the signal pulse stream by the signal processing system and is used for separating the pulse train of each radar from the randomly overlapped pulse signal stream. The overall block diagram of the signal sorting machine can be obtained according to the definition of signal sorting, and the schematic diagram of the signal acquisition of the radar receiver is shown in fig. 2:

the main task of the preprocessor in fig. 2 is to receive a pulse stream signal (pulse description word PDW format) of the receiver, perform pulse parameter matching analysis on the randomly overlapped radar pulse signal stream according to the priori knowledge of the main characteristic parameters of the known radar radiation source, intra-pulse characteristic parameters and the like, and separate the known radar pulse train or subtract the adjacent and the existing radar pulses from the known radar pulse train, thereby achieving the purpose of diluting the pulse stream density; the main tasks of the main processor are to finish the main sorting, radar identification and threat level judgment of signals.

Fig. 1 is a general overview of the whole development process, and the technical solution of the present invention is further described below with reference to the drawings.

The embodiment of the invention provides an electromagnetic signal sorting method and system, which specifically comprise the following steps:

step one, designing a hardware architecture:

the hardware architecture of the missile-borne information processing system replaces the traditional signal-processing DSP module with a GPU module, and mainly comprises the GPU module (responsible for signal processing, data and image processing, etc.), a DSP module (used for radar master control, radar resource scheduling management, etc.), a radio frequency module (responsible for radar radio frequency access), an SRIO network module (responsible for data interaction), a backplane module, etc.

Step two, designing a software architecture:

in addition to the parallel processing provided by the GPU hardware itself, the TX2-based parallel processing chip implements, through parallel software support layers such as CUDA and OpenCL, the atomic or basic operations common to image processing, matrix computation and neural network algorithms, such as convolution, pooling, FFT and IFFT. Distributed training and testing across multiple GPU chips can also be realized in several ways, such as model parallelism and data parallelism.

Data parallelism means that the training data are split into segments and several model instances train on the segments in parallel. Data parallelism requires parameter exchange, which is usually handled with the help of a parameter server. During training, the training processes are independent of one another; each reports its training result, that is, the change in the model, to the parameter server, which updates the model to the latest version and then distributes it to the training programs so that they start training from a new starting point.

Data parallelism has a synchronous mode and an asynchronous mode. In the synchronous mode, all training programs train on one batch of training data at the same time and, once finished and synchronized, exchange parameters together. After the parameter exchange all training programs share a common new model as their starting point and go on to train the next batch. In the asynchronous mode, a training program exchanges parameters with the parameter server as soon as it has finished a batch, without regard to the state of the other training programs. The latest result of one training program in asynchronous mode is not reflected in the other training programs until they perform their next parameter exchange. The parameter server is a logical concept and is not necessarily deployed as a separate server. Sometimes it is attached to one of the training programs, and sometimes it is split into fragments according to the model and the fragments are deployed separately.
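The synchronous mode can be sketched as a single parameter-server update. Averaging the reported changes is an assumption of this sketch; the text says only that the server updates the model from the reported variations. The function name and the numbers are illustrative.

```python
def synchronous_update(model, worker_deltas):
    """Apply the mean of all workers' reported changes to the shared model.

    Called once per batch, after every worker has reported (synchronous mode).
    Averaging is an assumed aggregation rule, not one stated in the source.
    """
    n_workers = len(worker_deltas)
    return [
        weight + sum(delta[i] for delta in worker_deltas) / n_workers
        for i, weight in enumerate(model)
    ]


model = [1.0, 2.0]
deltas = [[0.5, -1.0], [1.5, 0.0]]   # changes reported by two hypothetical workers
new_model = synchronous_update(model, deltas)
```

In the asynchronous mode the same update would instead be applied for one worker's change at a time, as soon as that worker reports.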

Model parallelism divides the model into several fragments, each held by a different training unit, and the units cooperate to complete the training. Communication overhead arises whenever the input of a neuron comes from the output of a neuron on another training unit. In many cases the communication overhead and synchronization cost of model parallelism exceed those of data parallelism, so its speedup is lower. For a large model that does not fit in the memory of a single machine, however, model parallelism is a good choice. Neither data parallelism nor model parallelism can be scaled indefinitely. When there are too many data-parallel training programs, the learning rate has to be reduced to keep the training process stable; when the model is split into too many fragments, the volume of neuron output values exchanged rises sharply and efficiency falls considerably. Combining model parallelism and data parallelism is therefore also a common solution.

And step three, researching the existing serial rapid independent component analysis algorithm and the serial K-means clustering algorithm and providing a combination scheme.

The Fast-ICA algorithm, also known as the fixed-point algorithm, processes data in batches at each iteration step and therefore handles a large amount of radar data at a time. It has the advantages of fast convergence and small steady-state error, but the speed at which it sorts signal data is low. The invention is based on the Fast-ICA signal sorting algorithm that maximizes negentropy.

The implementation steps can be divided into three parts, namely a preprocessing part, a loop iteration part and a post-processing part.

1) Data pre-processing

The received radar pulse signals are first centered, turning the observed signals into zero-mean variables. This preprocessing simplifies the Fast-ICA algorithm; the data can still be sorted if the step is skipped, but more loop iterations are needed.

The radar pulse signals obtained from a radar receiver are correlated with one another. To reduce the data dimension, reduce the number of signal parameters to be estimated and speed up the algorithm, the received radar data are generally whitened so as to simplify the computation.

It can be shown that whitening halves the degrees of freedom of the mixing matrix, which halves the workload of the subsequent separation of independent components.
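The halving can be made concrete with a standard parameter count, added here as background rather than taken from the source. A general n × n mixing matrix A has n² free parameters, whereas after whitening the effective mixing matrix, written Ã here, is orthogonal (the symbols A and Ã are introduced only for this note):

```latex
\tilde{A}\tilde{A}^{T} = I
\quad\Longrightarrow\quad
\dim\{\tilde{A}\} = \frac{n(n-1)}{2} \approx \frac{n^{2}}{2}
\quad (\text{for large } n),
\qquad
\dim\{A\} = n^{2}.
```

So roughly half of the unknowns are already fixed by whitening, and the iteration only has to find the remaining orthogonal rotation.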

2) Loop iteration

The Fast-ICA sorting algorithm uses negentropy maximization as the iteration direction and thereby obtains a separation matrix that separates the individual signals. Entropy is a measure of uncertainty, and among random variables of equal variance the Gaussian variable has the largest entropy. When the non-Gaussianity measure reaches its maximum during signal sorting, the algorithm has completed the separation of the radar components.
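For reference, negentropy and the approximation commonly used with Fast-ICA can be stated as follows. Here y is a zero-mean, unit-variance candidate component, y_gauss is a Gaussian variable with the same variance, ν is a standard Gaussian variable and G is a non-quadratic contrast function such as G(u) = log cosh(u). The choice of G is an assumption, since the text does not name one.

```latex
\begin{aligned}
J(y) &= H(y_{\mathrm{gauss}}) - H(y) \;\ge\; 0 \\
J(y) &\approx c\,\bigl(\mathrm{E}[G(y)] - \mathrm{E}[G(\nu)]\bigr)^{2}, \qquad c > 0
\end{aligned}
```

J(y) is zero only when y is Gaussian, so maximizing it drives each separated output away from Gaussianity.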

3) Post-processing

Record the sorting time of the source signals and count the number of source signals, in preparation for subsequent optimization.

The K-Means clustering algorithm is derived from the vector quantization method in signal processing and is currently popular as a cluster analysis method in fields such as big data processing. The K-Means clustering signal sorting algorithm divides n radar data points into K radar clusters, so that each point belongs to the cluster whose mean is nearest to it. The K-Means algorithm uses Euclidean distance as the clustering criterion.

The K-Means clustering algorithm is an unsupervised machine learning method that automatically groups similar objects into the same cluster. The principle and implementation of the K-Means clustering signal sorting algorithm are introduced below.

The clustering algorithm is an efficient, simple and easy-to-implement machine learning algorithm, and the K-Means sorting algorithm can sort radar sample data on its own, without training on a large amount of data.

The working steps of the K-Means clustering signal sorting algorithm, shown here for 2 classes, are as follows:

1) assume that the samples are divided into 2 classes, and randomly select the initial centers of the 2 classes;

2) in the k-th iteration, compute the distance from each sample to the 2 centers, compare the 2 distances, and label the sample with the class of the nearer center;

3) compute the mean of all data in each class and take it as the new center of that class;

after steps 2 and 3 have been carried out, check the change in the center values: if the change is smaller than the expected value, the iteration ends; otherwise steps 2 and 3 are repeated.

The K-Means clustering sorting algorithm clusters similar signals based on the parameters in the signal pulse description word (PDW). It generalizes the 2-means clustering algorithm: the number of cluster centers is simply changed from 2 to K. The operation steps of the K-Means clustering sorting algorithm are as follows:

1) Select the sorting parameters for the cluster sorting algorithm. The DOA of each signal component in the radar pulse stream does not change abruptly and can serve as a preliminary sorting parameter; the three parameters {RF, PW, PA} are then selected as the cluster sorting parameters.

2) Normalize the radar pulse signals. Normalization removes the influence that parameters with larger magnitudes (such as carrier frequency) would otherwise have on the sorting result. Taking the carrier frequency as an example, the other 2 radar pulse parameters are normalized in the same way; the carrier frequency normalization formula is as follows:

In the formula, RF_max and RF_min are respectively the maximum and minimum carrier frequency in the signals acquired by the radar receiver, and RF_i is the normalized radar carrier frequency.

3) Initialize the cluster centers. The center value of each cluster can be set randomly.

4) Iterate. Compute the Euclidean distance from each sample to each cluster center, and label the sample with the cluster whose center is nearest.

In the formula, RF_i, PW_i and PA_i are the parameter values of the signal pulse, where i = 1, 2, ..., n indexes the n radar data;

the corresponding terms are the parameter values of each cluster center, where j = 1, 2, ..., k indexes the k cluster centers; d_j is the Euclidean distance from a radar datum to the j-th cluster center.

5) Update the cluster centers. Sum the samples of each new cluster, compute their mean, and take it as the updated cluster center.

In the formula, m_j is the number of samples in the j-th cluster; PDW_i is the pulse description word of the i-th signal; C_j is the j-th cluster center.

6) Judge convergence. Compute the squared error between the new and old cluster center values. If the squared error has not reached the expected value and the iteration limit has not been exceeded, repeat steps 4) and 5) until the condition is met.

$E_r = \lvert C_{t+1} - C_t \rvert^{2}$    Formula (11)

In the formula, E_r is the squared error between the (t+1)-th and the t-th iteration; C_{t+1} is the cluster center value at iteration t+1, and C_t is the value at iteration t.
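The formulas referred to in steps 2), 4) and 5) can plausibly be written out as follows. This is a reconstruction that assumes min-max normalization and an unweighted Euclidean distance, which is what the symbol definitions above suggest. The primed symbol RF_i' and the center coordinates (RF_j^c, PW_j^c, PA_j^c) are notation introduced here for illustration only.

```latex
\begin{aligned}
RF_i' &= \frac{RF_i - RF_{\min}}{RF_{\max} - RF_{\min}} \\
d_j &= \sqrt{(RF_i - RF_j^{c})^{2} + (PW_i - PW_j^{c})^{2} + (PA_i - PA_j^{c})^{2}},
      \qquad j = 1, 2, \dots, k \\
C_j &= \frac{1}{m_j} \sum_{PDW_i \,\in\, \text{cluster } j} PDW_i
\end{aligned}
```

With this form every normalized parameter lies in [0, 1], so no single parameter dominates the distance d_j.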

According to the steps, the time complexity of the K-Means clustering signal sorting algorithm is low:

Time = O(nkt)    Formula (12)

Wherein n is the number of radar pulse samples; k is the number of clustering centers; t is the number of iterations required; time is the time complexity of the mean clustering algorithm.
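The steps above and the O(nkt) cost can be sketched in a few lines of Python. This is a minimal serial sketch, not the C++ program of the invention: the function names `normalize` and `kmeans`, the tolerance `tol` and the sample PDW values are all hypothetical, and min-max normalization is assumed.

```python
import math
import random


def normalize(pdws):
    """Min-max normalize each PDW column (RF, PW, PA) to [0, 1] (assumed form)."""
    columns = list(zip(*pdws))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in pdws
    ]


def kmeans(points, centers, tol=1e-12, max_iter=100):
    """Return (labels, centers); the cost is about n * k * t distance evaluations."""
    k = len(centers)
    labels = [0] * len(points)
    for _ in range(max_iter):                       # at most t iterations
        # Step 4: label each of the n points with the nearest of the k centers.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Step 5: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])      # keep an empty cluster's center
        # Step 6: squared change of the centers, in the spirit of formula (11).
        error = sum(math.dist(a, b) ** 2 for a, b in zip(new_centers, centers))
        centers = new_centers
        if error < tol:
            break
    return labels, centers


if __name__ == "__main__":
    # Hypothetical PDWs: (RF in MHz, PW in microseconds, PA in arbitrary units).
    pdws = [(9000.0, 1.0, 0.50), (9010.0, 1.1, 0.52), (3000.0, 5.0, 0.90),
            (3010.0, 5.1, 0.88)]
    points = normalize(pdws)
    initial = random.Random(0).sample(points, 2)    # step 3: random initial centers
    print(kmeans(points, initial))
```

The initial centers are passed in explicitly so that the random choice of step 3) stays under the caller's control. As the text later notes, the number of iterations depends on this initial choice.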

As can be seen from formula (12), the clustering algorithm runs fast and is easy to port across platforms. Based on the steps above, the serial K-Means clustering signal sorting algorithm is first implemented in C++ to verify the effectiveness of cluster sorting for signal sorting. The program flow chart of the K-Means clustering signal sorting algorithm is shown in FIG. 7, and a CPU-based program is written from it. The flow chart makes the steps of the algorithm easy to see, and most of the time is consumed by the loop iteration part. The algorithm is then used to sort simulated radar pulse data, and the part with the largest time overhead is parallelized to speed up the sorting of radar data.

The data clustering iteration is the most time-consuming operation in the K-Means clustering signal sorting algorithm and accounts for more than 99% of the total run time of the clustering algorithm. According to Amdahl's law, the data clustering iteration is therefore the focus of parallel optimization.
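Amdahl's law makes this argument concrete: if a fraction f of the run time is accelerated by a factor s, the overall speed-up is 1 / ((1 - f) + f / s). The short sketch below evaluates it for the 99% figure quoted above; the function name and the sample values of s are illustrative assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speed-up when only `parallel_fraction` of the run time is accelerated."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


if __name__ == "__main__":
    # f = 0.99 follows the text; the acceleration factors are hypothetical.
    for s in (10, 100, 1000):
        print(s, round(amdahl_speedup(0.99, s), 2))
```

With f = 0.99 the overall speed-up can never exceed 100 however fast the iteration becomes, and it stays close to that bound only while the remaining 1% of serial work is small.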

Radar signal sorting algorithms based on Fast-ICA or on K-Means clustering can each successfully sort out the individual radar pulse signals, but a sorting scheme that relies on only one of them has various problems.

The Fast-ICA-based signal sorting algorithm can separate unknown signal sources in a complex radar electromagnetic environment without prior knowledge of the sources and can eliminate the influence of noise, but it is slow and complex to implement when a large amount of signal data is sorted. The K-Means-clustering-based signal sorting algorithm is simple to implement, runs fast and gives a good local sorting effect. Its disadvantage is that prior knowledge of the radar radiation sources, such as the initial number of cluster centers k, must be known in advance, and an improper k value leads to a poor clustering result; in addition, the number of cluster sorting iterations is affected by the initial cluster center values, and interference from isolated noise points cannot be eliminated. The strengths of the two algorithms complement each other well in signal sorting, so a radar pulse stream interference sorting parallelization acceleration method based on Fast-ICA K-Means clustering is designed and implemented.

The K-Means clustering signal sorting algorithm based on independent component analysis combines the advantages of the 2 algorithms and can quickly sort unknown radar pulse signals in a complex electromagnetic environment. The algorithm first performs fast independent component separation on part of the radar pulse time-domain signals, which separates the individual radar pulse signals from random noise, and removes the random noise vector columns after a mean-value test. This provides an effective K value for the subsequent clustering signal sorting algorithm and eliminates isolated noise. The K-Means clustering signal sorting algorithm then rapidly clusters and sorts a large number of radar pulse signals based on the pulse description word PDW_i = {RF_i, PW_i, PA_i}. The working steps of the fusion algorithm are as follows:

1) Signal data are obtained from the radar receiver. Since the position of a radar does not change from moment to moment, the radar pulse stream data are first divided every 15 degrees by direction of arrival (DOA), and the following operations are then performed in turn.

2) Randomly take part of the DOA-sorted signal data and apply the Fast-ICA sorting algorithm to it, separating the signals from the noise and obtaining the signal types.

3) From the previous step, obtain the K value required by the clustering signal sorting algorithm, and quickly cluster and sort the radar pulse signals based on the three pulse description word parameters RF, PW and PA.

4) Store each data cluster on the hard disk, or hand it to a subsequent algorithm that processes the sorting result.
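Step 1) of this list, the division of the pulse stream into 15-degree DOA sectors, can be sketched as follows. The dictionary layout of a pulse (a `"doa"` key in degrees) and the function names are assumptions made for illustration; the text does not specify a data format.

```python
from collections import defaultdict

SECTOR_WIDTH_DEG = 15.0  # sector width taken from the text


def doa_sector(doa_deg, width=SECTOR_WIDTH_DEG):
    """Index of the DOA sector that contains `doa_deg` (angles wrap at 360)."""
    return int((doa_deg % 360.0) // width)


def split_by_doa(pulses, width=SECTOR_WIDTH_DEG):
    """Group pulses (dicts with a hypothetical 'doa' key) by DOA sector."""
    sectors = defaultdict(list)
    for pulse in pulses:
        sectors[doa_sector(pulse["doa"], width)].append(pulse)
    return dict(sectors)


if __name__ == "__main__":
    stream = [{"doa": 3.0}, {"doa": 20.0}, {"doa": 10.0}, {"doa": 359.0}]
    for sector, members in sorted(split_by_doa(stream).items()):
        print(sector, members)
```

Each sector can then be passed independently through steps 2) to 4), which is also what makes the later per-sector GPU processing straightforward.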

Based on the working steps of the signal sorting algorithm, the program flow chart of the serial Fast-ICA-based K-Means clustering signal sorting algorithm is given in FIG. 3. In modern, overlapping and complex electromagnetic environments, the data volume of radar receivers grows exponentially. The CPU-based implementation requires many loop iterations and runs slowly; to sort a large number of signals quickly and accurately, the algorithm is parallelized and implemented on a GPU using the CUDA framework, which greatly accelerates signal sorting.

Step four: give the parallelization ideas and methods for the Fast-ICA radar signal sorting algorithm and the K-Means radar signal sorting algorithm respectively, and fuse them to obtain an algorithm that sorts faster and better.

The principle and serial implementation of the Fast-ICA-based K-Means clustering sorting algorithm have been studied and implemented above. The algorithm can accurately sort unknown signals, but its running time is too long when a large amount of signal data is sorted. The approach here is to identify the time-consuming parts of the algorithm and implement them on the GPU, so as to improve the efficiency of the sorting algorithm and shorten the signal sorting time. This yields a new radar pulse stream interference sorting parallelization acceleration method that can quickly sort unknown signal sources.

After the parallelized Fast-ICA-based signal sorting algorithm and the parallelized K-Means-clustering-based signal sorting algorithm have each been implemented, the 2 algorithms are fused so that they complement each other: the ability of Fast-ICA to sort unknown signal sources and separate noise is combined with the fast, real-time character of K-Means clustering signal sorting, and the fused algorithm is run on the GPU to obtain correct results. The specific implementation is described below.

The Fast-ICA algorithm, also known as the fixed-point algorithm, is an independent component analysis algorithm that requires little a priori knowledge and can separate unknown radiation source signals. As many algorithm steps as possible are parallelized and implemented on the GPU to speed up the sorting of radar data. Parallelization of the Fast-ICA signal sorting algorithm proceeds in three stages: task division, performance optimization and result verification. The CPU-based Fast-ICA implementation comprises five steps:

1) data input: the diluted radar pulse stream data, divided every 15 degrees by direction of arrival (DOA), are read from the hard disk into host memory and handed to the CPU for processing;

2) data preprocessing: the radar data obtained from the radar receiver are centered and whitened;

3) Fast-ICA iteration: the radar sample data are processed in a loop until the control condition is reached, which yields the separation matrix;

4) data post-processing: a matrix multiplication is performed to recover the radar source signals;

5) data output: the data are written back to the hard disk with a file write command and stored.

The data exchange in steps 1 and 5 cannot be parallelized and must be completed on the CPU side. The CPU calls the CUDA library function cudaMalloc() to allocate device memory for the data on the GPU side and cudaMemcpy() to transfer the data to the GPU. Steps 2, 3 and 4 form the main data processing part of Fast-ICA. The loop iteration accounts for more than 96% of the total run time of the algorithm, so it is the focus of parallelization. Taking into account the time overhead of exchanging data between the CPU and the GPU, the Fast-ICA algorithm is parallelized as a whole, and the parallel implementation scheme is as follows:

1) data pre-processing parallelization

The data preprocessing stage centers and whitens the radar pulse signals to remove the correlation among the radar data. Data whitening is simply a matrix multiplication, which is completed by calling the library function cublasDgemm(). On the CPU, centering is done by looping over every data element. On the GPU, n × p threads can be launched with each thread processing one element, which in principle reduces the time complexity to O(1). Because the number of data samples is too large for the GPU to run that many threads at once, 256 threads are allocated to each block, and the speed-up ratio of the data centering part is then:

Speed_center = (n × p)/256    Formula (13)

In the formula, Speed_center is the speed-up ratio of data centering; n and p are respectively the data dimension (number of radars) and the number of data samples.

The data centering part is parallelized in single-instruction, multiple-thread (SIMT) fashion.
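The one-thread-per-element mapping described for centering can be emulated on the CPU as follows. This is only a sketch of the indexing scheme, not CUDA code: the loop variable `tid` stands in for a global thread index, and the function names are hypothetical.

```python
THREADS_PER_BLOCK = 256  # block size taken from the text


def blocks_needed(n, p, threads_per_block=THREADS_PER_BLOCK):
    """Number of blocks required so that every one of the n * p elements gets a thread."""
    return (n * p + threads_per_block - 1) // threads_per_block


def center_rows(x):
    """Subtract each row's mean; every loop index emulates one GPU thread."""
    n, p = len(x), len(x[0])
    means = [sum(row) / p for row in x]
    out = [[0.0] * p for _ in range(n)]
    for tid in range(n * p):            # on the GPU these iterations run in parallel
        r, c = divmod(tid, p)           # map the flat thread index to (row, column)
        out[r][c] = x[r][c] - means[r]
    return out


if __name__ == "__main__":
    print(blocks_needed(4, 1000))
    print(center_rows([[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]))
```

Each emulated thread performs a single subtraction, which is why the per-element work is constant once the row means are available.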

2) Loop iteration processing

Since the loop iteration accounts for most of the time overhead, this part is the focus of parallel optimization. Analysis shows that it consists of a large number of matrix operations, such as matrix transposition, matrix multiplication, and the calculation of matrix eigenvalues and eigenvectors. The body of the loop is broken down into the following operation steps:

1. Z₁←W*Z

2. Z₂←g(Z₁)；Z₂'←diag[g'(Z₁)*1_p]*W

3. W₂←Z₂Z^T

4. W₁←W₂-Z₂'

5. Perform singular value decomposition on W₁ and judge whether the iteration has finished.
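Steps 1 to 4 of this loop body can be sketched in plain Python as below. The nonlinearity g = tanh is an assumption (the text does not name g), the helper names are hypothetical, and step 5 (the decomposition of W₁ and the convergence test) is deliberately left out.

```python
import math


def matmul(a, b):
    """Plain triple-loop matrix product of two lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def fastica_update(w, z):
    """One pass of steps 1-4: W is n x n, Z is the whitened n x p data matrix."""
    n = len(w)
    z1 = matmul(w, z)                                      # step 1: Z1 = W * Z
    z2 = [[math.tanh(u) for u in row] for row in z1]       # step 2: Z2 = g(Z1)
    # g'(u) = 1 - tanh(u)^2; multiplying by 1_p sums each row over the p samples.
    g_prime_sums = [sum(1.0 - math.tanh(u) ** 2 for u in row) for row in z1]
    z2_prime = [[g_prime_sums[i] * w[i][j] for j in range(n)]
                for i in range(n)]                         # diag[g'(Z1) * 1_p] * W
    w2 = matmul(z2, transpose(z))                          # step 3: W2 = Z2 * Z^T
    return [[w2[i][j] - z2_prime[i][j] for j in range(n)]
            for i in range(n)]                             # step 4: W1 = W2 - Z2'


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]
    z = [[1.0, -1.0], [0.0, 0.0]]
    print(fastica_update(w, z))
```

The two matrix products in steps 1 and 3 dominate the cost, which is consistent with delegating them to GPU matrix multiplication routines.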

The main operations in the above steps are multiplications of large matrices (such as n×n by n×p and n×p by p×n), subtraction of n×n matrices, and summation of matrix columns. These operations are implemented by calling library functions from the official CUDA libraries and from CULA TOOLS. The library functions used are as follows:

cublasDgemm(): double-precision matrix multiplication. The function is the CUDA-based matrix multiplication operation; for input matrices A and B and an output matrix C it computes:

C = α · op(A) × op(B)    Formula (14)

where α is a constant, and op(A) denotes matrix A or the transpose of matrix A.

CublasDgel(): double-precision matrix transposition, a CUDA-based matrix transpose operation.

The run times of the GPU-based and MATLAB-based matrix operations are compared as follows:

TABLE 1 Matrix multiplication time on CPU and GPU

The table above gives the time taken to perform a matrix multiplication in MATLAB and with the library function on a GTX860M, respectively. The matrix operation is C = A × A^T, where A and A^T are a matrix and its transpose.

As can be seen from Table 1, the GPU library function completes the matrix multiplication in far less time than MATLAB, with a speed-up ratio of up to 60 times. When the matrix multiplication library function is used directly, the data dimensions n and p often differ greatly, and the table shows that when the number of radar samples p is large, the speed-up is small because of hardware resource limits. The matrix multiplication is therefore performed in blocks to obtain the best performance from the Fast-ICA sorting algorithm.

culaDeviceDgeev(): a CUDA-based function for computing the eigenvalues and eigenvectors of a matrix. The function is a tool provided by the open source community of CULA TOOLS; it computes matrix eigenvalues and eigenvectors on the GPU, with a speed-up ratio of up to several tens of times over the traditional eigen() function.

With the above 3 library functions, most of the matrix operations, and hence the main loop iteration, can be completed on the GPU.

3) Data post-processing

In post-processing, the separation matrix is multiplied by the radar sample data to obtain the sorted radar signals, which can be done with the matrix multiplication library function cublasDgemm(). The data are then analyzed and passed on for further processing.

According to the Fast-ICA principle and the implementation steps described above, the program flow chart of the CUDA-based Fast-ICA algorithm is given in FIG. 4.

The bulk of the iteration in the K-Means clustering signal sorting algorithm is the computation of the Euclidean distance from each data point to the cluster centers, and a kernel function is written to parallelize this part. The implementation follows the principle and the serial implementation of the K-Means clustering signal sorting algorithm. The K-Means-based sorting algorithm is parallelized by task decomposition into the following steps:

1. Initialize the class label of each radar datum.

2. Clustering iteration:

a) compute the sum Sum of the radar data in each class;

b) count the total number m of radar data in each class;

c) compute the cluster center of each class: C_i＝Sum_i/m_i；

d) compute the Euclidean distance between each radar datum and each cluster center;

e) update the class label of each radar datum according to its Euclidean distances to the cluster centers.

3. Finish signal sorting and transfer the sorted data from the GPU side to the CPU side for storage.

According to the steps, each step is parallelized, and an implementation scheme of the algorithm parallelization is given:

1. initializing class labels for each sample

Since this step is performed only once, it is not an optimization focus; 256 threads are allocated to each block, for a grid of (n + 255)/256 blocks in total. According to Amdahl's law, because the clustering iteration accounts for more than 90% of the total time of the algorithm, the second step is the optimization focus.

2. Clustering iterative parallelization

a) Compute the sample sum of each cluster, Sum_Cluster():

In this scheme, the sample sum of each cluster is computed with atomic operations. Because the data of each cluster are scattered in memory, latency is hidden by increasing the number of threads. After experimental comparison, the following configuration was chosen:

Block：(16，16)；

Grid：((w+15)/16，(n+15)/16)；

where w is the dimension of the radar sample data. With more threads, the warp scheduler can effectively hide memory access latency.

b) Count the number of samples in each cluster, Count_NumInCluster():

The number of samples in each cluster is accumulated with atomic operations. Reads and writes to global memory can be reduced by allocating K shared-memory counters to each block. The required number of threads is calculated with the occupancy calculator according to the size of K, so as to hide as much memory access latency as possible:

Block：1024；

Grid：(n+1023)/1024；

wherein n is the number of sample data.

c) Compute the mean of all samples in each cluster (sample sum / sample count) and take it as the cluster center, Scale_Cluster():

This part is a simple division with little computation per thread; the threads are allocated as follows:

Block：(16，16)；

Grid：((w+15)/16，(k+15)/16)；

where w is the dimension of the radar pulse data and k is the number of radar sources.

d) Compute the Euclidean distance from each sample to each cluster center, distOfCluster():

Most of the computation of the K-Means clustering algorithm is concentrated in this step, so it is the focus of optimization. In this scheme, each block is allocated shared memory of a suitable size (approximately K) so that it can compute the Euclidean distances between several radar data points and the cluster centers at a time. Storing the cluster centers in shared memory speeds up data processing. When the radar data are copied from device memory to shared memory, the matrix is stored by columns in device memory, so the data are read as follows:

Because the data are stored in a contiguous memory region, reads can be coalesced and the memory bandwidth is used effectively. The threads for this function are allocated as follows:

Block：(16，16)；

Grid：((k+15)/16，(n+15)/16)；

wherein k is the number of radar sources and n is the number of radar pulse parameters.

e) Update the class label of each sample according to its Euclidean distances to the cluster centers, Update_ObjClusterIdx():

In this step, the Euclidean distances between each radar datum and the cluster centers are compared to find the minimum, and the class label is updated accordingly. One-dimensional thread blocks are used for this step:

Block：256；

Grid：(1，(n+255)/256)；

where n is the number of radar data. If there are more than 16 cluster centers, this part can be further optimized by finding the minimum with a reduction.
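The Block and Grid settings listed for the initialization step and for steps a) to e) all follow the same ceiling-division pattern, which the sketch below reproduces on the host side. The dictionary keys and function names are hypothetical labels for the steps; only the block sizes and grid expressions come from the text.

```python
def ceil_div(a, b):
    """Smallest number of size-b blocks that cover a elements."""
    return (a + b - 1) // b


def launch_configs(n, w, k):
    """Block and grid shapes for n samples, w parameters per sample, k clusters."""
    return {
        "init":       {"block": (256,),   "grid": (ceil_div(n, 256),)},
        "a_sum":      {"block": (16, 16), "grid": (ceil_div(w, 16), ceil_div(n, 16))},
        "b_count":    {"block": (1024,),  "grid": (ceil_div(n, 1024),)},
        "c_mean":     {"block": (16, 16), "grid": (ceil_div(w, 16), ceil_div(k, 16))},
        "d_distance": {"block": (16, 16), "grid": (ceil_div(k, 16), ceil_div(n, 16))},
        "e_update":   {"block": (256,),   "grid": (1, ceil_div(n, 256))},
    }


if __name__ == "__main__":
    for name, cfg in launch_configs(n=1000, w=3, k=4).items():
        print(name, cfg)
```

Rounding up means the last block may contain threads with no element to process, so a real kernel would need a bounds check on its global index.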

The K-means clustering signal sorting algorithm is thus divided into tasks, decomposed step by step and parallelized, and then implemented in CUDA C. The program flow chart is shown in FIG. 5:

FIG. 5 is the program flow chart of the K-Means clustering algorithm under the CPU + GPU architecture. Data are first prepared on the CPU side and copied to the GPU side. The radar sample data are then clustered and sorted on the GPU side. After clustering, the sorted data are copied from the GPU side to the CPU side and stored on the hard disk, and finally the CPU releases the device memory in preparation for the next round of data sorting.

Parallelization of the Fast-ICA signal sorting algorithm and of the K-Means clustering sorting algorithm has thus been realized. Using either algorithm alone has various disadvantages. A fused signal sorting algorithm is therefore studied, namely the Fast-ICA-based K-Means clustering radar pulse stream interference sorting parallelization acceleration method. It combines the ability of fast independent component analysis to separate unknown signals with the speed of the K-Means clustering algorithm, and sorts signals well. The fused algorithm is based on a CPU + GPU heterogeneous architecture and, for the same radar data, takes less time and is more efficient.

With the parallelized Fast-ICA-based and K-Means-clustering-based signal sorting algorithms each implemented, the 2 algorithms are fused so that they complement each other and unknown signals can be sorted rapidly and accurately. The parallel flow chart is given below:

FIG. 6 is a flow chart of the parallelized K-means clustering signal sorting algorithm based on fast independent component analysis. The main signal sorting algorithm is fully parallelized and the data are processed on the GPU, while the CPU only handles the jumps of some control statements and data distribution. The working steps of the algorithm are:

The above are the detailed working steps of the signal sorting algorithm. In the third step, signal and noise are distinguished by their expectation: noise is random and its expectation is 0, so a threshold is set in advance, and a separated signal whose expectation is below the threshold is judged to be noise and filtered out. The mean of the separated signals could be computed with atomic operations, but for performance the expectation is computed with a sum reduction.
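The expectation test with a sum reduction can be sketched as follows. The pairwise loop mirrors the access pattern of a parallel tree reduction but runs serially here. Comparing the absolute value of the mean with the threshold is an assumption of this sketch, as are the function names and sample values.

```python
def tree_sum(values):
    """Pairwise (tree) sum: the stride doubles each pass, as in a GPU reduction."""
    buf = list(values)
    n = len(buf)
    if n == 0:
        return 0.0
    stride = 1
    while stride < n:
        # On a GPU, all additions of one pass would run in parallel.
        for i in range(0, n - stride, 2 * stride):
            buf[i] += buf[i + stride]
        stride *= 2
    return buf[0]


def is_noise(component, threshold):
    """Judge a non-empty separated component as noise if its mean is near zero."""
    mean = tree_sum(component) / len(component)
    return abs(mean) < threshold      # absolute value is an assumption of this sketch


def filter_noise(components, threshold):
    """Keep only the components that are not judged to be noise."""
    return [c for c in components if not is_noise(c, threshold)]


if __name__ == "__main__":
    separated = [[0.1, -0.1, 0.05, -0.05], [1.0, 1.2, 0.9, 1.1]]
    print(filter_noise(separated, threshold=0.01))
```

A reduction needs about log2(p) passes for p samples, whereas atomic accumulation serializes every update to the same address, which is the performance argument made above.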

The parallelization principle and the implementation process of the signal sorting algorithm have been described in detail above. First, for the signal sorting algorithm based on fast independent component analysis, the steps of the algorithm are decomposed and the main operations are implemented on the GPU (graphics processing unit), while the CPU only handles the jumps and control of some instructions; the data originally processed by loop iteration are divided among many threads. The CUDA library functions simplify program design and improve data processing efficiency, and the parallel program flow chart of the algorithm is given. Second, for the parallelization of the K-means clustering signal sorting algorithm, the data are decomposed and processed in single-instruction, multiple-thread fashion, which reduces the time complexity of the algorithm to O(knt/Num); the parallel program of the K-means clustering signal sorting algorithm is designed according to the parallel program flow chart, and comparison shows that the signal sorting time is greatly shortened. Finally, the 2 algorithms are fused on a CPU + GPU architecture to realize the radar pulse stream interference sorting parallelization acceleration method, which can rapidly sort out radar radiation source signals and provides a new solution for signal sorting.

Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the embodiments of the present invention and not for limiting, and although the embodiments of the present invention are described in detail with reference to the above preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the embodiments of the present invention without departing from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An electromagnetic signal sorting method, wherein a signal processing system of an electromagnetic signal comprises a radar signal preprocessor and a radar signal main processor, wherein the radar signal main processor adopts a CPU + GPU structure, and the acceleration method comprises the following steps:

step 7, performing clustering sorting iteration processing on the PDW of the signals at the GPU end, calculating Euclidean distances from the normalized sample points to each initial point, performing clustering after comparison, transmitting a control cycle mark to the CPU end for judgment after each cycle is completed, entering step 8 if the cycle condition is met, and continuing iteration in the GPU if the cycle condition is not met;

2. The sorting method according to claim 1, wherein the sampling data in step 1 is sampling data obtained by performing analog-to-digital (A/D) conversion on pulse signal data of various systems received by the radar signal receiver.

3. The sorting method of claim 1, wherein sampling raw radar data received by the radar signal receiver in step 1 comprises: sampling the original radar data received by the radar signal receiver by angle and by time.

4. The sorting method according to claim 3, wherein the angle-divided sampling of the raw radar data received by the radar signal receiver means that the raw radar data received is divided every 15 degrees by the DOA.

5. The sorting method according to claim 1, wherein the preprocessing in the step 6 comprises: the radar preprocessor is used for performing intra-pulse feature extraction on the signal pulse by using rapid independent component analysis aiming at the signal pulse obtained by the frequency measurement receiver and the direction measurement receiver, acquiring the number of signal sources and eliminating the influence of noise.

6. The sorting method of claim 5, wherein the radar signal preprocessor receives a pulse stream signal PDW from the radar receiver, performs pulse parameter matching analysis on the randomly overlapped radar pulse signal stream, and separates a known radar pulse train or subtracts adjacent and squared radar pulses from the known radar pulse train.

7. The sorting method according to claim 1, wherein the step 2 comprises the sub-steps of:

8. The sorting method according to claim 7, wherein the iterative loop algorithm in step 3 comprises the sub-steps of:

Step 3.1, Z₁←W*Z

Step 3.2, Z₂←g(Z₁)；Z₂'←diag[g'(Z₁)*1_p]*W

Step 3.3, W₂←Z₂Z^T

Step 3.4, W₁←W₂-Z₂'

9. The sorting method of claim 1, wherein the step of removing noise in step 5 comprises: distinguishing the signal and the noise, determining the expected value of the noise to be 0, and judging the noise and filtering the noise when the expected value of the separated signal is smaller than a threshold value.

10. The sorting method of claim 1, wherein the cluster sorting iteration of step 7 comprises the sub-steps of:

step 7.1, counting samples of each cluster;

step 7.2, counting the number of samples of each cluster;