CN117828396A - Modulation identification and model training method based on WCAN

Modulation identification and model training method based on WCAN

Info

Publication number
CN117828396A
CN117828396A (Application No. CN202311724965.0A)
Authority
CN
China
Prior art keywords
attention
signal
convolution
wcan
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311724965.0A
Other languages
Chinese (zh)
Inventor
魏蛟龙 (Wei Jiaolong)
唐祖平 (Tang Zuping)
冯缘 (Feng Yuan)
彭克潇 (Peng Kexiao)
王思芮 (Wang Sirui)
赵峥 (Zhao Zheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202311724965.0A priority Critical patent/CN117828396A/en
Publication of CN117828396A publication Critical patent/CN117828396A/en
Pending legal-status Critical Current

Abstract

The invention discloses a WCAN-based modulation identification and model training method, belonging to the field of wireless communication. The invention designs a window attention convolution network for identifying the modulation mode of a signal: the network divides an input IQ signal into a plurality of local windows of different scales, performs attention calculation within these local windows with depth convolution, depth expansion convolution and point state convolution as its core, and produces the recognition output after linearly combining the computed local features. Experiments on open-source datasets show that the modulation recognition model based on the window attention convolution network is superior to other convolution models and the Transformer model in overall classification accuracy.

Description

Modulation identification and model training method based on WCAN
Technical Field
The invention belongs to the field of wireless communication, and in particular relates to a modulation identification and model training method based on WCAN.
Background
Modulation classification tasks are of significant importance in the field of wireless communications, especially in modern communication environments that are diverse and highly complex. Accurately identifying different modulation modes allows the signal demodulation and data recovery process to be optimized, enabling efficient signal transmission and improving the performance of the whole communication system. This is particularly critical in resource-constrained, high-interference or multi-user environments. In particular, accurately identifying the modulation mode of an incoming signal without prior information becomes a crucial basic capability.
In recent years, deep learning networks originally used for image recognition have achieved significant results in modulation recognition. The global self-attention in the Transformer architecture lets every element in a sequence interact with all other elements, facilitating the capture of long-distance dependencies. However, when applied to the IQ two-path signal classification task, the Transformer faces several challenges: first, while self-attention can process multidimensional data, it does not fully exploit the joint structure of the IQ signal paths to maximize feature-capture capability and robustness; second, it may be less efficient than mechanisms such as CNNs or RNNs at processing local features, making it difficult to decode fine structures and transient characteristics inside the signal.
Convolutional neural networks (CNNs) exhibit excellent performance in capturing local and structured features thanks to their unique spatial hierarchy and fixed receptive fields. However, CNNs may require more parameters and computation to capture longer dependencies or process multi-scale information, which introduces greater complexity and computational cost.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the invention provides a WCAN-based modulation identification method and a model training method therefor, aiming to improve modulation identification performance based on a deep learning network.
To achieve the above object, in a first aspect, the present invention provides a modulation recognition model training method based on WCAN, the method comprising:
constructing a modulation recognition model based on WCAN, wherein in the model, an IQ signal sequentially passes through a plurality of stages to extract local features of different scales, and the recognition output is produced after the local features are linearly combined; each stage comprises a block embedding layer and a signal attention block; the block embedding layer divides the input signal into a plurality of local windows, and the signal attention block performs attention calculation within the local windows, with depth convolution, depth expansion convolution and point state convolution as its core, to extract the features within the local windows;
setting training parameters, and training the model by using training data until reaching the training ending condition.
Preferably, the signal attention block comprises a plurality of signal attention layers; each signal attention layer comprises an attention sub-layer based on a depth expansion convolution attention mechanism and a multi-layer perceptron sub-layer, the attention sub-layer and the multi-layer perceptron sub-layer are connected by residual connections, and the output data of the attention sub-layer is input into the multi-layer perceptron sub-layer.
Preferably, the operation process of the signal attention layer is:

Atten1 = Attention(LN(Input)) + Input

Atten2 = MLP(LN(Atten1))

Output = Atten2 + Atten1

wherein Output and Input represent the output and the input respectively; LN() represents a normalization operation; Attention() represents the attention sub-layer operation based on the depth expansion convolution attention mechanism; MLP() represents the multi-layer perceptron operation.
Preferably, the operation process of the attention sub-layer is:

Output1 = Conv1D(DDCA(GELU(Conv1D(X_in))))

wherein Output1 is the output; X_in is the input; Conv1D() represents a one-dimensional convolution operation; GELU() represents the activation function; DDCA() represents the depth expansion convolution attention operation.
Preferably, the depth expansion convolution attention operation process is:

Atten3 = Conv1(DDConv(DConv(in)))

Output2 = Atten3 ⊙ in

wherein Output2 is the output; in is the input; DConv() represents a depth convolution operation; DDConv() represents a depth expansion convolution operation; Conv1() represents a point state convolution operation; Atten3 is the weight of each feature of the input signal sequence; ⊙ represents the element-level product.
Preferably, the operation process of the block embedding layer is:

P_i = Norm(Conv1D(X))

wherein Norm() represents a batch normalization operation; Conv1D() represents a one-dimensional convolution operation; X represents the input; P_i represents the output of the block embedding layer in the i-th stage;

P_i is a tensor of shape [C_i, Len/n_i], where C_i represents the number of output channels, Len/n_i represents the size of the output window, Len represents the length of the IQ signal, and n_i = 4 × 2^(i-1);

the number of input and output channels of the signal attention block is C_i.
Preferably, the IQ signal sequentially goes through four stages to extract local features of different scales.
In a second aspect, the present invention provides a modulation identification method based on WCAN, the method comprising:
receiving an IQ signal to be identified and preprocessing the signal;
performing modulation recognition on the preprocessed IQ signal by using a pre-trained modulation recognition model;
outputting the identified modulation mode;
wherein the modulation recognition model is trained according to the method of any one of claims 1-7.
In a third aspect, the present invention provides an electronic device comprising: a memory for storing a program; a processor for executing a memory-stored program, the processor being for performing any one of the methods described in the first aspect or for performing the method described in the second aspect when the memory-stored program is executed.
In a fourth aspect, the present invention provides a storage medium storing a computer program which, when run on a processor, causes the processor to perform any one of the methods described in the first aspect, or to perform the method described in the second aspect.
In general, the above technical solutions conceived by the present invention have the following beneficial effects compared with the prior art:
the invention provides a model structure of a window attention convolution network (Window Convolution Attention Network, WCAN) aiming at the modulation identification problem, wherein the network divides an input IQ signal into a plurality of local windows with different scales, and local attention calculation which takes depth convolution, depth expansion convolution and point state convolution as cores is carried out in the windows; the WCAN not only can effectively capture local structure information, but also has the capability of processing long-distance dependence, and shows remarkable superiority in a modulation signal classification task; in the experiment of 4 open source datasets, WCAN was superior to the other seven CNN, RNN, CNN +rnn mix and transducer based models in terms of overall classification accuracy. In particular on the hisarmod2019.1 dataset, WCAN achieved a minimum recognition accuracy of at least 85% and an overall recognition accuracy of more than 95%.
Drawings
Fig. 1 is a schematic diagram of the overall structure of a WCAN in an embodiment of the present invention;
fig. 2 is a block diagram of a signal attention block and a block diagram of DDCA in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The terms "first" and "second" and the like in the description and in the claims are used for distinguishing between different objects and not for describing a particular sequential order of objects. For example, a first channel and a second channel, etc. are used to distinguish between different channels, and are not used to describe a particular order of channels.
In embodiments of the invention, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described as "exemplary" or "for example" in embodiments of the invention should not be construed as preferred or advantageous over other embodiments or designs. Rather, such words are intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present invention, unless otherwise specified, the meaning of "plurality" means two or more, for example, the plurality of stages means two or more stages and the like; the plurality of block embedding layers means two or more block embedding layers and the like; the plurality of signal attention layers means two or more signal attention layers. Next, the technical scheme provided in the embodiment of the present invention is described.
First, the window attention convolutional network (Window Convolution Attention Network, WCAN) proposed by the present invention is introduced:
as shown in fig. 1, the overall structure of the WCAN is shown, and the input 1Q signal sequentially passes through four similar stages and a linear head, and finally is output as modulation type of IQ signal; wherein each stage is composed of a block embedding layer and a signal attention block, and the signal attention block comprises L i (i=1, 2,3, 4) signal attention layers, 4 stages with resolution decreasing layer by layer, 4 stages outputting C respectively 1 ×(Len/4),C 2 ×(Len/8),C 3 X (Len/16) and C 4 High-dimensional signal of x (Len/32); wherein Len is the length of the input IQ signal; c (C) i The number of dimensions projected by the block embedding layer in the i stage is also the number of channels input and output by each layer in the signal attention block.
The block embedding layer divides the input signal into a plurality of local windows according to different scales: it receives a low-dimensional signal, converts it into high-dimensional blocks of a set dimension through a one-dimensional convolution layer, and passes these blocks to the rest of the network for further processing. By adjusting its parameters, this part realizes embedding at the different stages, so that the network captures features of different scales at different stages.
In stage 1, the Patch Embedding portion first accepts an input signal tensor X of shape [2, Len], whose two rows are (I_1, I_2, …, I_Len) and (Q_1, Q_2, …, Q_Len), wherein I_i and Q_i are the amplitude values of the real part and the imaginary part of the i-th signal sampling point respectively, and Len is the sampled signal length. The block embedding process can be expressed by the formula:

P_i = Norm(Conv1D(X))

wherein Norm() represents a batch normalization operation used to normalize the convolution output, and Conv1D() represents a one-dimensional convolution operation. In this embodiment the convolution has 2 input channels and C_1 output channels, with kernel size 7, stride 4 and padding 3, so the output length of the stage-1 block embedding layer can be calculated as:

Len_1 = ⌊(Len + 2×3 − 7) / 4⌋ + 1 = Len/4

where ⌊·⌋ denotes rounding down (for example, Len = 1024 gives Len_1 = ⌊1023/4⌋ + 1 = 256). The stage-1 block embedding output P_1 is a tensor of shape [C_1, Len/4].
In the remaining stages, the block embedding layers take a similar form, but with C_{i-1} input channels and C_i output channels, kernel size 3, stride 2 and padding 1, so the output length is:

Len_i = Len_{i-1} / 2

Thus the converted tensor P_i has shape [C_i, Len/n_i], where n_i = 4, 8, 16, 32 for stages i = 1, 2, 3, 4.
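To make the block embedding concrete, the following is a minimal PyTorch sketch of this layer under the parameters just described; the class name, the example channel count C_1 = 64 and the batch size are our own illustrative choices, as the patent provides no source code.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Block embedding layer: strided Conv1d + batch norm, as in P_i = Norm(Conv1D(X)).

    Stage 1: kernel 7, stride 4, padding 3 (2 -> C_1 channels).
    Later stages: kernel 3, stride 2, padding 1 (C_{i-1} -> C_i channels).
    """
    def __init__(self, in_ch, out_ch, first_stage=False):
        super().__init__()
        if first_stage:
            self.proj = nn.Conv1d(in_ch, out_ch, kernel_size=7, stride=4, padding=3)
        else:
            self.proj = nn.Conv1d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.norm = nn.BatchNorm1d(out_ch)   # the Norm() in the formula above

    def forward(self, x):                    # x: [B, in_ch, Len]
        return self.norm(self.proj(x))       # length shrinks to Len/4 (stage 1) or Len/2

# Shape check for stage 1 with Len = 1024 and an assumed C_1 = 64:
x = torch.randn(8, 2, 1024)                  # a batch of IQ signals
p1 = PatchEmbed(2, 64, first_stage=True)(x)
assert p1.shape == (8, 64, 256)              # Len/4, matching the derivation above
```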
The signal attention block is the part that extracts signal features. The structure of the signal attention layer is shown in fig. 2: it replaces the multi-head attention part of a Transformer decoder with an Attention sub-layer built around depth expansion convolution attention (DDCA).
The layer as a whole is composed of an Attention sub-layer and a multi-layer perceptron sub-layer (MLP); each sub-layer applies normalization (LN), DropPath and parameterized scaling factors are also applied in the code, and each sub-layer is finally combined with its original input through a residual connection. The Attention sub-layer consists of two one-dimensional convolution layers, a GELU activation function and the DDCA.
DDCA aims to capture local dependencies in the signal by replacing the local self-attention of the Swin-Transformer with convolution, while splitting the convolution into a structure of depth convolution + depth expansion convolution + point state convolution to achieve high parameter efficiency, an enlarged receptive field and computational efficiency.
Let us assume the following input sequence:

in = [x_1, x_2, x_3, …, x_L]

wherein each x_i is a C_in-dimensional vector and L is the sequence length. In the DDCA, the input sequence first enters a depth convolution layer (DConv), which performs the convolution separately for each channel to minimize the number of parameters. Next, a depth expansion convolution layer (DDConv), i.e., a dilated depthwise convolution, is introduced to enlarge the model's receptive field and capture more distant context information in the input signal. Subsequently, these features are recombined by a point state convolution (Conv1), i.e., a pointwise convolution, generating the attention weights:

Atten3 = Conv1(DDConv(DConv(in)))

wherein DConv() represents a depth convolution operation; DDConv() represents a depth expansion convolution operation; Conv1() represents a point state convolution operation; the values in Atten3 represent the importance of each feature of the input sequence.
Finally, the obtained attention weights are multiplied element-wise with the input signal to obtain the final output, thereby enhancing the key features:

Output2 = Atten3 ⊙ in
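A minimal PyTorch sketch of the DDCA, wired with the Table 2 initialization (depth convolution kernel 5 / padding 2, depth expansion convolution kernel 7 / padding 9 / dilation 3, point state convolution kernel 1); where Table 2 lists an expansion factor of 0 we use PyTorch's minimum dilation of 1, and the class name is our own:

```python
import torch.nn as nn

class DDCA(nn.Module):
    """Depth expansion convolution attention:
    Atten3 = Conv1(DDConv(DConv(in))); Output2 = Atten3 (element-wise product) in."""
    def __init__(self, dim):
        super().__init__()
        # depth convolution: per-channel (groups=dim), kernel 5, padding 2
        self.dconv = nn.Conv1d(dim, dim, kernel_size=5, padding=2, groups=dim)
        # depth expansion convolution: depthwise, kernel 7, padding 9, dilation 3
        self.ddconv = nn.Conv1d(dim, dim, kernel_size=7, padding=9, dilation=3,
                                groups=dim)
        # point state (pointwise) convolution recombining the channels
        self.pconv = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x):                                # x: [B, C, L]
        atten = self.pconv(self.ddconv(self.dconv(x)))   # Atten3, same shape as x
        return atten * x                                 # enhance the key features
```

With these kernel/padding/dilation settings every sub-layer preserves the sequence length, consistent with the shape-consistency note below.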
note that in each link of Block, the shape consistency of the input and output is maintained by means of zero padding or the like.
Based on the parameters and learning costs, 3 architectures of WCAN-T, WCAN-S, WCAN-B were designed, the details of which are shown in Table 1.
TABLE 1
In the WCAN model, the initialization parameters of the DDCA portion are listed in Table 2.
No.  Layer                        Convolution kernel size  Padding  Expansion factor
1    Point state convolution      1                        0        0
2    Depth expansion convolution  7                        9        3
3    Depth convolution            5                        2        0

TABLE 2
And then analyzing the effectiveness of the WCAN model in the signal modulation identification direction based on experiments:
1. experiment initialization:
data set: we performed a verification of the method on the four main IQ signal datasets listed in table 3.
During the experiments we only needed to adjust the input dimensions (e.g., 2-channel 128-point or 2-channel 1024-point) and the output dimensions (e.g., 11, 10, 24 or 26 modulation schemes) of each model to fit the input data. Notably, all models kept the same intermediate parameter settings across all datasets throughout the experiments.
Detailed settings: all training and prediction are implemented on the PyTorch platform. The initial learning rates were 0.001, 0.0001 and 0.002, corresponding to the four datasets RML2016.10a, RML2016.10b, RML2018.01a and HisarMod2019.1 respectively. The batch size for all experiments was set to 500. We keep the model state at which the validation accuracy is highest. The loss function is categorical cross entropy and the optimizer is the Adam algorithm. To prevent overfitting, we set a strict early stopping scheme: on the HisarMod2019.1 and RML2018.01a datasets, if the validation loss does not decrease within 6 steps, the learning rate is multiplied by 0.25; if it has not decreased within 8 steps, the current training is terminated and the test phase begins. For the RML2016.10a and RML2016.10b datasets, these two patience values were 5 and 15 respectively. The maximum number of training steps on all datasets is limited to 1000. All experiments were performed on a machine equipped with 3 RTX 3090 GPUs with 24 GB of video memory each. Performance testing was performed on an RTX 2060 GPU with 6 GB of video memory, using the get_model_complexity_info function in the ptflops library, which accepts a PyTorch model and returns its number of parameters and floating-point operations (FLOPs).
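As a hedged illustration of this schedule for the HisarMod2019.1/RML2018.01a case (learning rate ×0.25 after 6 stalled validation steps, early stop after 8, at most 1000 steps) and of the complexity measurement, consider the sketch below; the get_model_complexity_info call from ptflops is the only API the text names, while the training and validation callables and the checkpoint path are placeholders of our own (the text tracks best validation accuracy; validation loss is used here for brevity). It reuses the WCAN sketch above.

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau
from ptflops import get_model_complexity_info

def run_training(model, train_epoch, validate, max_epochs=1000):
    """train_epoch(model, optimizer) runs one epoch; validate(model) returns val loss."""
    optimizer = torch.optim.Adam(model.parameters(), lr=0.002)  # Adam, as in the text
    # validation loss flat for 6 steps -> learning rate becomes 0.25x
    scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.25, patience=6)
    best, stall = float("inf"), 0
    for _ in range(max_epochs):                        # capped at 1000 training steps
        train_epoch(model, optimizer)
        val_loss = validate(model)
        scheduler.step(val_loss)
        if val_loss < best:
            best, stall = val_loss, 0
            torch.save(model.state_dict(), "best.pt")  # keep the best state
        else:
            stall += 1
            if stall >= 8:                             # strict early stop
                break

# parameter count and FLOPs, measured as described in the text
macs, params = get_model_complexity_info(WCAN(num_classes=26), (2, 1024),
                                         as_strings=True, print_per_layer_stat=False)
```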
The performance of the three network structures WCAN-T, WCAN-S and WCAN-B is shown in Table 4. Notably, there is no simple positive correlation between model size and modulation recognition performance: since datasets currently used for modulation recognition are relatively small, an overly large model easily leads to overfitting.
Type    Number of parameters  Minimum validation loss  Average accuracy (%)  Optimum accuracy (%)
WCAN-T  3386712               0.3898                   89.37                 100
WCAN-S  11834264              0.3666                   91.55                 100
WCAN-B  24265624              0.4838                   89.92                 100

TABLE 4
To provide an efficient and accurate experimental assessment, we chose experiments based on the HisarMod2019.1 dataset and used the smallest architecture, WCAN-T, as the reference (Ref) to save experimental time. Notably, in this series of experiments, the best test accuracy of most experiments reached 100%.
This high accuracy further demonstrates the superiority of the WCAN network model in the modulated-signal classification task. It also means that WCAN-T achieves excellent performance even in its most simplified configuration, which not only proves the model's capability but also shows good computational efficiency, making it suitable for time- and resource-constrained scenarios in practical applications.
2. Ablation experiment:
The ablation experiments use the initialized WCAN-T network model defined above (Ref for short), with a Transformer-based model (Trans) as the control group; the specific experimental design and results are shown in Table 5:
TABLE 5
In the first set of experiments, we removed the WCAN local window architecture and adopted the same network architecture as the Trans model. To keep the number of parameters comparable to Ref, the MLP size was set to 180. Under this configuration, the modulation recognition accuracy of the network reached 75.73%, which is 1.49% higher than the Trans model but 13.64% lower than the Ref model. In addition, the FLOPs of experiment No. 1 increased by nearly 4 times compared with Ref. These results clearly demonstrate that the DDCA mechanism is superior to self-attention in modulation recognition, while also verifying the effectiveness of the WCAN local window architecture in improving modulation recognition performance and reducing computation.
In experiments 2 to 5, we removed the Attention, 1C, DDC and DC portions inside the DDCA one by one. The recognition accuracy of these experiments decreased by 1.71% to 7.73% compared with the Ref model. This further demonstrates that every component inside the DDCA is significant.
These ablation results not only confirm the structural importance of the WCAN local window, but also reveal that the components inside the DDCA are indispensable.
3. Parameter setting:
since the classification accuracy varies with the parameter settings, fine tuning of the WCAN model is necessary. The influence of the parameters of the model on the classification performance, including the core size of DC, the core size of DDC, and the expansion factor, is studied below.
The results relating to the DC convolution kernel size are shown in Table 6. Traditional large-kernel convolutions for images use 7×7 kernels, but since we process 1D data, larger convolution kernels can be tried. As the DC kernel size increases, the number of parameters grows gradually, but performance suffers at certain kernel sizes. Although the validation loss and average accuracy were poor at a kernel size of 11, performance was relatively stable and good at kernel sizes of 3 and 21. This suggests that WCAN may perform better at certain specific kernel sizes, and selecting an appropriate kernel size is critical to optimizing WCAN performance.
No.  Convolution kernel size  Number of parameters  FLOPs   Minimum validation loss  Average accuracy (%)
1    3                        3383512               194.45  0.3867                   89.63
Ref  5                        3386712               194.69  0.3898                   89.37
3    7                        3389912               194.92  0.4120                   89.21
4    11                       3396312               194.39  0.4855                   87.03
5    15                       3402712               195.85  0.4201                   88.00
6    21                       3412312               196.55  0.3845                   89.39

TABLE 6

The results relating to the DDC convolution kernel size are shown in Table 7.
No.  Convolution kernel size  Expansion factor  RF    Number of parameters  FLOPs   Minimum validation loss  Average accuracy (%)
1    3                        3                 7     3380312               194.22  0.5187                   86.06
Ref  7                        3                 19    3386712               194.69  0.3898                   89.37
3    11                       3                 31    3393112               195.15  0.3550                   89.70
4    15                       3                 43    3399512               195.62  0.3253                   91.28
5    21                       3                 61    3409112               196.32  0.2656                   92.36
6    25                       3                 73    3415512               196.79  0.2868                   91.81
7    31                       3                 91    3425112               197.49  0.2133                   93.96
8    41                       3                 121   3441112               198.66  0.2194                   94.06
9    49                       3                 145   3453912               199.59  0.2228                   93.76
10   61                       3                 181   3473112               200.99  0.2064                   94.10
11   71                       3                 211   3489112               202.16  0.1806                   95.19
12   81                       3                 241   3505112               203.33  0.1670                   95.58
14   171                      3                 511   3649112               213.83  0.1830                   95.20
15   239                      3                 715   3757912               221.77  0.1775                   95.25
16   341                      3                 1023  3921112               233.68  0.1795                   95.16

TABLE 7
In this experiment the expansion factor was kept fixed at 3. The receptive field of a single convolution layer is calculated as:

RF = K + (K − 1) × (d − 1)

where K is the convolution kernel size and d is the expansion factor. In studying the response of the WCAN model to different DDC kernel sizes, we found that the DDC kernel size and receptive field (RF) have a significant impact on classification performance. The data show that as the kernel size increases from 3 to 81, the validation loss decreases from 0.5187 to 0.1670, a drop of nearly 67.8%, while the average accuracy rises from 86.06% to 95.58%; within this range, increasing the kernel size clearly improves model performance. However, it is worth noting that as the kernel size grows further to 341, although the validation loss stays in a low range, the average accuracy shows no further significant increase and remains close to the level at kernel size 81. This may suggest that beyond a certain kernel size, the performance gain tends to saturate.
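As a quick check of this formula against Table 7: the Ref row has K = 7 and d = 3, giving RF = 7 + (7 − 1) × (3 − 1) = 19, and the K = 81 row gives RF = 81 + 80 × 2 = 241, both matching the RF column.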
The results relating to the DDC expansion factor are shown in Table 8.
No.  Convolution kernel size  Expansion factor  RF   Number of parameters  FLOPs   Minimum validation loss  Average accuracy (%)
1    7                        1                 7    3386712               194.69  0.5510                   84.06
Ref  7                        3                 19   3386712               194.69  0.3898                   89.37
3    7                        5                 31   3386712               194.69  0.3507                   90.12
4    7                        7                 43   3386712               194.69  0.2990                   91.73
5    7                        11                67   3386712               194.69  0.2703                   92.27
6    7                        15                91   3386712               194.69  0.2555                   93.03
7    7                        21                127  3386712               194.69  0.2818                   92.23
8    7                        23                145  3386712               194.69  0.2833                   91.99
9    7                        27                163  3386712               194.69  0.2731                   92.35
10   7                        35                211  3386712               194.69  0.3813                   89.12
11   7                        39                241  3386712               194.69  0.4054                   88.50

TABLE 8
As the expansion factor increases from 1 to 39, the model exhibits superior performance at certain expansion factors. Specifically, as the expansion factor increases from 1 to 7, the validation loss decreases from 0.5510 to 0.2990, a drop of nearly 45.74%, while the average accuracy rises from 84.06% to 91.73%. This trend clearly shows that, within a certain range, increasing the expansion factor significantly improves model performance. However, as the expansion factor continues to increase, especially beyond 15, the validation loss first stays relatively low (e.g., 0.2731 at an expansion factor of 27), but as the expansion factor grows to 35 and 39 the validation loss rises to 0.3813 and 0.4054 respectively, and the average accuracy falls to 89.12% and 88.50% respectively. This may indicate that once a certain expansion factor threshold is exceeded, the performance improvement begins to reverse.
4. Performance comparison:
The test results show that under the same test parameter conditions, the highest recognition accuracies on the four modulation recognition datasets RML2016.10a, RML2016.10b, RML2018.01a and HisarMod2019.1 are respectively: 91.05% (at 16 dB signal-to-noise ratio, using ResNet), 93.64% (at 14 dB signal-to-noise ratio, using WCAN), 95.10% (at 30 dB signal-to-noise ratio, using Transformer), and 100% (in a high signal-to-noise-ratio environment, using WCAN).
In the RML2016.10a dataset, the ResNet model, based on a convolutional neural network (CNN) and incorporating a residual mechanism, performed best. In contrast, the attention-based WCAN model proposed in this study has slightly poorer modulation recognition accuracy than ResNet on this small dataset, with an average accuracy about 3% lower. In the RML2016.10b dataset, the modulation recognition performance of the WCAN model improves remarkably with the expanded data volume: its average and highest accuracy were the highest among all compared models, 0.12 and 0.01 percentage points higher than the next-best model. Furthermore, we also note a significant improvement in the Transformer's modulation recognition performance, presumably because the attention mechanism requires a certain data volume to show its advantage. The RML2018.01a dataset adopts a more complex 2×1024 data format; in this case, the attention-based Transformer and WCAN models show significant advantages: the Transformer reaches an average accuracy of 60.3% and a highest accuracy of 95.42%, while the WCAN model also performs impressively, with an average accuracy of 61.17% and a highest accuracy of 94.05%. Finally, on the HisarMod2019.1 dataset, the WCAN model performs significantly better than all other networks, reaching an average accuracy of 95.58% and a highest accuracy of 100%, and exceeding 85% accuracy even at the lowest signal-to-noise ratio of −20 dB. The next-best Transformer model achieves only an average accuracy of 74.24% and a highest accuracy of 99.95%.
In summary, the experimental results show that the WCAN-T model exhibits consistent and excellent modulation recognition performance on the modulation recognition data set currently in use. It is comparable to other high performance models on small datasets and shows significant performance advantages on large datasets. It can be inferred that the larger the data volume in the data set, the longer the data, and the better the modulation recognition performance of the WCAN model.
Based on the method in the above embodiment, the embodiment of the invention provides an electronic device. The apparatus may include: a memory for storing a program and a processor for executing the program stored by the memory. Wherein the processor is adapted to perform the method described in the above embodiments when the program stored in the memory is executed.
Based on the method in the above embodiment, the embodiment of the present invention provides a storage medium storing a computer program, which when executed on a processor causes the processor to perform the method in the above embodiment.
It is to be appreciated that the processor in embodiments of the invention may be a central processing unit (CPU), another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general purpose processor may be a microprocessor, but in the alternative, it may be any conventional processor.
The method steps in the embodiments of the present invention may be implemented by hardware, or by a processor executing software instructions. The software instructions may be composed of corresponding software modules that may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
In the above embodiments, the implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a storage medium or transmitted from one storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., a floppy disk, a hard disk, a magnetic tape), optical media (e.g., a DVD), or semiconductor media (e.g., a solid state disk (SSD)), and the like.
It will be appreciated that the various numerical numbers referred to in the embodiments of the present invention are merely for ease of description and are not intended to limit the scope of the embodiments of the present invention.
It will be readily appreciated by those skilled in the art that the foregoing is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A modulation recognition model training method based on WCAN, the method comprising:
constructing a modulation recognition model based on WCAN, wherein in the model, an IQ signal sequentially passes through a plurality of stages to extract local features of different scales, and the recognition output is produced after the local features are linearly combined; each stage comprises a block embedding layer and a signal attention block; the block embedding layer divides the input signal into a plurality of local windows, and the signal attention block performs attention calculation within the local windows, with depth convolution, depth expansion convolution and point state convolution as its core, to extract the features within the local windows;
setting training parameters, and training the model by using training data until reaching the training ending condition.
2. The method of claim 1, wherein the signal attention block comprises a plurality of signal attention layers; each signal attention layer comprises an attention sub-layer based on a depth expansion convolution attention mechanism and a multi-layer perceptron sub-layer, the attention sub-layer and the multi-layer perceptron sub-layer are connected by residual connections, and the output data of the attention sub-layer is input into the multi-layer perceptron sub-layer.
3. The method according to claim 2, wherein the operation process of the signal attention layer is:

Atten1 = Attention(LN(Input)) + Input

Atten2 = MLP(LN(Atten1))

Output = Atten2 + Atten1

wherein Output and Input represent the output and the input respectively; LN() represents a normalization operation; Attention() represents the attention sub-layer operation based on the depth expansion convolution attention mechanism; MLP() represents the multi-layer perceptron operation.
4. A method according to claim 2 or 3, wherein the operation process of the attention sub-layer is:

Output1 = Conv1D(DDCA(GELU(Conv1D(X_in))))

wherein Output1 is the output; X_in is the input; Conv1D() represents a one-dimensional convolution operation; GELU() represents the activation function; DDCA() represents the depth expansion convolution attention operation.
5. The method of claim 4, wherein the depth expansion convolution attention operation process is:

Atten3 = Conv1(DDConv(DConv(in)))

Output2 = Atten3 ⊙ in

wherein Output2 is the output; in is the input; DConv() represents a depth convolution operation; DDConv() represents a depth expansion convolution operation; Conv1() represents a point state convolution operation; Atten3 is the weight of each feature of the input signal sequence; ⊙ represents the element-level product.
6. The method according to claim 1, wherein the operation process of the block embedding layer is:

P_i = Norm(Conv1D(X))

wherein Norm() represents a batch normalization operation; Conv1D() represents a one-dimensional convolution operation; X represents the input; P_i represents the output of the block embedding layer in the i-th stage;

P_i is a tensor of shape [C_i, Len/n_i], where C_i represents the number of output channels, Len/n_i represents the size of the output window, Len represents the length of the IQ signal, and n_i = 4 × 2^(i-1);

the number of input and output channels of the signal attention block is C_i.
7. The method of claim 1, wherein the IQ signal sequentially undergoes four stages for local feature extraction at different scales.
8. A modulation identification method based on WCAN, the method comprising:
receiving an IQ signal to be identified and preprocessing the signal;
performing modulation recognition on the preprocessed IQ signal by using a pre-trained modulation recognition model;
outputting the identified modulation mode;
wherein the modulation recognition model is trained according to the method of any one of claims 1-7.
9. An electronic device, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory, the processor being adapted to perform the method of any one of claims 1-7, or the method of claim 8, when the program stored in the memory is executed.
10. A storage medium storing a computer program, which, when run on a processor, causes the processor to perform the method of any one of claims 1-7, or the method of claim 8.
CN202311724965.0A 2023-12-14 2023-12-14 Modulation identification and model training method based on WCAN Pending CN117828396A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311724965.0A CN117828396A (en) 2023-12-14 2023-12-14 Modulation identification and model training method based on WCAN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311724965.0A CN117828396A (en) 2023-12-14 2023-12-14 Modulation identification and model training method based on WCAN

Publications (1)

Publication Number Publication Date
CN117828396A true CN117828396A (en) 2024-04-05

Family

ID=90510621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311724965.0A Pending CN117828396A (en) 2023-12-14 2023-12-14 Modulation identification and model training method based on WCAN

Country Status (1)

Country Link
CN (1) CN117828396A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination