CN117278154A - Spectrum prediction method based on attention mechanism - Google Patents
- Publication number: CN117278154A
- Application number: CN202311379839.6A
- Authority: CN
- Country: China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B17/00—Monitoring; Testing
- H04B17/30—Monitoring; Testing of propagation channels
- H04B17/382—Monitoring; Testing of propagation channels for resource allocation, admission control or handover
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
Abstract
The spectrum prediction method based on the attention mechanism addresses the problem of predicting the spectrum occupancy state at as many future times as possible with higher accuracy, and belongs to the field of spectrum prediction. The invention comprises the following steps: a gated recursive unit is embedded into each sub-module of an attention-based Transformer model to obtain a spectrum prediction network; the gated recursive unit extracts the local correlation of the spectrum occupancy information and outputs a pre-extraction result that carries position encoding. The input data of the training set comprise an input sequence and an output sequence, where the input sequence is the historical spectrum information sequence and the output sequence is the future spectrum information sequence shifted one position to the right; the output data are the future spectrum information. After the spectrum prediction network has been trained on the training set, the current spectrum information is taken as the input sequence, the spectrum information at the last moment of the input sequence is taken as the first input of the decoding sub-module's output sequence, and spectrum prediction is performed autoregressively with the spectrum prediction network.
Description
Technical Field
The invention relates to a spectrum prediction method based on an attention mechanism, and belongs to the field of spectrum prediction.
Background
Spectrum prediction technology complements cognitive radio: by mining and analyzing the correlations within historical spectrum data, it predicts the spectrum occupancy of future time slots, so that spectrum sensing only needs to scan the frequency bands predicted to be idle. This greatly reduces the energy and time spent on sensing, and lets spectrum decisions be made accurately and efficiently in a shorter time. In spectrum sharing, which follows spectrum prediction, a secondary user (SU) can prepare an appropriate sharing policy in advance according to its own service requirements, compensating for the time needed to respond. Likewise, proactive handover based on spectrum prediction lets the SU judge the channel occupancy state of future time slots from the prediction results and decide in advance whether spectrum handover is needed at one or more future instants, reducing the probability of collision between the SU and a primary user (PU).
At present, although the development of neural networks has greatly advanced spectrum prediction technology, the widely applied LSTM and its variant structures still handle long-term dependence on the input sequence poorly, and the capacity of the sequence-transduction (Seq-to-Seq) model to store and convey correlation information is limited by the length of its intermediate vector. In a practical environment, spectrum resources are allocated across multiple channels, and the final occupancy is shaped by both the allocation strategy and user behavior, so correlation exists not only in time but also between channels. In addition, if only single-step prediction is performed, each round scans the channel occupancy of just one future time slot, so prediction and sensing must be repeated frequently; during spectrum handover, short-step prediction forces the user to switch channels often, and the SU must repeatedly make decisions according to its own service requirements. This mode is inefficient and of limited practical use.
Disclosure of Invention
Aiming at the problem of predicting the spectrum occupancy state at as many future times as possible with higher accuracy, the invention provides a spectrum prediction method based on an attention mechanism.
The invention discloses a spectrum prediction method based on an attention mechanism, which comprises the following steps:
S1, establishing a spectrum prediction network: the spectrum prediction network is formed by embedding a gated recursive unit into each sub-module of an attention-based Transformer model, with the length of the gated recursive unit equal to the length of the input sequence; in the encoding and decoding sub-modules, the input sequence first enters the gated recursive unit, which performs local correlation extraction on the spectrum occupancy information and outputs a pre-extraction result that carries position encoding;
S2, taking spectrum occupancy state data sorted in descending order of channel priority as the training set: the input data in the training set comprise an input sequence and an output sequence, where the input sequence is the historical spectrum information sequence and the output sequence is the future spectrum information sequence shifted one position to the right; the output data in the training set are the future spectrum information;
S3, training the spectrum prediction network with the training set;
S4, prediction: take the current spectrum information as the input sequence, take the spectrum information at the last moment of the input sequence as the first input of the decoding sub-module's output sequence, and perform spectrum prediction autoregressively with the trained spectrum prediction network.
Preferably, the attention mechanism in the encoding sub-module is a multi-head attention mechanism, and the two attention mechanisms in the decoding sub-module are a cross multi-head attention mechanism and a masked multi-head attention mechanism. Global correlation extraction is performed by the attention mechanisms on top of the local information extracted by the gated recursive unit: the multi-head attention mechanism in the encoding sub-module extracts the global correlation among historical information, the masked multi-head attention mechanism in the decoding sub-module extracts the correlation of future-slot spectrum occupancy, and the cross multi-head attention mechanism extracts the correlation between historical spectrum information and future spectrum information.
Preferably, the nth encoding sub-module is:

g^n = LSTM(h^{n-1})
a^n = LayerNorm(g^n + MultiHead(g^n, g^n, g^n))
h^n = LayerNorm(a^n + FFN(a^n))

where g_t^n is the output at time t of the gated recursive unit in the nth encoding sub-module and h_t^n is the output at time t of the nth encoding sub-module; LSTM() denotes the gated recursive unit, LayerNorm() denotes layer normalization, MultiHead() denotes multi-head attention using the scaled dot-product attention scoring function, and FFN() denotes the feed-forward transformation.
Preferably, the nth decoding sub-module is:

u^n = LSTM(s^{n-1})
b^n = LayerNorm(u^n + MaskMultiHead(u^n, u^n, u^n))
c^n = LayerNorm(b^n + MultiHead(b^n, h^N, h^N))
s^n = LayerNorm(c^n + FFN(c^n))

where u_t^n is the output at time t of the gated recursive unit in the nth decoding sub-module, s_t^n is the output at time t of the nth decoding sub-module, and h^N is the output of the last encoding sub-module; MaskMultiHead() denotes the masked multi-head attention mechanism, which also uses the scaled dot-product attention scoring function.
Preferably, the normalization in the Transformer model is layer normalization.
Preferably, a dropout mechanism is added in the training process of the spectrum prediction network.
Preferably, the forward propagation formulas of the gated recursive unit are:

i_t = σ(W_xi x_t + W_hi h_{t-1} + b_i)
f_t = σ(W_xf x_t + W_hf h_{t-1} + b_f)
o_t = σ(W_xo x_t + W_ho h_{t-1} + b_o)
c̃_t = tanh(W_xc x_t + W_hc h_{t-1} + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
h_t = o_t ⊙ tanh(c_t)

where ⊙ denotes the Hadamard product; the gating units i_t, f_t and o_t are the outputs of the input gate, the forget gate and the output gate respectively; c_t is the output of the memory cell at the current moment, c̃_t is the candidate memory, h_{t-1} is the hidden state at the previous moment, and x_t is the input at the current moment. W_xi and W_hi are the weight matrices applied to the input sequence and the hidden state in the input gate during feature extraction, W_xf and W_hf are the corresponding matrices in the forget gate, W_xo and W_ho those in the output gate, and W_xc and W_hc those of the memory cell; b_i, b_f, b_o and b_c are the bias terms of the gates, and σ and tanh are activation functions.
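As an illustration (not part of the patent text), one forward step of such a gated recursive unit can be sketched in NumPy. The stacked parameter layout W, U, b is an assumption made for compactness:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of the gated recursive unit.

    W (4d x d_in) maps the current input x_t, U (4d x d) maps the
    previous hidden state h_{t-1}; the four stacked blocks correspond
    to the input gate, forget gate, output gate and candidate memory.
    """
    d = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b        # stacked pre-activations
    i = sigmoid(z[0:d])                 # input gate i_t
    f = sigmoid(z[d:2 * d])             # forget gate f_t
    o = sigmoid(z[2 * d:3 * d])         # output gate o_t
    c_tilde = np.tanh(z[3 * d:4 * d])   # candidate memory
    c = f * c_prev + i * c_tilde        # Hadamard products
    h = o * np.tanh(c)                  # new hidden state
    return h, c
```

Running this step over a whole input sequence yields the position-encoded pre-extraction result that the sub-modules feed to the attention layers.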
Preferably, the feed-forward neural network in the Transformer model is:

FFN(x) = w_2 relu(w_1 x + b_1) + b_2

where FFN(x) is the output of the feed-forward neural network, x is the input, w_1 and w_2 are weight matrices, b_1 and b_2 are bias terms, and relu() is the activation function.
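A minimal NumPy sketch of this formula (the hidden width, i.e. the number of columns of w_1, is a free choice not fixed by the text):

```python
import numpy as np

def ffn(x, w1, b1, w2, b2):
    """FFN(x) = w2 · relu(w1 · x + b1) + b2, applied position-wise."""
    hidden = np.maximum(0.0, x @ w1 + b1)   # relu activation
    return hidden @ w2 + b2
```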
The invention fuses a recursive structural unit into each sub-module of the attention-based Transformer model. The superior local-correlation processing capacity of the recursive structure and its position-encoded output complement the Transformer model's strengths of parallel data processing and efficient global information extraction. This overcomes both the long-term dependence problem of LSTM and the Transformer model's tendency to overfit, and in particular achieves high-accuracy spectrum prediction in the multi-channel, multi-step setting that is closest to the actual environment.
Drawings
FIG. 1 is a schematic diagram of the prediction mode of the model for multi-channel multi-step prediction;
FIG. 2 is the overall block diagram of the LSTM-Transformer model;
FIG. 3 is the block diagram of the gated recursive unit in the LSTM-Transformer model;
FIG. 4 is a schematic diagram of the algorithmic implementation of the attention mechanism;
FIG. 5 is a structural diagram of the multi-head attention mechanism obtained by modifying the attention mechanism for parallel operation;
FIG. 6 is a graph of multi-channel multi-step prediction showing the model's superiority, with the abscissa representing the prediction step size and the ordinate the accuracy.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention is further described below with reference to the drawings and specific examples, which are not intended to be limiting.
Description of the problem:
In an actual radio environment, both the allocation policy of spectrum resources and the frequency-usage behavior of users make the occupancy states of the channels interdependent. Therefore, an M/G/5 queuing-theory model close to the real environment is adopted here. The arrival process is assumed to obey a Poisson distribution with parameter λ, i.e. the number x of users arriving in a period t obeys

P(x = k) = ((λt)^k / k!) e^{-λt}, k = 0, 1, 2, …

where λ is the average number of users arriving per unit time and x is the number of users that actually arrive. The time interval between two adjacent arrivals then follows an exponential distribution with parameter λ, with probability density

f(t) = λ e^{-λt}, t > 0

so that 1/λ is the average interval between two adjacent arrivals. The service time is assumed to obey a general service-time distribution with parameter μ, with probability

P(x = k) = μ(1 − μ)^{k−1}, k = 1, 2, …, N

where μ is the probability that a user is served in a unit time, i.e. the average number of users served per unit time; 1/μ is the average time a user occupies a channel, and x is the time the user actually occupies the channel.
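As a sketch of the data-generation process described above (the function name, parameter values and slot-based discretization are illustrative assumptions, not taken from the patent), occupancy data with Poisson arrivals, geometric holding times and priority allocation can be produced as follows:

```python
import numpy as np

def generate_occupancy(T=1000, n_ch=5, lam=1.2, mu=0.4, seed=0):
    """Poisson(lam) arrivals per slot, geometric(mu) channel-holding
    times, and a priority rule that fills lower-index (higher-priority)
    channels first."""
    rng = np.random.default_rng(seed)
    remaining = np.zeros(n_ch, dtype=int)      # slots left on each channel
    occ = np.zeros((T, n_ch), dtype=np.int8)   # 1 = occupied, 0 = idle
    for t in range(T):
        for _ in range(rng.poisson(lam)):      # users arriving this slot
            free = np.flatnonzero(remaining == 0)
            if free.size:                      # highest-priority free channel
                remaining[free[0]] = rng.geometric(mu)
        occ[t] = (remaining > 0).astype(np.int8)
        remaining = np.maximum(remaining - 1, 0)
    return occ
```

The resulting 0/1 matrix plays the role of the data set for model training and testing.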
On this basis, a hypothetical priority-allocation rule is added to the queuing model: the priorities of the 5 channels decrease in order. The spectrum occupancy state data generated in this way serve as the data set for model training and testing, simulating the interdependence among channels in an actual environment. The occupancy states of the 5 channels at the previous n historical moments serve as the referenceable history, where 0 indicates that a channel is unoccupied and 1 that it is occupied. The prediction mode is shown in fig. 1, where the first box represents the historical information referenced by the model and the second box represents the information at the future times to be predicted. The objective of this embodiment is, after analyzing the temporal and inter-channel correlations of a certain number of historical spectrum occupancy states, to predict the occupancy state at as many future times as possible with higher accuracy. However, when the mature LSTM-based Seq-to-Seq model is applied in the field of spectrum prediction, its expression of the correlation among historical information is constrained by the length of the intermediate vector and by how that vector is used in the decoding sub-module. The attention-based Transformer model excels in natural language processing, but when transferred to the spectrum prediction field it easily overfits and loses relative position information during the attention computation, so it cannot predict multi-step future spectrum occupancy with high precision.
To address the above problems, this embodiment constructs an LSTM-Transformer model that combines a temporal recursive structure with the attention mechanism: a gated recursive unit whose length equals that of the historical information is inserted into each sub-module of the Transformer; the gated recursive unit performs local correlation extraction on the spectrum occupancy information and outputs a pre-extraction result that carries position encoding. The Transformer model's capacity for efficient parallel correlation computation and global correlation extraction then complements this over the long term. The model both remedies the deficiency of LSTM in handling long-term dependence and relieves the overfitting problem of the Transformer model, greatly improving the accuracy of spectrum prediction. The spectrum prediction method based on the attention mechanism of this embodiment specifically comprises the following steps:
step 1, establishing a spectrum prediction network: the LSTM-transducer model is shown in FIG. 2. The spectrum prediction network is characterized in that a gating recursion unit is embedded in each sub-module of a transducer model based on an attention mechanism, the length of the gating recursion unit is equal to the length of an input sequence, the chunk_size and pre_step_size parameters of the model are modified according to the required window length and the prediction step length, in a coding sub-module and a decoding sub-module, the input sequence is firstly input into the gating recursion unit, and the gating recursion unit performs local correlation extraction on spectrum occupation information and outputs an information pre-extraction result with position codes;
Step 2, taking spectrum occupancy state data sorted in descending order of channel priority as the training set: the input data in the training set comprise an input sequence and an output sequence, where the input sequence is the historical spectrum information sequence and the output sequence is the future spectrum information sequence shifted one position to the right; the output data in the training set are the future spectrum information;
Step 3, training the spectrum prediction network with the training set;
Step 4, prediction: take the current spectrum information as the input sequence, take the spectrum information at the last moment of the input sequence as the first input of the decoding sub-module's output sequence, and perform spectrum prediction autoregressively with the trained spectrum prediction network. The output of the decoding sub-module is then the predicted spectrum occupancy for the next pre_step_size time slots.
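The autoregressive loop of step 4 can be sketched as follows; `model(src, tgt)` is a hypothetical callable standing in for the trained network, assumed to return one predicted slot per decoder position:

```python
import numpy as np

def predict_autoregressive(model, history, pre_step_size):
    """The last observed slot seeds the decoder, and each freshly
    predicted slot is appended to the decoder input."""
    decoder_in = [history[-1]]                 # spectrum info at last moment
    for _ in range(pre_step_size):
        out = model(np.asarray(history), np.asarray(decoder_in))
        decoder_in.append(out[-1])             # feed newest prediction back
    return np.asarray(decoder_in[1:])          # pre_step_size predicted slots
```

With a trained network, one call to this loop yields the occupancy of the next pre_step_size slots at once.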
The spectrum prediction network of this embodiment maintains its accuracy in multi-step prediction. The basic idea is to fuse the gated recursive unit completely with the attention-based Transformer architecture, so that the complementary advantages of the two structures offset each other's deficiencies. Conventional multi-channel spectrum prediction often adopts a Seq-to-Seq structure based on LSTM and LSTM variants; the prediction accuracy of that approach is not high enough, and it drops sharply as the prediction step size grows, which conflicts with the multi-step prediction requirement of an actual radio environment. Therefore, to obtain the spectrum occupancy of several future time slots at once while keeping the prediction accuracy sufficiently high, this embodiment first uses the gated recursive unit's superior local information extraction and its position-encoded output to pre-extract the correlation between known and unknown information, enriching the input of the Transformer. The Transformer model's efficient global information extraction then realizes high-accuracy multi-channel multi-step prediction, improving the utilization efficiency of spectrum resources. The prediction accuracy obtained by this model is higher than that of traditional models and remains high over predictions dozens of steps into the future, greatly improving the working efficiency of the subsequent cognitive radio and, in turn, the utilization of spectrum resources.
In this embodiment, because the gated recursive unit outputs information from which the local temporal correlation has already been extracted, there is no need to worry about whether correlations at a distance can still be extracted. Its output also carries position-encoding information, so no extra position encoding needs to be added when it is used as the input of the Transformer model. The correlation pre-extraction is equivalent to enriching the data set of the Transformer model, reducing the chance of overfitting, while the Transformer layers supplement the correlation information learned by the gated recursive unit layer and extract the long-term dependence information. The model also follows the encoding sub-module/decoding sub-module structure; the nth encoding sub-module is:

g^n = LSTM(h^{n-1})
a^n = LayerNorm(g^n + MultiHead(g^n, g^n, g^n))
h^n = LayerNorm(a^n + FFN(a^n))

where g_t^n is the output at time t of the gated recursive unit in the nth encoding sub-module and h_t^n is the output at time t of the nth encoding sub-module; LSTM() denotes the gated recursive unit, LayerNorm() denotes layer normalization, MultiHead() denotes multi-head attention using the scaled dot-product attention scoring function, and FFN() denotes the feed-forward transformation;
the nth decoding sub-module is:

u^n = LSTM(s^{n-1})
b^n = LayerNorm(u^n + MaskMultiHead(u^n, u^n, u^n))
c^n = LayerNorm(b^n + MultiHead(b^n, h^N, h^N))
s^n = LayerNorm(c^n + FFN(c^n))

where u_t^n is the output at time t of the gated recursive unit in the nth decoding sub-module, s_t^n is the output at time t of the nth decoding sub-module, and h^N is the output of the last encoding sub-module; MaskMultiHead() denotes the masked multi-head attention mechanism, which also uses the scaled dot-product attention scoring function. The form of the attention mechanism adopted in the encoding and decoding sub-modules is the same as in the Transformer model.
The structure of the gated recursive unit in the model is shown in FIG. 3. At each moment there are three inputs: the hidden state h_{t-1} at the previous moment, the memory cell c_{t-1}, and the input x_t at the current moment. There are three gates: the forget gate, the input gate and the output gate. Through a sigmoid function and an element-wise product they determine how much information of the previous memory cell c_{t-1} is retained, how much of the input x_t and the previous hidden state h_{t-1} is added to the current memory cell c_t, and how much of c_t is output or becomes the next hidden state h_t. The generic form of a gate is:
g(x)=σ(ωx+b)
The sigmoid function maps the obtained real value into (0, 1) to represent how much of the information from the previous moment is preserved or discarded: if g(x) is close to 0, no information passes; if it is close to 1, all information passes.
The forward propagation formulas of the gated recursive unit are:

i_t = σ(W_xi x_t + W_hi h_{t-1} + b_i)
f_t = σ(W_xf x_t + W_hf h_{t-1} + b_f)
o_t = σ(W_xo x_t + W_ho h_{t-1} + b_o)
c̃_t = tanh(W_xc x_t + W_hc h_{t-1} + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
h_t = o_t ⊙ tanh(c_t)

where ⊙ denotes the Hadamard product; the gating units i_t, f_t and o_t correspond to the input gate, the forget gate and the output gate; c_t denotes the memory cell at the current moment, h_{t-1} the hidden state at the previous moment, and x_t the input at the current moment. W_xi and W_hi are the weight matrices applied to the input sequence and the hidden state in the input gate during feature extraction, and the other weight matrices take the analogous form for the forget gate, output gate and memory cell; b_i, b_f, b_o and b_c are the bias terms of the gates. σ usually takes the sigmoid function, keeping each gate between 0 and 1 to describe how much information passes; the cell activation (tanh above) may also be taken as relu, chosen according to the practical situation.
The attention mechanism in the encoding sub-module of this embodiment is a multi-head attention mechanism, and the two attention mechanisms in the decoding sub-module are a cross multi-head attention mechanism and a masked multi-head attention mechanism. Global correlation extraction is performed by the attention mechanisms on top of the local information extracted by the gated recursive unit: the multi-head attention mechanism in the encoding sub-module extracts the global correlation between historical information, the masked multi-head attention mechanism in the decoding sub-module extracts the correlation of future-slot spectrum occupancy, and the cross multi-head attention mechanism extracts the correlation between historical spectrum information and future spectrum information.
The attention mechanism is the most important part for capturing correlations within sequences, and its structure is shown in FIG. 4. It can be described as a correlation computation between a query and a set of keys that yields attention score values, i.e. attention weights, with which the values are combined in a weighted sum. The attention weights are computed by an attention scoring function. The output is:

f(q, (k_1, v_1), …, (k_m, v_m)) = Σ_{i=1}^{m} softmax(a(q, k_i)) v_i

where a is the attention scoring function; after softmax, the values it produces are converted into attention weights that sum to 1.
There are generally two kinds of attention scoring functions: the additive attention mechanism and the dot-product attention mechanism. Additive attention can effectively summarize the important information of a sequence in linear complexity, and is usually chosen when the query and key vectors have different lengths. Because matrix multiplication has many efficient implementations, dot-product attention is computationally more efficient and more widely used, but requires the query and key vectors to have the same length d. Assuming all components of the query and key are independent random variables with mean 0 and variance 1, the dot product of the vectors has mean 0 and variance d, the vector dimension. To make the variance independent of the dimension, the dot product is divided by √d, giving an attention score with mean 0 and variance 1 that is not constrained by the vector dimension. The scaled dot-product attention scoring function is:

a(q, k) = q · k / √d

or, in matrix form over queries Q, keys K and values V:

Attention(Q, K, V) = softmax(QK^T / √d) V
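A minimal NumPy sketch of scaled dot-product attention, returning both the output and the weight matrix:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)              # scaled dot-product scores
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V, weights
```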
To improve parallelism, scaled dot-product attention is modified into the multi-head attention mechanism, whose structure is shown in FIG. 5. The query, key and value are first linearly transformed and split into several parts of equal dimension; scaled dot-product attention is computed on each part separately, and the results are concatenated and linearly transformed again:

head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
MultiHead(Q, K, V) = Concat(head_1, …, head_h) W^O
The decoding sub-module also uses a masked multi-head attention layer. This is because during training the entire right-shifted output sequence is fed to the decoding sub-module at once, whereas in the actual prediction process, when the ith vector is predicted, the vectors after i are unknown. The correlations with the vectors after i, i.e. their attention weight values, must therefore be masked to avoid "cheating" behavior of the model during prediction.
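This masking is commonly realized as an additive mask on the score matrix before softmax; a sketch:

```python
import numpy as np

def causal_mask(L):
    """Position i may attend only to positions <= i; masked scores
    become -inf so their softmax weight is exactly 0."""
    return np.triu(np.full((L, L), -np.inf), k=1)

def masked_attention_weights(scores):
    z = scores + causal_mask(scores.shape[0])
    z = z - z.max(axis=-1, keepdims=True)      # row max is finite (diagonal)
    e = np.exp(z)                              # exp(-inf) = 0
    return e / e.sum(axis=-1, keepdims=True)
```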
The feed-forward neural network part is computed as:

FFN(x) = w_2 relu(w_1 x + b_1) + b_2

The FFN layer contains only one hidden layer, with relu selected as the activation function.
In the transform block, a residual connection is used, i.e. the output result of the current layer is added to the value input to the layer, so that the effect obtained by the network with a deeper layer is ensured not to be poorer than the effect obtained by the network with a shallower layer.
The attention mechanism appears in three places in the Transformer block. The attention mechanisms of the encoding sub-modules all take the form of multi-head attention, with all queries, keys and values derived from the output of the previous layer of the encoding sub-module. The decoding sub-module uses two attention mechanisms: cross multi-head attention and masked multi-head attention. The query of the cross multi-head attention comes from the output of the previous layer of the decoding sub-module, while its key and value come from the output of the encoding sub-module. The query, key and value of the masked multi-head attention all come from the output of the previous layer of the decoding sub-module.
The normalization in the Transformer model of this embodiment is layer normalization (Layer Normalization, LN), which differs from the batch normalization (Batch Normalization, BN) commonly used in CNNs. BN normalizes the inputs of each neuron in a layer across all samples of a batch, and is therefore limited by the batch size: when batch_size is small, only a small amount of data is normalized and the result cannot reflect the overall statistics. In addition, for time series the net input distribution of a neuron changes dynamically through the network, so batch normalization cannot be used. LN instead normalizes all neurons of each layer for each sample separately, without any limitation from the sequence length in each batch, and is thus better suited to time-series structures.
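The difference between the two normalizations reduces to which axis the statistics are taken over; a minimal sketch (without the learnable gain and bias, which are omitted for brevity):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LN: normalize across the feature dimension of each sample/time step
    # independently, so the statistics do not depend on the batch size.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    # BN: normalize each feature across the batch dimension — unreliable when
    # the batch is small, which is why LN is preferred for time series.
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)
```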
In this embodiment, a dropout mechanism is added during the training of the spectrum prediction network. The Transformer part is prone to overfitting, and dropout also addresses the co-adaptation problem among network nodes. Because different nodes have different representational capacities, nodes with stronger capacity are continuously strengthened as training proceeds, while weaker nodes are continuously weakened until they become negligible. This is equivalent to training only part of the network, wasting its depth and width and limiting the training effect. Dropout randomly discards some neurons with a certain probability during model training; in other words, each training pass trains a different subset of neurons, and since any two neurons are not necessarily retained in the same pass, the weight and bias updates in the network do not depend on fixed co-occurrences. This mechanism breaks the co-adaptation among neurons and makes the learned network more robust.
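A sketch of the standard "inverted dropout" formulation (an illustrative example; the embodiment does not specify which variant it uses):

```python
import numpy as np

def dropout(x, p, training=True, rng=None):
    # Inverted dropout: randomly zero activations with probability p during
    # training and rescale the survivors by 1/(1-p) so the expected value of
    # each activation is unchanged; at inference time it is the identity.
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)
```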
In this embodiment, all the modules are fused to construct an LSTM-Transformer model. Spectrum occupancy information of 5 channels generated by an M/G/5 model is used as the data set, divided into training, validation and test sets in the ratio 6:2:2. The LSTM-based Seq-to-Seq model, the Transformer model, and the LSTM-Transformer model are trained separately. All three models use the widely adopted Adam optimizer with an initial learning rate of 10^-3; the loss function is the mean squared error, and the prediction accuracy is selected as the evaluation index. In the Seq-to-Seq model, the number of hidden units is set to 200 and the learning-rate decay of the optimizer to 10^-6, with 100 training epochs. In the Transformer model, the number of hidden-layer units is set to 200, the batch size to 128, and the learning-rate decay to 5×10^-6, with 200 training epochs. For the LSTM-Transformer model, the number of hidden-layer units is set to 256, the batch size to 64, the learning-rate decay is the same as for the Transformer model, and 200 training epochs are used. In each model, the window length with the highest prediction accuracy is selected; in our model this is 30, i.e. 30 steps of history and the correlation information among channels are referenced to predict the channel occupancy over the next 30 time slots.
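The data split and the LSTM-Transformer hyperparameters described above can be collected in a small sketch (the values are taken from this embodiment; the split helper itself is illustrative):

```python
def split_dataset(data, ratios=(0.6, 0.2, 0.2)):
    # Chronological 6:2:2 split into training, validation and test sets.
    n = len(data)
    i = int(round(n * ratios[0]))
    j = i + int(round(n * ratios[1]))
    return data[:i], data[i:j], data[j:]

# Hyperparameters of the LSTM-Transformer model as stated in the embodiment.
config = {
    "optimizer": "Adam",
    "initial_lr": 1e-3,
    "lr_decay": 5e-6,
    "loss": "MSE",
    "hidden_units": 256,
    "batch_size": 64,
    "epochs": 200,
    "window_length": 30,  # 30 history steps -> 30 future time slots
}
```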
The multi-channel prediction accuracy of the three models on the validation set is compared in FIG. 6. The LSTM-Transformer model constructed in this embodiment shows excellent performance in multi-channel multi-step spectrum prediction: even for long prediction horizons the accuracy remains at about 98%, which meets the accuracy requirements of a practical radio environment and supports the efficient performance of the subsequent cognitive-radio operations (spectrum sensing, spectrum decision, spectrum sharing and spectrum switching).
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that the different dependent claims and the features described herein may be combined in ways other than as described in the original claims. It is also to be understood that features described in connection with separate embodiments may be used in other described embodiments.
Claims (10)
1. A method of spectrum prediction based on an attention mechanism, the method comprising:
S1, establishing a spectrum prediction network, wherein the spectrum prediction network is formed by embedding a gating recursion unit in each submodule of an attention-based Transformer model, the length of the gating recursion unit being equal to the length of the input sequence; in the coding submodule and the decoding submodule, the input sequence first enters the gating recursion unit, which performs local correlation extraction on the spectrum occupancy information and outputs an information pre-extraction result carrying position coding;
S2, taking spectrum occupancy state data sorted in descending order of channel priority as a training set, wherein the input data in the training set comprise an input sequence and an output sequence, the input sequence being a sequence of historical spectrum information and the output sequence being the future spectrum information sequence shifted one position to the right, and the output data in the training set being the future spectrum information;
s3, training the spectrum prediction network by using a training set;
S4, predicting: taking the current spectrum information as the input sequence, taking the spectrum information at the last moment of the input sequence as the first input of the decoder output sequence, and performing spectrum prediction in an autoregressive manner using the trained spectrum prediction network.
2. The attention mechanism-based spectrum prediction method as claimed in claim 1, wherein the attention mechanism in the coding submodule is a multi-head attention mechanism, and the two attention mechanisms in the decoding submodule are a cross multi-head attention mechanism and a masked multi-head attention mechanism respectively; global correlation extraction is performed by the attention mechanisms on the basis of the local information extraction by the gating recursion unit, wherein the multi-head attention mechanism in the coding submodule extracts the global correlation within the history information, the masked multi-head attention mechanism in the decoding submodule extracts the correlation within the future-time-slot spectrum occupancy, and the cross multi-head attention mechanism extracts the correlation between the historical spectrum information and the future spectrum information.
3. The attention-based spectrum prediction method of claim 2, wherein the nth coding submodule is:
wherein the two quantities in the formula denote, respectively, the output of the gating recursion unit at time t in the nth coding submodule and the output at time t of the nth coding submodule; LSTM() represents the gating recursion unit, LayerNorm() represents layer normalization, MultiHead() represents multi-head attention using the scaled dot-product attention scoring function, and FFN() represents the FFN transformation.
4. A method of spectrum prediction based on an attention mechanism as claimed in claim 3, wherein the nth decoding submodule is:
wherein MaskMultiHead() represents the masked multi-head attention mechanism using the scaled dot-product attention scoring function, and the remaining quantity represents the output at time t of the nth decoding submodule.
5. The attention-based spectrum prediction method of claim 2, wherein the normalization in the Transformer model is layer normalization.
6. The attention-based spectrum prediction method as recited in claim 2, wherein a dropout mechanism is added during training of the spectrum prediction network.
7. The attention-based spectrum prediction method of claim 1, wherein the forward propagation formula of the gating recursion unit is:

i_t = σ(W_xi x_t + W_hi h_{t-1} + b_i)
f_t = σ(W_xf x_t + W_hf h_{t-1} + b_f)
o_t = σ(W_xo x_t + W_ho h_{t-1} + b_o)
c̃_t = tanh(W_xc x_t + W_hc h_{t-1} + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
h_t = o_t ⊙ tanh(c_t)

wherein ⊙ denotes the Hadamard product operator; the gating units i_t, f_t and o_t correspond to the outputs of the input gate, the forget gate and the output gate respectively; c_t represents the output of the memory cell at the current moment; h_{t-1} represents the hidden state at the previous moment; x_t represents the input at the current moment; W_xi and W_hi are the weight matrices for feature extraction of the input sequence and the hidden state in the input gate, W_xf and W_hf those in the forget gate, W_xo and W_ho those in the output gate, and W_xc and W_hc those in the memory cell; b_i, b_f, b_o and b_c are the bias terms of each gate; σ represents the activation function.
8. The attention-based spectrum prediction method of claim 1, wherein the feedforward neural network in the Transformer model is:

FFN(x) = w_2 ReLU(w_1 x + b_1) + b_2

wherein FFN(x) is the output of the feedforward neural network, x is the input, w_1 and w_2 are weight matrices, b_1 and b_2 are bias terms, and ReLU() is the activation function.
9. A computer-readable storage device storing a computer program, characterized in that the computer program when executed implements the attention-based spectrum prediction method according to any of claims 1 to 8.
10. An attention-based spectrum prediction apparatus comprising a storage device, a processor and a computer program stored in the storage device and executable on the processor, wherein execution of the computer program by the processor implements the attention-based spectrum prediction method as claimed in any one of claims 1 to 8.
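For illustration only (not part of the claims), the forward propagation of the gating recursion unit recited in claim 7 can be sketched in NumPy; the parameter names and the gate ordering here are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x_t, h_prev, c_prev, W, U, b):
    # W, U, b each hold the parameters of the four gates in the order:
    # input gate, forget gate, output gate, candidate memory.
    W_i, W_f, W_o, W_c = W
    U_i, U_f, U_o, U_c = U
    b_i, b_f, b_o, b_c = b
    i_t = sigmoid(x_t @ W_i + h_prev @ U_i + b_i)      # input gate
    f_t = sigmoid(x_t @ W_f + h_prev @ U_f + b_f)      # forget gate
    o_t = sigmoid(x_t @ W_o + h_prev @ U_o + b_o)      # output gate
    c_tilde = np.tanh(x_t @ W_c + h_prev @ U_c + b_c)  # candidate memory
    c_t = f_t * c_prev + i_t * c_tilde                 # * is the Hadamard product
    h_t = o_t * np.tanh(c_t)                           # hidden state / output
    return h_t, c_t
```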
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311379839.6A CN117278154A (en) | 2023-10-23 | 2023-10-23 | Spectrum prediction method based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117278154A true CN117278154A (en) | 2023-12-22 |
Family
ID=89206201
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311379839.6A Pending CN117278154A (en) | 2023-10-23 | 2023-10-23 | Spectrum prediction method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117278154A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118233035A (en) * | 2024-05-27 | 2024-06-21 | 烟台大学 | Multiband spectrum prediction method and system based on graph convolution inversion transform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||