CN117278154A - Spectrum prediction method based on attention mechanism - Google Patents


Info

Publication number
CN117278154A
CN117278154A (application CN202311379839.6A)
Authority
CN
China
Prior art keywords
spectrum
attention
information
sequence
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311379839.6A
Other languages
Chinese (zh)
Inventor
王钢
孔金山
高玉龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN202311379839.6A priority Critical patent/CN117278154A/en
Publication of CN117278154A publication Critical patent/CN117278154A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B17/00Monitoring; Testing
    • H04B17/30Monitoring; Testing of propagation channels
    • H04B17/382Monitoring; Testing of propagation channels for resource allocation, admission control or handover
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Electromagnetism (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The spectrum prediction method based on an attention mechanism addresses the problem of predicting the spectrum occupancy state at as many future times as possible with higher accuracy, and belongs to the field of spectrum prediction. The invention comprises the following steps: a gating recursion unit is embedded in each sub-module of an attention-based Transformer model to obtain a spectrum prediction network; the gating recursion unit extracts local correlations of the spectrum occupancy information and outputs an information pre-extraction result carrying position encoding. The input data of the training set comprise an input sequence and an output sequence, where the input sequence is a historical spectrum information sequence and the output sequence is the future spectrum information sequence shifted right by one position; the output data are the future spectrum information. After the spectrum prediction network is trained on the training set, the current spectrum information is taken as the input sequence, the spectrum information at the last moment of the input sequence is taken as the first input of the output sequence of the decoding sub-module, and spectrum prediction is performed in an autoregressive manner by the spectrum prediction network.

Description

Spectrum prediction method based on attention mechanism
Technical Field
The invention relates to a spectrum prediction method based on an attention mechanism, and belongs to the field of spectrum prediction.
Background
Spectrum prediction technology complements cognitive radio: by mining and analyzing the correlations among historical spectrum data, it predicts the spectrum occupancy of future time slots, so that spectrum sensing only needs to scan and sense the frequency bands predicted to be idle. This greatly reduces the energy and time consumed by spectrum sensing and allows spectrum decisions to be made accurately and efficiently in a shorter time. Spectrum sharing follows spectrum prediction: a secondary user (SU) can formulate a suitable sharing policy in advance according to its own service requirements, compensating for the response time. In addition, spectrum handover based on prediction is performed proactively: by analyzing the prediction results, the channel occupancy states of future time slots are judged, so the SU determines in advance whether it needs to switch spectrum at one or more future instants, reducing the probability of collision between the SU and a primary user (PU).
At present, although the development of neural networks has greatly advanced spectrum prediction technology, the long-term dependence problem of the widely applied LSTM and its variant structures on input sequences has not been solved well, and the ability of sequence transduction (Seq-to-Seq) models to store and transfer correlation information is limited by the length of the intermediate vector. In a practical environment, spectrum resources are allocated over multiple channels, and the final occupancy is affected by both the spectrum allocation strategy and user behavior, so the correlations of spectrum resources appear in time as well as across channels. In addition, if only single-step prediction is performed, only the channel occupancy of one future time slot is obtained per scan, requiring frequent prediction and sensing; during spectrum handover, short-step prediction forces users to switch channels frequently, and the SU must repeatedly make decisions according to its own service requirements. This mode is inefficient and of limited practicality.
Disclosure of Invention
Aiming at the problem of predicting the spectrum occupancy state at as many future times as possible with higher accuracy, the invention provides a spectrum prediction method based on an attention mechanism.
The invention discloses a spectrum prediction method based on an attention mechanism, which comprises the following steps:
S1, establishing a spectrum prediction network, wherein the spectrum prediction network is formed by embedding a gating recursion unit in each sub-module of an attention-based Transformer model, the length of the gating recursion unit being equal to the length of the input sequence; in the encoding sub-module and the decoding sub-module, the input sequence first enters the gating recursion unit, which performs local correlation extraction on the spectrum occupancy information and outputs an information pre-extraction result carrying position encoding;
S2, taking spectrum occupancy state data sorted in descending order of channel priority as a training set, wherein the input data of the training set comprise an input sequence and an output sequence, the input sequence being a sequence of historical spectrum information and the output sequence being the future spectrum information sequence shifted right by one position, and the output data of the training set being the future spectrum information;
S3, training the spectrum prediction network by using the training set;
S4, predicting: taking the current spectrum information as the input sequence, taking the spectrum information at the last moment of the input sequence as the first input of the output sequence of the decoding sub-module, and performing spectrum prediction in an autoregressive manner with the trained spectrum prediction network.
Preferably, the attention mechanisms in the encoding sub-module are multi-head attention mechanisms, and the two attention mechanisms in the decoding sub-module are respectively a cross multi-head attention mechanism and a masked multi-head attention mechanism. Global correlation extraction is performed by the attention mechanisms on the basis of the local information extraction of the gating recursion unit: the multi-head attention mechanism in the encoding sub-module extracts global correlations among historical information, the masked multi-head attention mechanism in the decoding sub-module extracts correlations of future time-slot spectrum occupancy, and the cross multi-head attention mechanism extracts correlations between historical spectrum information and future spectrum information.
Preferably, the nth encoding sub-module is:

$$\tilde{h}_t^{(n)} = \mathrm{LSTM}\left(h_t^{(n-1)}\right)$$
$$z_t^{(n)} = \mathrm{LayerNorm}\left(\tilde{h}_t^{(n)} + \mathrm{MultiHead}\left(\tilde{h}^{(n)}, \tilde{h}^{(n)}, \tilde{h}^{(n)}\right)\right)$$
$$h_t^{(n)} = \mathrm{LayerNorm}\left(z_t^{(n)} + \mathrm{FFN}\left(z_t^{(n)}\right)\right)$$

where $\tilde{h}_t^{(n)}$ is the output at time t of the gating recursion unit in the nth encoding sub-module and $h_t^{(n)}$ is the output at time t of the nth encoding sub-module; LSTM() denotes the gating recursion unit, LayerNorm() denotes layer normalization, MultiHead() denotes multi-head attention using the scaled dot-product attention scoring function, and FFN() denotes the feed-forward network transformation.
Preferably, the nth decoding sub-module is:

$$\tilde{s}_t^{(n)} = \mathrm{LSTM}\left(s_t^{(n-1)}\right)$$
$$a_t^{(n)} = \mathrm{LayerNorm}\left(\tilde{s}_t^{(n)} + \mathrm{MaskMultiHead}\left(\tilde{s}^{(n)}, \tilde{s}^{(n)}, \tilde{s}^{(n)}\right)\right)$$
$$b_t^{(n)} = \mathrm{LayerNorm}\left(a_t^{(n)} + \mathrm{MultiHead}\left(a^{(n)}, h^{(N)}, h^{(N)}\right)\right)$$
$$s_t^{(n)} = \mathrm{LayerNorm}\left(b_t^{(n)} + \mathrm{FFN}\left(b_t^{(n)}\right)\right)$$

where $s_t^{(n)}$ denotes the output at time t of the nth decoding sub-module, $h^{(N)}$ is the output of the last encoding sub-module, and MaskMultiHead() denotes the masked multi-head attention mechanism using the scaled dot-product attention scoring function.
Preferably, the normalization in the Transformer model is layer normalization.
Preferably, a dropout mechanism is added in the training process of the spectrum prediction network.
Preferably, the forward propagation formulas of the gating recursion unit are:

$$i_t = \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + b_i\right)$$
$$f_t = \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + b_f\right)$$
$$o_t = \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + b_o\right)$$
$$\tilde{c}_t = \phi\left(W_{xc} x_t + W_{hc} h_{t-1} + b_c\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \phi\left(c_t\right)$$

where $\odot$ is the Hadamard product operator; the gating units $i_t$, $f_t$, $o_t$ correspond to the outputs of the input gate, the forget gate and the output gate respectively; $c_t$ denotes the output of the memory cell at the current moment; $h_{t-1}$ denotes the hidden state at the previous moment; $x_t$ denotes the input at the current moment; $W_{xi}$ and $W_{hi}$ are the weight matrices applied to the input and to the previous hidden state in the input gate, $W_{xf}$ and $W_{hf}$ those of the forget gate, $W_{xo}$ and $W_{ho}$ those of the output gate, and $W_{xc}$ and $W_{hc}$ those of the memory cell; $b_i$, $b_f$, $b_o$, $b_c$ are the bias terms of each gate; and $\sigma$, $\phi$ denote activation functions.
Preferably, the feed-forward neural network in the Transformer model is:

$$\mathrm{FFN}(x) = w_2\,\mathrm{relu}\left(w_1 x + b_1\right) + b_2$$

where FFN(x) is the output of the feed-forward neural network, x is the input, $w_1$, $w_2$ are weight matrices, $b_1$, $b_2$ are bias terms, and relu() is the activation function.
The invention fuses a recursive structural unit into each sub-module of the attention-based Transformer model, exploiting the superior local-correlation processing ability of the recursive structure and the position-encoded nature of its output, complemented by the Transformer model's parallel data processing and efficient global information extraction. This overcomes the long-term dependence problem of LSTM and the Transformer model's tendency to overfit, and in particular achieves high-accuracy spectrum prediction in multi-channel, multi-step prediction close to the actual environment.
Drawings
FIG. 1 is a schematic diagram of a prediction mode of a model for multi-channel and multi-step prediction;
FIG. 2 is an overall block diagram of the LSTM-Transformer model;
FIG. 3 is a block diagram of the gating recursion unit in the LSTM-Transformer model;
FIG. 4 is a schematic diagram of an algorithmic implementation of the attention mechanism;
FIG. 5 is a structural diagram of the multi-head attention mechanism obtained by modifying the attention mechanism for parallel operation;
FIG. 6 is a graph of multi-channel, multi-step prediction accuracy demonstrating the superiority of the model, with the abscissa representing the prediction step size and the ordinate the accuracy.
Detailed Description
The following description of the embodiments of the present invention is made clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention is further described below with reference to the drawings and specific examples, which are not intended to be limiting.
Description of the problem:
In an actual radio environment, the allocation policy of spectrum resources and the frequency usage behavior of users both cause the occupancy states of channels to be interdependent. Therefore, an M/G/5 queuing theory model close to the real environment is adopted here. The arrival process is assumed to obey a Poisson distribution with parameter λ, i.e. the number of users x arriving within a period t obeys a Poisson distribution with probability:

$$P(x = k) = \frac{(\lambda t)^{k}}{k!}\, e^{-\lambda t}, \quad k = 0, 1, 2, \dots$$
λ represents the average number of users arriving per unit time, and x represents the number of users that actually arrive per unit time. In this case, the time interval between the arrivals of two adjacent users obeys an exponential distribution with parameter λ, whose probability density is:

$$f(t) = \lambda e^{-\lambda t}, \quad t \ge 0$$

with mean $1/\lambda$, the average time interval between two adjacent arrivals. Assuming that the service time obeys a general service-time distribution with parameter μ, the distribution probability is:
$$P(x = k) = \mu (1 - \mu)^{k-1}, \quad k = 1, 2, \dots, N$$
μ represents the probability that a user is served within a unit time, i.e. the average number of users served per unit time; $1/\mu$ represents the average time a user occupies a channel; and x represents the time the user actually occupies the channel.
On this basis, a hypothetical priority allocation rule is added to the queuing model, with the priorities of the 5 channels decreasing in order, and spectrum occupancy state data are generated as the data set for model training and testing, thereby simulating the interdependence among channels in an actual environment. The spectrum occupancy states of the 5 channels at the n previous historical moments serve as the historical reference information, where 0 indicates that a channel is unoccupied and 1 indicates that it is occupied. The prediction mode is shown in FIG. 1, where the first box represents the historical information referenced by the model and the second box represents the information of the future times to be predicted. The objective of this embodiment is, after analyzing the temporal and inter-channel correlations of a certain number of historical spectrum occupancy states, to predict the spectrum occupancy state at as many future times as possible with higher accuracy. However, applying the mature LSTM-based Seq-to-Seq model to express correlations among historical information in the field of spectrum prediction is constrained by the length of the intermediate vector and by how that vector is applied in the decoding sub-module. The attention-based Transformer model excels in natural language processing, but when transferred to the field of spectrum prediction it is prone to overfitting and to losing relative position information during the attention computation, so it cannot predict multi-step future spectrum occupancy with high precision.
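For illustration, a simplified generator in this spirit is sketched below: it draws Poisson arrivals and geometric service times and assigns each arrival to the highest-priority free channel, blocking arrivals when all channels are busy. The rate values and the blocking behavior are assumptions for the sketch, not the patent's exact generator.

```python
import numpy as np

def generate_occupancy(num_slots=10000, num_channels=5, lam=0.8, mu=0.4, seed=0):
    """Simplified M/G/5-style generator: Poisson(lam) arrivals per slot,
    geometric(mu) service times, and each arrival assigned to the free
    channel with the highest priority (lowest index). Returns a 0/1 matrix
    of shape (num_slots, num_channels)."""
    rng = np.random.default_rng(seed)
    remaining = np.zeros(num_channels, dtype=int)   # remaining service slots per channel
    occ = np.zeros((num_slots, num_channels), dtype=np.int8)
    for t in range(num_slots):
        remaining = np.maximum(remaining - 1, 0)    # ongoing services progress one slot
        for _ in range(rng.poisson(lam)):           # new arrivals this slot
            free = np.flatnonzero(remaining == 0)
            if free.size:                           # highest-priority free channel
                remaining[free[0]] = rng.geometric(mu)
        occ[t] = (remaining > 0).astype(np.int8)
    return occ

data = generate_occupancy()
print(data.shape, data.mean())  # occupancy matrix and mean channel utilization
```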
Based on the above problems, this embodiment constructs an LSTM-Transformer model combining a temporal recursive structure with the attention mechanism: a gating recursion unit whose length equals that of the historical information is inserted into each sub-module of the Transformer, and the gating recursion unit performs local correlation extraction on the spectrum occupancy information and outputs an information pre-extraction result carrying position encoding. The Transformer model's ability to compute correlations efficiently in parallel and its global correlation extraction capability then complement this over long ranges. The model not only remedies the deficiency of LSTM in handling long-term dependence but also alleviates the overfitting problem of the Transformer model, greatly improving the accuracy of spectrum prediction. The spectrum prediction method based on an attention mechanism of this embodiment specifically comprises the following steps:
Step 1, establishing a spectrum prediction network: the LSTM-Transformer model is shown in FIG. 2. In the spectrum prediction network, a gating recursion unit is embedded in each sub-module of the attention-based Transformer model, with a length equal to that of the input sequence; the chunk_size and pre_step_size parameters of the model are set according to the required window length and prediction step size. In both the encoding and decoding sub-modules, the input sequence first enters the gating recursion unit, which performs local correlation extraction on the spectrum occupancy information and outputs an information pre-extraction result carrying position encoding;
Step 2, taking spectrum occupancy state data sorted in descending order of channel priority as the training set: the input data of the training set comprise an input sequence and an output sequence, where the input sequence is a sequence of historical spectrum information and the output sequence is the future spectrum information sequence shifted right by one position; the output data of the training set are the future spectrum information;
step 3, training the spectrum prediction network by using a training set;
Step 4, predicting: the current spectrum information is taken as the input sequence, the spectrum information at the last moment of the input sequence is taken as the first input of the output sequence of the decoding sub-module, and spectrum prediction is performed in an autoregressive manner with the trained spectrum prediction network. The output of the decoding sub-module is then read off as the predicted spectrum occupancy for the future pre_step_size time slots, as sketched in the code below.
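A minimal sketch of this autoregressive loop follows, assuming a trained encoder-decoder interface model(src, tgt) that returns per-channel logits; the interface name and the 0.5 threshold are assumptions for the sketch.

```python
import torch

@torch.no_grad()
def autoregressive_predict(model, history, pre_step_size):
    """Step-4 sketch: the historical window is the encoder input, the last
    observed slot seeds the decoder, and each new prediction is appended and
    fed back. `model(src, tgt)` is an assumed interface returning logits of
    shape (batch, tgt_len, num_channels)."""
    model.eval()
    src = history                                   # (batch, chunk_size, num_channels)
    tgt = history[:, -1:, :]                        # last observed slot seeds the decoder
    for _ in range(pre_step_size):
        out = model(src, tgt)                       # (batch, len(tgt), num_channels)
        next_slot = (torch.sigmoid(out[:, -1:, :]) > 0.5).float()
        tgt = torch.cat([tgt, next_slot], dim=1)    # autoregressive append
    return tgt[:, 1:, :]                            # predicted future occupancy
```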
The spectrum prediction network of this embodiment maintains accuracy over multi-step prediction. The basic idea is to fuse the gating recursion unit completely with the attention-based Transformer architecture, letting the complementary advantages of the two structures offset each other's deficiencies. Conventional multi-channel spectrum prediction often adopts a Seq-to-Seq structure based on LSTM and its variants; its prediction accuracy is not high enough, and as the prediction step size increases, the accuracy drops sharply, which conflicts with the multi-step prediction requirements of an actual radio environment. Therefore, to obtain the spectrum occupancy of multiple future time slots at once while keeping the prediction accuracy sufficiently high, this embodiment first uses the superior local information extraction ability of the gating recursion unit and the position-encoded nature of its output to pre-extract correlations between known and unknown information, enriching the input of the Transformer. Meanwhile, the efficient global information extraction ability of the Transformer model realizes high-accuracy multi-channel, multi-step prediction, improving the utilization efficiency of spectrum resources. The prediction accuracy of this model exceeds that of traditional models and remains high over predictions of tens of future steps, greatly improving the working efficiency of subsequent cognitive radio operations and further increasing spectrum utilization.
In this embodiment, the gating recursion unit outputs the information after local temporal correlation extraction as its result, so there is no need to consider whether it can extract correlations over long distances. Meanwhile, its output carries position-encoding information, so no additional position encoding is needed when it is used as the input of the Transformer model. The correlation pre-extraction is also equivalent to enriching the data seen by the Transformer model, reducing the possibility of overfitting. The Transformer layers then supplement the correlation information learned by the gating recursion unit layer and extract long-term dependence information. The model follows the encoding sub-module / decoding sub-module structure, and the nth encoding sub-module is:

$$\tilde{h}_t^{(n)} = \mathrm{LSTM}\left(h_t^{(n-1)}\right)$$
$$z_t^{(n)} = \mathrm{LayerNorm}\left(\tilde{h}_t^{(n)} + \mathrm{MultiHead}\left(\tilde{h}^{(n)}, \tilde{h}^{(n)}, \tilde{h}^{(n)}\right)\right)$$
$$h_t^{(n)} = \mathrm{LayerNorm}\left(z_t^{(n)} + \mathrm{FFN}\left(z_t^{(n)}\right)\right)$$

where $\tilde{h}_t^{(n)}$ is the output at time t of the gating recursion unit in the nth encoding sub-module and $h_t^{(n)}$ is the output at time t of the nth encoding sub-module; LSTM() denotes the gating recursion unit, LayerNorm() denotes layer normalization, MultiHead() denotes multi-head attention using the scaled dot-product attention scoring function, and FFN() denotes the feed-forward network transformation.
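As an illustration of one encoding sub-module, a minimal PyTorch sketch is given below. It assumes post-layer-norm residual sublayers; the dimensions (d_model, n_heads, d_ff) and the dropout rate are hypothetical, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class EncoderSubmodule(nn.Module):
    """One LSTM-Transformer encoder block: LSTM pre-extraction carrying
    implicit positional information, then self-attention and FFN sublayers
    with residual connections and layer normalization."""
    def __init__(self, d_model=256, n_heads=8, d_ff=1024, p_drop=0.1):
        super().__init__()
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=p_drop, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(p_drop)

    def forward(self, x):                          # x: (batch, seq_len, d_model)
        h, _ = self.lstm(x)                        # local correlation pre-extraction
        a, _ = self.attn(h, h, h)                  # global self-attention
        z = self.norm1(h + self.drop(a))           # residual + layer norm
        return self.norm2(z + self.drop(self.ffn(z)))
```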
The nth decoding sub-module is:

$$\tilde{s}_t^{(n)} = \mathrm{LSTM}\left(s_t^{(n-1)}\right)$$
$$a_t^{(n)} = \mathrm{LayerNorm}\left(\tilde{s}_t^{(n)} + \mathrm{MaskMultiHead}\left(\tilde{s}^{(n)}, \tilde{s}^{(n)}, \tilde{s}^{(n)}\right)\right)$$
$$b_t^{(n)} = \mathrm{LayerNorm}\left(a_t^{(n)} + \mathrm{MultiHead}\left(a^{(n)}, h^{(N)}, h^{(N)}\right)\right)$$
$$s_t^{(n)} = \mathrm{LayerNorm}\left(b_t^{(n)} + \mathrm{FFN}\left(b_t^{(n)}\right)\right)$$

where $s_t^{(n)}$ denotes the output at time t of the nth decoding sub-module, $h^{(N)}$ is the output of the last encoding sub-module, and MaskMultiHead() denotes the masked multi-head attention mechanism using the scaled dot-product attention scoring function. The attention mechanisms adopted in the encoding and decoding sub-modules take the same form as in the Transformer model.
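A companion sketch of one decoding sub-module follows, under the same hypothetical dimensions as the encoder sketch above.

```python
import torch
import torch.nn as nn

class DecoderSubmodule(nn.Module):
    """One LSTM-Transformer decoder block: LSTM pre-extraction, masked
    self-attention over the shifted output sequence, cross-attention to the
    encoder memory, then an FFN sublayer (residual + layer norm throughout)."""
    def __init__(self, d_model=256, n_heads=8, d_ff=1024, p_drop=0.1):
        super().__init__()
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=p_drop, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, dropout=p_drop, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(3))
        self.drop = nn.Dropout(p_drop)

    def forward(self, y, memory):                  # y: decoder input, memory: encoder output
        h, _ = self.lstm(y)
        L = h.size(1)                              # causal mask hides future positions
        mask = torch.triu(torch.ones(L, L, dtype=torch.bool, device=h.device), diagonal=1)
        a, _ = self.self_attn(h, h, h, attn_mask=mask)
        s = self.norms[0](h + self.drop(a))
        c, _ = self.cross_attn(s, memory, memory)  # query from decoder, key/value from encoder
        z = self.norms[1](s + self.drop(c))
        return self.norms[2](z + self.drop(self.ffn(z)))
```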
The structure of the gating recursion unit in the model is shown in FIG. 3. At each time step there are three inputs: the hidden state $h_{t-1}$ of the previous moment, the memory cell $c_{t-1}$, and the input $x_t$ at the current moment. There are three gates: the forget gate, the input gate and the output gate. Through a sigmoid function and element-wise products they determine how much information of the previous memory cell $c_{t-1}$ is retained, how much of the input $x_t$ and the previous hidden state $h_{t-1}$ is added to the current memory cell $c_t$, and how much of the current memory cell $c_t$ is output as the next hidden state $h_t$. The specific form of a gating unit is:

$$g(x) = \sigma(\omega x + b)$$

The real value obtained is mapped between 0 and 1 by the sigmoid function, representing the preservation or discarding of the information of the previous moment. If the value of g(x) is close to 0, no information passes; if it is close to 1, all information passes.
The forward propagation formulas of the gating recursion unit are:

$$i_t = \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + b_i\right)$$
$$f_t = \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + b_f\right)$$
$$o_t = \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + b_o\right)$$
$$\tilde{c}_t = \phi\left(W_{xc} x_t + W_{hc} h_{t-1} + b_c\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \phi\left(c_t\right)$$

where $\odot$ is the Hadamard product operator; the gating units $i_t$, $f_t$, $o_t$ correspond to the input gate, the forget gate and the output gate; $c_t$ denotes the memory cell at the current moment; $h_{t-1}$ the hidden state of the previous moment; and $x_t$ the input at the current moment. $W_{xi}$ and $W_{hi}$ are the weight matrices applied to the input and to the previous hidden state in the input gate, and the other weight matrices take the analogous form for the forget gate, the output gate and the memory cell; $b_i$, $b_f$, $b_o$, $b_c$ are the bias terms of each gate. $\sigma$ and $\phi$ denote activation functions: $\sigma$ is usually a sigmoid, keeping the gate value between 0 and 1 to describe how much information passes, while $\phi$ is usually tanh or ReLU, chosen according to the practical situation.
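For concreteness, a single-step sketch of these gate equations follows; the parameter dictionary (keys W_x*, W_h*, b_*) is a hypothetical packaging of the weight matrices above, not an interface from the patent.

```python
import torch

def lstm_cell_forward(x_t, h_prev, c_prev, params):
    """Single-step forward pass matching the gate equations above.
    `params` is an assumed dict of weight matrices and biases per gate."""
    sigma = torch.sigmoid
    i_t = sigma(x_t @ params["W_xi"] + h_prev @ params["W_hi"] + params["b_i"])
    f_t = sigma(x_t @ params["W_xf"] + h_prev @ params["W_hf"] + params["b_f"])
    o_t = sigma(x_t @ params["W_xo"] + h_prev @ params["W_ho"] + params["b_o"])
    c_tilde = torch.tanh(x_t @ params["W_xc"] + h_prev @ params["W_hc"] + params["b_c"])
    c_t = f_t * c_prev + i_t * c_tilde             # Hadamard products
    h_t = o_t * torch.tanh(c_t)
    return h_t, c_t
```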
The attention mechanisms in the encoding sub-module of this embodiment are multi-head attention mechanisms, and the two attention mechanisms in the decoding sub-module are respectively a cross multi-head attention mechanism and a masked multi-head attention mechanism. Global correlation extraction is performed by the attention mechanisms on the basis of the local information extraction of the gating recursion unit: the multi-head attention mechanism in the encoding sub-module extracts global correlations among historical information, the masked multi-head attention mechanism in the decoding sub-module extracts correlations of future time-slot spectrum occupancy, and the cross multi-head attention mechanism extracts correlations between historical spectrum information and future spectrum information.
The attention mechanism is the most important part for capturing correlations within sequences, and its structure is shown in FIG. 4. The attention mechanism can be described as a correlation computation between a query and a set of keys to obtain attention score values, i.e. attention weights, with which the values are combined in a weighted sum. The attention weights are computed by an attention scoring function. The output is computed as:

$$f\left(q, (k_1, v_1), \dots, (k_m, v_m)\right) = \sum_{i=1}^{m} \mathrm{softmax}\left(a(q, k_i)\right) v_i$$

where a is the attention scoring function; the values it produces are converted by softmax into attention weights that sum to 1.
There are generally two kinds of attention scoring functions: additive attention and dot-product attention. Additive attention can effectively summarize important information in a sequence at linear complexity, and is typically chosen when the query and key vectors have different lengths. Since matrix multiplication has many efficient implementations, dot-product attention is more computationally efficient and more widely used, but requires the query and key vectors to have the same length. Assuming all entries of the query and key are mutually independent random variables with mean 0 and variance 1, the dot product of the vectors has mean 0 and variance d, where d is the vector dimension. To make the variance independent of the dimension, the dot product is divided by $\sqrt{d}$, yielding attention score values with mean 0 and variance 1 unconstrained by the vector dimension. The general formula of the scaled dot-product attention scoring function is:

$$a(Q, K) = \frac{Q K^{\top}}{\sqrt{d}}$$
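A minimal sketch of scaled dot-product attention matching this formula, from scores through softmax weights to the weighted sum of values (tensor shapes are assumptions):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Scaled dot-product attention: scores = QK^T / sqrt(d), softmax to
    weights that sum to 1, then a weighted sum of the values."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))  # hide masked positions
    weights = torch.softmax(scores, dim=-1)
    return weights @ v
```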
to improve parallelism, modifications are made on the basis of scaled dot product attention, resulting in a multi-headed attention mechanism, the structure of which is shown in FIG. 5. Firstly, linearly transforming the query, the key and the value and cutting into a plurality of parts with the same dimensions, namely, respectively performing scaling dot product attention calculation on each part, and splicing the obtained results to obtain a linear transformation result, wherein the calculation formula is as follows:
the decoding submodule part also uses a masked multi-head attention layer. This is because the entire right shifted output sequence is taken as input to the decoding submodule at once during the training process. In the actual prediction process, when the ith vector is predicted, the vectors after i are unknown. Therefore, there is a need to mask the relevance of the vector after i, i.e. the attention weight value of this part, to avoid "cheating" behavior of the model during prediction.
The feed-forward neural network part is computed as:

$$\mathrm{FFN}(x) = w_2\,\mathrm{relu}\left(w_1 x + b_1\right) + b_2$$

The FFN layer contains only one hidden layer, with ReLU as the activation function.
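A direct PyTorch rendering of this formula, with hypothetical dimensions, might look as follows:

```python
import torch.nn as nn

class FFN(nn.Module):
    """Position-wise feed-forward network with one hidden layer and ReLU,
    matching FFN(x) = w2 * relu(w1 x + b1) + b2."""
    def __init__(self, d_model=256, d_ff=1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, x):
        return self.net(x)
```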
In the Transformer block, residual connections are used, i.e. the output of the current layer is added to the value input to that layer, which ensures that a deeper network performs no worse than a shallower one.
The attention mechanism is applied in the Transformer block in three places. The attention mechanisms of the encoding sub-modules all take the form of multi-head attention, where the queries, keys and values are all derived from the output of the previous layer of the encoding sub-module. The decoding sub-module uses two attention mechanisms: cross multi-head attention and masked multi-head attention. The query of the cross multi-head attention comes from the output of the previous layer of the decoding sub-module, while the keys and values come from the output of the encoding sub-module. The queries, keys and values of the masked multi-head attention all come from the output of the previous layer of the decoding sub-module.
The normalization in the Transformer model of this embodiment is layer normalization (Layer Normalization, LN), which differs from the batch normalization (Batch Normalization, BN) commonly used in CNNs. BN normalizes each neuron's input across all samples of a batch in a layer, and is therefore limited by the batch size: when batch_size is small, only a small amount of data is normalized, and the result cannot reflect the overall statistics. Moreover, for time series, the net input distribution of a neuron changes dynamically in the network, so batch normalization cannot be used. LN instead normalizes all neurons of each layer within each sample separately, without restriction on the sequence length in each batch, and is thus better suited to time-series structures.
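The distinction can be seen in a short PyTorch example (the shapes are illustrative): LayerNorm normalizes over the feature dimension of each time step independently, while BatchNorm1d normalizes each feature across the batch and requires a channel-first layout.

```python
import torch
import torch.nn as nn

x = torch.randn(4, 30, 256)                   # (batch, seq_len, features)
ln = nn.LayerNorm(256)                        # normalizes each time step independently
bn = nn.BatchNorm1d(256)                      # normalizes each feature across the batch
y_ln = ln(x)                                  # works for any batch size / sequence length
y_bn = bn(x.transpose(1, 2)).transpose(1, 2)  # needs channel-first layout, larger batches
print(y_ln.shape, y_bn.shape)
```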
A dropout mechanism is added during the training of the spectrum prediction network in this embodiment. Since the Transformer part overfits easily, dropout is added to counter this and, at the same time, to address the co-adaptation problem among network nodes. Different nodes in the network have different representational capacities: as training proceeds, nodes with stronger capacity are continually reinforced while weaker ones are continually weakened until they become negligible. This means that only some nodes in the network are effectively trained, wasting the network's depth and width and limiting the training effect. Dropout can be understood as randomly discarding some neurons with a certain probability during model training; in other words, each pass trains a different set of neurons, and since two given neurons are not necessarily retained in the same pass, the weight and bias updates in the network do not depend on their co-occurrence. This mechanism breaks the co-adaptation among neurons, making the learned network more robust.
In this embodiment, all the above modules are fused to construct the LSTM-Transformer model. The spectrum occupancy information of 5 channels generated by the M/G/5 model serves as the data set, split into training, validation and test sets in a 6:2:2 ratio. The LSTM-based Seq-to-Seq model, the Transformer model and the LSTM-Transformer model are trained separately. All three models use the widely adopted and well-performing Adam optimizer with an initial learning rate of $10^{-3}$; the loss function is the mean squared error, and prediction accuracy is chosen as the evaluation metric. The Seq-to-Seq model uses 200 hidden units, an optimizer learning-rate decay of $10^{-6}$, and is trained for 100 epochs. The Transformer model uses 200 hidden-layer units, a batch size of 128, a learning-rate decay of $5 \times 10^{-6}$, and is trained for 200 epochs. The LSTM-Transformer model uses 256 hidden-layer units, a batch size of 64, the same learning-rate decay as the Transformer model, and is trained for 200 epochs. For each model, the window length giving the highest prediction accuracy is selected; for our model a window length of 30 is chosen, i.e. 30 steps of historical time and inter-channel correlation information are referenced to predict the channel occupancy of the next 30 time slots.
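For illustration, the reported LSTM-Transformer configuration could be set up as in the sketch below. Here model and train_loader are assumed to exist (e.g. built from the sub-module sketches above), and mapping "learning-rate attenuation" to a per-step decay schedule is an assumption, one plausible reading of the description.

```python
import torch
import torch.nn as nn

# `model` and `train_loader` are assumed defined elsewhere; the decay
# schedule is an assumed interpretation of "learning-rate attenuation".
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: 1.0 / (1.0 + 5e-6 * step))
criterion = nn.MSELoss()                       # mean squared error loss

for epoch in range(200):                       # 200 training passes
    for src, tgt_in, tgt_out in train_loader:  # batch_size = 64
        optimizer.zero_grad()
        loss = criterion(model(src, tgt_in), tgt_out)
        loss.backward()
        optimizer.step()
        scheduler.step()                       # per-step learning-rate decay
```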
The multi-channel prediction accuracies of the three models on the validation set are compared in FIG. 6. The LSTM-Transformer model constructed in this embodiment shows excellent performance in multi-channel, multi-step spectrum prediction: even at long prediction steps the accuracy remains around 98%, meeting the accuracy requirements of an actual radio environment and ensuring the efficiency of subsequent cognitive radio operations (spectrum sensing, spectrum decision, spectrum sharing and spectrum handover).
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that the different dependent claims and the features described herein may be combined in ways other than as described in the original claims. It is also to be understood that features described in connection with separate embodiments may be used in other described embodiments.

Claims (10)

1. A method of spectrum prediction based on an attention mechanism, the method comprising:
S1, establishing a spectrum prediction network, wherein the spectrum prediction network is formed by embedding a gating recursion unit in each sub-module of an attention-based Transformer model, the length of the gating recursion unit being equal to the length of the input sequence; in the encoding sub-module and the decoding sub-module, the input sequence first enters the gating recursion unit, which performs local correlation extraction on the spectrum occupancy information and outputs an information pre-extraction result carrying position encoding;
S2, taking spectrum occupancy state data sorted in descending order of channel priority as a training set, wherein the input data of the training set comprise an input sequence and an output sequence, the input sequence being a sequence of historical spectrum information and the output sequence being the future spectrum information sequence shifted right by one position, and the output data of the training set being the future spectrum information;
S3, training the spectrum prediction network by using the training set;
S4, predicting: taking the current spectrum information as the input sequence, taking the spectrum information at the last moment of the input sequence as the first input of the output sequence of the decoding sub-module, and performing spectrum prediction in an autoregressive manner with the trained spectrum prediction network.
2. The attention-mechanism-based spectrum prediction method as claimed in claim 1, wherein the attention mechanisms in the encoding sub-module are multi-head attention mechanisms and the two attention mechanisms in the decoding sub-module are respectively a cross multi-head attention mechanism and a masked multi-head attention mechanism; global correlation extraction is performed by the attention mechanisms on the basis of the local information extraction of the gating recursion unit, wherein the multi-head attention mechanism in the encoding sub-module extracts global correlations among historical information, the masked multi-head attention mechanism in the decoding sub-module extracts correlations of future time-slot spectrum occupancy, and the cross multi-head attention mechanism extracts correlations between historical spectrum information and future spectrum information.
3. The attention-mechanism-based spectrum prediction method of claim 2, wherein the nth encoding sub-module is:

$$\tilde{h}_t^{(n)} = \mathrm{LSTM}\left(h_t^{(n-1)}\right)$$
$$z_t^{(n)} = \mathrm{LayerNorm}\left(\tilde{h}_t^{(n)} + \mathrm{MultiHead}\left(\tilde{h}^{(n)}, \tilde{h}^{(n)}, \tilde{h}^{(n)}\right)\right)$$
$$h_t^{(n)} = \mathrm{LayerNorm}\left(z_t^{(n)} + \mathrm{FFN}\left(z_t^{(n)}\right)\right)$$

where $\tilde{h}_t^{(n)}$ is the output at time t of the gating recursion unit in the nth encoding sub-module and $h_t^{(n)}$ is the output at time t of the nth encoding sub-module; LSTM() denotes the gating recursion unit, LayerNorm() denotes layer normalization, MultiHead() denotes multi-head attention using the scaled dot-product attention scoring function, and FFN() denotes the feed-forward network transformation.
4. The attention-mechanism-based spectrum prediction method of claim 3, wherein the nth decoding sub-module is:

$$\tilde{s}_t^{(n)} = \mathrm{LSTM}\left(s_t^{(n-1)}\right)$$
$$a_t^{(n)} = \mathrm{LayerNorm}\left(\tilde{s}_t^{(n)} + \mathrm{MaskMultiHead}\left(\tilde{s}^{(n)}, \tilde{s}^{(n)}, \tilde{s}^{(n)}\right)\right)$$
$$b_t^{(n)} = \mathrm{LayerNorm}\left(a_t^{(n)} + \mathrm{MultiHead}\left(a^{(n)}, h^{(N)}, h^{(N)}\right)\right)$$
$$s_t^{(n)} = \mathrm{LayerNorm}\left(b_t^{(n)} + \mathrm{FFN}\left(b_t^{(n)}\right)\right)$$

where $s_t^{(n)}$ denotes the output at time t of the nth decoding sub-module, $h^{(N)}$ is the output of the last encoding sub-module, and MaskMultiHead() denotes the masked multi-head attention mechanism using the scaled dot-product attention scoring function.
5. The attention-mechanism-based spectrum prediction method of claim 2, wherein the normalization in the Transformer model is layer normalization.
6. The attention-based spectrum prediction method as recited in claim 2, wherein a dropout mechanism is added during training of the spectrum prediction network.
7. The attention-mechanism-based spectrum prediction method of claim 1, wherein the forward propagation formulas of the gating recursion unit are:

$$i_t = \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + b_i\right)$$
$$f_t = \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + b_f\right)$$
$$o_t = \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + b_o\right)$$
$$\tilde{c}_t = \phi\left(W_{xc} x_t + W_{hc} h_{t-1} + b_c\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \phi\left(c_t\right)$$

where $\odot$ is the Hadamard product operator; the gating units $i_t$, $f_t$, $o_t$ correspond to the outputs of the input gate, the forget gate and the output gate respectively; $c_t$ denotes the output of the memory cell at the current moment; $h_{t-1}$ denotes the hidden state at the previous moment; $x_t$ denotes the input at the current moment; $W_{xi}$ and $W_{hi}$ are the weight matrices applied to the input and to the previous hidden state in the input gate, $W_{xf}$ and $W_{hf}$ those of the forget gate, $W_{xo}$ and $W_{ho}$ those of the output gate, and $W_{xc}$ and $W_{hc}$ those of the memory cell; $b_i$, $b_f$, $b_o$, $b_c$ are the bias terms of each gate; and $\sigma$, $\phi$ denote activation functions.
8. The attention-mechanism-based spectrum prediction method of claim 1, wherein the feed-forward neural network in the Transformer model is:

$$\mathrm{FFN}(x) = w_2\,\mathrm{relu}\left(w_1 x + b_1\right) + b_2$$

where FFN(x) is the output of the feed-forward neural network, x is the input, $w_1$, $w_2$ are weight matrices, $b_1$, $b_2$ are bias terms, and relu() is the activation function.
9. A computer-readable storage device storing a computer program, characterized in that the computer program when executed implements the attention-based spectrum prediction method according to any of claims 1 to 8.
10. An attention-based spectrum prediction apparatus comprising a storage device, a processor and a computer program stored in the storage device and executable on the processor, wherein execution of the computer program by the processor implements the attention-based spectrum prediction method as claimed in any one of claims 1 to 8.
CN202311379839.6A 2023-10-23 2023-10-23 Spectrum prediction method based on attention mechanism Pending CN117278154A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311379839.6A CN117278154A (en) 2023-10-23 2023-10-23 Spectrum prediction method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311379839.6A CN117278154A (en) 2023-10-23 2023-10-23 Spectrum prediction method based on attention mechanism

Publications (1)

Publication Number Publication Date
CN117278154A true CN117278154A (en) 2023-12-22

Family

ID=89206201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311379839.6A Pending CN117278154A (en) 2023-10-23 2023-10-23 Spectrum prediction method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN117278154A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118233035A (en) * 2024-05-27 2024-06-21 烟台大学 Multiband spectrum prediction method and system based on graph convolution inversion transform



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination