CN116709409A - Knowledge distillation-based lightweight spectrum prediction method - Google Patents

Knowledge distillation-based lightweight spectrum prediction method

Info

Publication number
CN116709409A
CN116709409A (application CN202310808568.5A)
Authority
CN
China
Prior art keywords
training
data
spectrum
model
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310808568.5A
Other languages
Chinese (zh)
Inventor
张建照
邓俊荃
程润梦
柳永祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202310808568.5A priority Critical patent/CN116709409A/en
Publication of CN116709409A publication Critical patent/CN116709409A/en
Pending legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/08Testing, supervising or monitoring using real traffic
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/04Wireless resource allocation
    • H04W72/044Wireless resource allocation based on the type of the allocated resource
    • H04W72/0453Resources in frequency domain, e.g. a carrier in FDMA
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the technical field of electromagnetic spectrum management and control, and in particular to a knowledge distillation-based lightweight spectrum prediction method comprising the following steps: S1, preprocessing the received spectrum data; S2, determining the step length of the input data according to the time-domain autocorrelation of the acquired spectrum data; S3, constructing a spectrum prediction network model based on a knowledge distillation strategy; S4, training the spectrum prediction network model with training set data; S5, judging whether training is complete; if so, inputting the training set data into the trained spectrum prediction network model, outputting the prediction result, and ending the training flow; if not, incrementing the training iteration count by one and returning to S4. The application introduces knowledge distillation into spectrum prediction, overcomes the excessive model complexity of traditional deep learning networks, and shows good applicability under small-sample conditions.

Description

Knowledge distillation-based lightweight spectrum prediction method
Technical Field
The application relates to the technical field of electromagnetic spectrum management and control, in particular to a knowledge distillation-based lightweight spectrum prediction method.
Background
Spectrum prediction is regarded as an effective technique complementing spectrum sensing. Typically, on the basis of analyzing the characteristic parameters of idle spectrum and the usage patterns of licensed users, it assists a secondary user (SU) in selecting the best usage policy, so that the SU can switch in time to a suitable idle band for data transmission, thereby reducing communication delay and improving channel throughput.
Spectrum prediction techniques are still at the development stage, but results have been achieved with, for example, autoregressive moving-average models, hidden Markov models, minimum Bayesian risk, multi-layer perceptrons, recurrent neural networks, and convolutional long short-term memory neural networks.
The Chinese patent document with publication number CN113840297A discloses a spectrum prediction method driven by a radio-frequency machine learning model, implemented as follows: collecting spectrum data; preprocessing the spectrum data; determining the step length of the input data according to the Akaike information criterion for model order determination; expanding the linear combination structure of an autoregressive model into a multilayer network; adding new trainable parameters to the network to construct a spectrum prediction framework driven by the radio-frequency machine learning model; training the network model with training data; judging whether network training is finished; inputting test set data into the network; and outputting the prediction result.
However, most existing studies assume that training samples are ideal and complete. In practical applications, when a cognitive radio (CR) device frequently switches its target frequency or the electromagnetic environment changes, the relative lack of the expected historical data makes it difficult to guarantee good model performance.
High model complexity is another focus of attention in deep-learning-based spectrum prediction. Because storage and computational resources are limited, complex models are often difficult to carry through to practical deployment, which places higher demands on model scale.
Disclosure of Invention
In view of the above-mentioned shortcomings, it is an object of the present application to provide a knowledge distillation-based spectrum prediction method, system, device and storage medium.
The application provides the following technical scheme:
in a first aspect, the present application provides a knowledge distillation-based spectrum prediction method, comprising the steps of:
s1, preprocessing received spectrum data;
s2, determining the step length of input data according to the time domain autocorrelation of the acquired frequency spectrum data;
s3, constructing a spectrum prediction network model based on a knowledge distillation strategy;
s4, training the frequency spectrum prediction network model by using training set data;
s5, judging whether training is completed or not; if yes, inputting training set data into the frequency spectrum prediction network model after training is completed, obtaining output of a prediction result and ending a training flow; if not, after the iteration number of training is added by one, returning to S4.
As an optional implementation manner of the first aspect, preprocessing the received spectrum data includes the following steps:
s1.1, according to the received frequency spectrum data, converting the power spectrum density value of the collected signal into a power value capable of being weighted and averaged;
s1.2, taking continuous H data in a time dimension for weighted average, and calculating to obtain final spectrum data.
Further, in S1.1, the power spectral density values of the collected signal, recorded in dBm, are converted into linear power values that can be weighted-averaged using $\mathrm{mW} = 10^{\mathrm{dBm}/10}$;
in S1.2, H is set to 100, and the final spectrum data are obtained by converting the averaged values back according to the following formula:
$\mathrm{dBm} = 10\log_{10}(\mathrm{mW})$
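To make the preprocessing concrete, the following is a minimal Python sketch of S1.1 and S1.2 under two stated assumptions: the readings for one frequency point arrive as dBm values, and the weighted average uses uniform weights, since the patent does not give the weighting coefficients. The function name and array layout are illustrative only.

```python
import numpy as np

def preprocess_psd(psd_dbm, H=100):
    """Hypothetical sketch of S1.1-S1.2 for one frequency point."""
    mw = 10.0 ** (np.asarray(psd_dbm, dtype=float) / 10.0)  # S1.1: dBm -> mW, so values can be averaged
    usable = (len(mw) // H) * H                   # drop a partial trailing window
    mw = mw[:usable].reshape(-1, H).mean(axis=1)  # S1.2: average H consecutive samples (uniform weights assumed)
    return 10.0 * np.log10(mw)                    # back to dBm: dBm = 10*log10(mW)
```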
As an optional implementation manner of the first aspect, in S2, the autocorrelation coefficient of the spectrum signal is calculated using the following formula:
$\rho(k) = \mathrm{cov}(x_t, x_{t+k}) / (\sigma_{x_t}\,\sigma_{x_{t+k}})$
where $x_t$ denotes the spectrum data at time $t$, $\mathrm{cov}(\cdot)$ denotes covariance, and $\sigma$ denotes the sample standard deviation;
the sliding window size whose autocorrelation coefficient is greater than 0.8 is selected as the step length $c$ of the input data.
Further, to adapt the model to time-series prediction, a sliding window of length $c+1$ is used to divide the spectrum data $f_n$: the first $c$ values $s_t = \{x_{t-c+1}, x_{t-c+2}, \ldots, x_t\}$ serve as the input data, and the $(c+1)$-th value serves as the data to be predicted, $y_t = x_{t+1}$; the data of the $n$-th frequency point are thereby converted into $m$ input/label pairs $\{(s_t, y_t)\}$, where $m = T - c$ and $T$ is the total length of the spectrum data $f_n$.
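As an illustration of S2 and the windowing just described, the sketch below (helper names are hypothetical; NumPy assumed) computes the lag-k autocorrelation, picks the largest step c whose coefficient stays above 0.8, and slices a series into the m = T - c input/label pairs:

```python
import numpy as np

def autocorr(x, k):
    """Lag-k autocorrelation: cov(x_t, x_{t+k}) normalized by the two standard deviations."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

def choose_step(x, threshold=0.8, max_lag=100):
    """Largest lag whose autocorrelation still exceeds the threshold (S2)."""
    c = 1
    while c < max_lag and autocorr(x, c + 1) > threshold:
        c += 1
    return c

def make_windows(f_n, c):
    """Slide a window of length c+1 over f_n: the first c values form s_t,
    the (c+1)-th is the target y_t, giving m = T - c input/label pairs."""
    S = np.stack([f_n[t:t + c] for t in range(len(f_n) - c)])
    y = np.asarray(f_n[c:])
    return S, y
```

On the data used later in the embodiments, choose_step would be expected to return a value around c = 20.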
As an optional implementation manner of the first aspect, in S3, the self-migration optimization is performed on the time convolution network to be used as a teacher model, the dual-branch neural network is constructed to be used as a student model, and the spectrum prediction network model based on the knowledge distillation strategy is constructed according to the teacher model and the student model.
Further, the method for performing self-migration optimization on the time convolution network as the teacher model is as follows: the collected spectrum data are used to pre-train the time convolution network, yielding a network of high accuracy but a huge parameter count; part of the layers of this convolutional network are then frozen, the result is taken as the network to be trained, and the unfrozen layers are trained again.
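One possible reading of this self-migration step, sketched in Keras terms rather than as the patent's exact procedure: freeze a prefix of the pretrained TCN's layers and retrain the remainder. The count of 27 frozen layers and the RMSprop settings are taken from the experiments reported later; the function name is hypothetical.

```python
import tensorflow as tf

def self_migrate(teacher, n_frozen=27):
    """Freeze the first n_frozen layers of a pretrained TCN; the rest stay trainable."""
    for layer in teacher.layers[:n_frozen]:
        layer.trainable = False
    # Recompile so the new trainable flags take effect in the second training round.
    teacher.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01),
                    loss="mse")
    return teacher
```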
Further, the method for constructing the dual-branch neural network as the student model by combining the encoder structure comprises the following steps:
by constructing coding-reconstruction branches f rec And coding-prediction branch f pre Forming a double-branch neural network, wherein each branch has a corresponding loss function, and training to obtain optimized network parameters by a method of minimizing the loss function;
the loss function of the coding-reconstruction branch is calculated as (squared-error form is used here, consistent with the RMSE and MAE evaluation in the experiments):
$L_{rec} = \| f_{rec}(s_t'; \theta_1, \theta_2) - s_t \|^2$
where $\theta_1$ is the encoder parameter, $\theta_2$ the reconstructor parameter, $s_t'$ the intermediate-layer feature output of the teacher network, $s_t$ the ground-truth label, and $f_{rec}(s_t'; \theta_1, \theta_2)$ the final output of the coding-reconstruction branch;
the loss function of the coding-prediction branch is calculated as:
$L_{pre} = \| f_{pred}(s_t; \theta_1, \theta_3) - y_t \|^2$
where $\theta_1$ is the encoder parameter, $\theta_3$ the predictor parameter, $y_t$ the true spectral value, and $f_{pred}(s_t; \theta_1, \theta_3)$ the predicted future spectral value.
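The dual-branch student can be pictured with the sketch below: one shared encoder (parameters θ1) feeds a reconstructor head (θ2) and a predictor head (θ3). The layer widths are illustrative assumptions, and the teacher feature s_t' is assumed here to have the same length c as the input window.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_student(c, latent_dim=16):
    """Hypothetical dual-branch student with a shared encoder."""
    inp = layers.Input(shape=(c,))                                        # teacher features s_t'
    z = layers.Dense(latent_dim, activation="relu", name="encoder")(inp)  # shared low-dimensional code (theta_1)
    rec = layers.Dense(c, name="reconstructor")(z)   # coding-reconstruction branch f_rec (theta_2)
    pre = layers.Dense(1, name="predictor")(z)       # coding-prediction branch f_pred (theta_3)
    return tf.keras.Model(inp, [rec, pre])
```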
Further, the method for constructing the spectrum prediction network model based on the knowledge distillation strategy according to the teacher model and the student model comprises the following steps:
the soft-target loss function is constructed to reduce the difference between the output of the student model and the output of the teacher model; the soft-target loss is calculated as:
$L_{soft} = \| f_S(s_t') - f_T(s_t) \|^2$
where $f_T(\cdot)$ is the teacher model, $f_T(s_t)$ is the soft target, i.e. the teacher's output, $f_S(\cdot)$ is the student model, and $f_S(s_t')$ is the student's prediction taking the intermediate-layer output of the time convolution network as its input;
the hard-target loss function is constructed to reduce the difference between the student model and the ground truth; the hard-target loss is calculated as:
$L_{hard} = L_{rec} + L_{pre}$
Further, in S4, the method for training the spectrum prediction network model with the training set data is as follows: randomly initialize the trainable parameters of the dual-branch neural network; set the number of iterations and the learning rate, and take the RMSprop optimization algorithm as the network training optimizer; set the iteration counter epoch to 1; feed the training data into the spectrum prediction network model in batches, back-propagating the training error of each batch to optimize the network parameters. The error loss is calculated with the constructed total loss function:
$L_{total} = \alpha L_{soft} + (1-\alpha) L_{hard}$
where $\alpha$ is a coefficient that balances the weights of the soft-target and hard-target losses.
When all batches of the training data have been back-propagated, set epoch = epoch + 1 and continue back-propagation until epoch reaches the maximum number of iterations.
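Putting S4 together, here is a sketch of one distillation training step under the squared-error assumption used above; the function and argument names are hypothetical, and y_t and the teacher's soft target are expected with shape (batch, 1):

```python
import tensorflow as tf

mse = tf.keras.losses.MeanSquaredError()

@tf.function
def train_step(student, opt, s_t_prime, s_t, y_t, teacher_soft, alpha=0.5):
    """One mini-batch: combine hard losses (L_rec + L_pre) with the soft loss."""
    with tf.GradientTape() as tape:
        rec, pre = student(s_t_prime, training=True)
        l_rec = mse(s_t, rec)              # reconstruction vs. ground-truth window
        l_pre = mse(y_t, pre)              # prediction vs. true next value
        l_soft = mse(teacher_soft, pre)    # match the teacher's output f_T(s_t)
        loss = alpha * l_soft + (1.0 - alpha) * (l_rec + l_pre)
    grads = tape.gradient(loss, student.trainable_variables)
    opt.apply_gradients(zip(grads, student.trainable_variables))
    return loss
```

An RMSprop optimizer, e.g. tf.keras.optimizers.RMSprop(learning_rate=0.01), would be passed as opt, and the loop over batches and epochs wraps this step as described in S4.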
In a second aspect, the application provides a knowledge distillation-based spectrum prediction system, which comprises a fusion center, a processing module and a prediction module; the processing module is used for preprocessing the frequency spectrum data received by the fusion center and determining the step length of the input data according to the time domain autocorrelation of the acquired frequency spectrum data;
the prediction module is used for constructing a spectrum prediction network model based on a knowledge distillation strategy, training the spectrum prediction network model by utilizing training set data and judging whether training is completed or not; if yes, inputting training set data into the frequency spectrum prediction network model after training is completed, obtaining output of a prediction result and ending a training flow; if not, after the iteration times of training are added by one, training the frequency spectrum prediction network model by utilizing the training set data.
In a third aspect, the present application proposes a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing a knowledge-based distillation spectrum prediction method as in the first aspect or any implementation of the first aspect when executing the computer program.
In a fourth aspect, the present application proposes a computer storage medium storing a computer program which, when executed by a processor, implements a knowledge-based distillation spectrum prediction method as in the first aspect or any implementation of the first aspect.
Compared with the prior art, the technical scheme provided by the application has the following beneficial effects:
First, knowledge distillation is introduced into the field of spectrum prediction; compared with a traditional deep learning framework, the method reduces the parameters to be trained and speeds up model operation while preserving prediction performance.
Second, the application optimizes the time convolution network parameters with a self-migration technique, updating only part of the weights by freezing certain layers of the network, which avoids the hidden-layer information carry-over caused by large dilation factors. A dual-branch neural network model is constructed in which the low-dimensional representation of the data serves both coding-reconstruction and coding-prediction; using this intrinsic representation of the data allows the proposed model to make more efficient predictions.
Third, unlike models trained on large-scale data, the application does not require the ideal assumption of sufficient training data: knowledge distillation adapts the model to small amounts of spectrum data, so it can be applied in practical communication systems.
Drawings
FIG. 1 is a flow chart of a method of knowledge-based distillation spectrum prediction in an embodiment of the application;
FIG. 2 is a diagram of a predictive network framework in accordance with the present application;
FIG. 3 is a diagram comparing the self-migration optimization method of the present application with other existing spectrum prediction methods;
FIG. 4 is a graph showing the comparison of training convergence rates of the spectrum prediction method of the present application and other existing spectrum prediction methods;
fig. 5 is a graph comparing prediction curves of the spectrum prediction method of the present application and other existing spectrum prediction methods.
Detailed Description
For a further understanding of the present application, the present application will be described in detail with reference to the drawings and examples.
The structures, proportions, and sizes shown in the drawings are intended only to illustrate the disclosure and do not limit the scope of the application; any modification of structure, change of proportion, or adjustment of size that does not affect the effects or objectives achievable by the application remains within the scope of its technical content. Likewise, terms such as "upper", "lower", "left", "right" and "middle" are used herein for descriptive convenience only and do not limit the scope of the application; changes or adjustments of their relative relationships without substantial alteration of the technical content are also considered within the scope of the application.
Example 1
As shown in fig. 1, a knowledge distillation-based spectrum prediction method includes the following steps:
s1, preprocessing received spectrum data;
s2, determining the step length of input data according to the time domain autocorrelation of the acquired frequency spectrum data;
s3, constructing a spectrum prediction network model based on a knowledge distillation strategy;
s4, training the frequency spectrum prediction network model by using training set data;
s5, judging whether training is completed or not; if yes, inputting training set data into the frequency spectrum prediction network model after training is completed, obtaining output of a prediction result and ending a training flow; if not, after the iteration number of training is added by one, returning to S4.
The application introduces knowledge distillation into spectrum prediction, overcoming the excessive model complexity of traditional deep learning networks while retaining good applicability under small-sample conditions; the network structure is simple and the trainable parameters are few, which speeds up network training and suits actual deployment in communication scenarios; compared with traditional deep learning network models, the spectrum prediction performance is improved.
Example two
In this embodiment, a typical centralized cognitive radio network is composed of one fusion center (FC) and K sparsely distributed secondary users (SU). The secondary users continuously sense N frequency points and send the received signal strength (RSS) to the fusion center, which processes the collected data into N frequency-point series $D = \{f_1, f_2, \ldots, f_n, \ldots, f_N\}$.
The obtained spectrum data is preprocessed by the following steps:
s1.1, according to the received frequency spectrum data, converting the power spectrum density value of the collected signal into a power value capable of being weighted and averaged;
s1.2, taking continuous H data in a time dimension for weighted average, and calculating to obtain final spectrum data.
Further, in step S1.1, the power spectral density values of the collected signal, recorded in dBm, are converted into linear power values that can be weighted-averaged using $\mathrm{mW} = 10^{\mathrm{dBm}/10}$;
in step S1.2, H is set to 100, that is, 100 consecutive data points are taken in the time dimension for weighted averaging, and the final spectrum data are obtained by converting the averaged values back according to the following formula:
$\mathrm{dBm} = 10\log_{10}(\mathrm{mW})$
For deep neural network structures, deeper networks enable further feature extraction but inevitably risk vanishing or exploding gradients. Therefore, after the weighted averaging of the data is completed, we apply an initial normalization operation to the input data.
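The patent does not name the normalization scheme; min-max scaling to [0, 1] is one common choice, sketched here for completeness:

```python
import numpy as np

def minmax_normalize(x):
    """Illustrative min-max scaling; the specific scheme is an assumption."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())
```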
Example III
The autocorrelation coefficient measures the degree of correlation between values of a variable at different points in time; a coefficient greater than 0 means that a value observed now is likely to be observed again at a later time. The larger the autocorrelation coefficient, the stronger the correlation between the sequences, meaning that spectrum prediction from the spectrum data of historical time slots is more feasible. In this embodiment, the autocorrelation coefficient of the spectrum signal is therefore calculated in step S2 as:
$\rho(k) = \mathrm{cov}(x_t, x_{t+k}) / (\sigma_{x_t}\,\sigma_{x_{t+k}})$
where $x_t$ denotes the spectrum data at time $t$, $\mathrm{cov}(\cdot)$ denotes covariance, and $\sigma$ denotes the sample standard deviation.
The autocorrelation coefficient gradually decreases as the lag grows; to balance computational cost against prediction effect, we choose the sliding window size whose autocorrelation coefficient remains greater than 0.8 as the step length c of the input data, where c may be 20.
Further, to adapt the model to time-series prediction, a sliding window of length $c+1$ is used to divide the spectrum data $f_n$: the first $c$ values $s_t = \{x_{t-c+1}, x_{t-c+2}, \ldots, x_t\}$ serve as the input data, and the $(c+1)$-th value serves as the data to be predicted, $y_t = x_{t+1}$; the data of the $n$-th frequency point are thereby converted into $m$ input/label pairs $\{(s_t, y_t)\}$, where $m = T - c$ and $T$ is the total length of the spectrum data $f_n$.
Example IV
In this embodiment, in step S3, the time convolution network is subjected to self-migration optimization as a teacher model, a dual-branch neural network is constructed as a student model, and a spectrum prediction network model based on a knowledge distillation strategy is constructed according to the teacher model and the student model.
Specifically, the method for performing self-migration optimization on the time convolution network as the teacher model is as follows: the collected spectrum data are used to pre-train the time convolution network, yielding a network of high accuracy but a huge parameter count; part of the layers of this convolutional network are then frozen, the result is taken as the network to be trained, and the unfrozen layers are trained again.
The method for constructing the dual-branch neural network as the student model in combination with an encoder structure is as follows: to extract spectrum data features more fully, a dual-branch neural network is constructed, consisting of a coding-reconstruction branch $f_{rec}$ and a coding-prediction branch $f_{pre}$. Each branch has a corresponding loss function, and the optimized network parameters are obtained by training to minimize these loss functions.
The coding-reconstruction branch compresses the data into a coded form and then, through a decoder, reconstructs an output close to the original data, so that the model learns a low-dimensional representation of the data. The loss function of the coding-reconstruction branch is calculated as:
$L_{rec} = \| f_{rec}(s_t'; \theta_1, \theta_2) - s_t \|^2$
where $\theta_1$ is the encoder parameter, $\theta_2$ the reconstructor parameter, $s_t'$ the intermediate-layer feature output of the teacher network, $s_t$ the ground-truth label, and $f_{rec}(s_t'; \theta_1, \theta_2)$ the final output of the coding-reconstruction branch.
The coding-prediction branch shares the encoder of the coding-reconstruction branch and predicts the output value: it receives the data $s_t'$ and predicts the future spectral value $f_{pred}(s_t; \theta_1, \theta_3)$. Its loss function is calculated as:
$L_{pre} = \| f_{pred}(s_t; \theta_1, \theta_3) - y_t \|^2$
where $\theta_1$ is the encoder parameter, $\theta_3$ the predictor parameter, $y_t$ the true spectral value, and $f_{pred}(s_t; \theta_1, \theta_3)$ the predicted future spectral value.
Coding reconstruction branches may help models learn the intrinsic representation of the data and reduce noise and redundancy of the input data. By sharing the encoder output, the reconstructor and predictor can use the low-dimensional representation of the data simultaneously, thereby improving training efficiency and generalization performance of the model.
As shown in fig. 2, the framework mainly comprises a soft-target loss and a hard-target loss. The soft-target loss measures the difference between the outputs of the student model and the teacher model, with the aim of preserving the teacher's knowledge; the teacher's output, called the soft label, serves as a target variable for the student model.
The soft-target loss function is constructed to reduce the difference between the student and teacher outputs, and is calculated as:
$L_{soft} = \| f_S(s_t') - f_T(s_t) \|^2$
where $f_T(\cdot)$ is the teacher model, $f_T(s_t)$ is the soft target, i.e. the teacher's output, $f_S(\cdot)$ is the student model, and $f_S(s_t')$ is the student's prediction taking the intermediate-layer output of the time convolution network as its input.
The hard-target loss measures the difference between the student output and the real label, with the aim of improving the student's prediction accuracy; the real data, called hard labels, serve as target variables for the student model.
The hard-target loss function is constructed to reduce the difference between the student model and the ground truth:
$L_{hard} = L_{rec} + L_{pre}$
In step S4, the method for training the spectrum prediction network model with the training set data is as follows: randomly initialize the trainable parameters of the dual-branch neural network; set the number of iterations and the learning rate, and take the RMSprop optimization algorithm as the network training optimizer; set the iteration counter epoch to 1. Training data are then fed into the spectrum prediction network model in batches, and the training error of each batch is back-propagated to optimize the network parameters; the error loss is calculated with the constructed total loss function:
$L_{total} = \alpha L_{soft} + (1-\alpha) L_{hard}$
where $\alpha$ is a coefficient that balances the weights of the soft-target and hard-target losses.
When all batches of the training data have been back-propagated, set epoch = epoch + 1 and continue back-propagation until epoch reaches the maximum number of iterations. Finally, the test set data are input into the network and the prediction result is output.
The effects of the present application will be further described by simulation experiments.
The simulation experiments of the present application were performed on a platform of Python 3.7 and TensorFlow 2.3.0. The computer's CPU is an AMD Ryzen R7-6800H, with a discrete NVIDIA GeForce RTX 2050 graphics card.
The input step size of the network is determined to be 20 by calculating the autocorrelation coefficients of the spectrum data. The maximum number of iterations of the network is 50, the learning rate is 0.01, the RMSprop optimization algorithm is selected as the network training optimizer, and the batch size is 12.
Fig. 3 compares the prediction effect, in terms of root mean square error, of the self-migrated teacher model of the present application with the prior art. The dilation factors of the time convolutional network (TCN) are set to d = [1, 2, 4, 8, 16], and the network structure comprises an input layer, a one-dimensional convolutional layer, 5 residual blocks, a fully-connected layer and a final regression output layer. Each one-dimensional convolutional layer is assigned 64 convolution kernels of size 3, and the dropout rate of the spatial dropout layer is 0.05. The RMSprop optimizer is chosen for back-propagation. Repeated experiments showed that freezing the first 27 layers of the TCN yielded the smallest RMSE, so this model was selected as the teacher model of the present application.
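For reference, the following sketch builds a teacher TCN matching the stated settings: dilation rates [1, 2, 4, 8, 16], 64 kernels of size 3 per one-dimensional convolutional layer, spatial dropout of 0.05, five residual blocks, a fully-connected layer and a regression output. The internal composition of each residual block is simplified relative to a full TCN implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_teacher(c, filters=64, kernel_size=3, dilations=(1, 2, 4, 8, 16)):
    """Simplified TCN teacher: causal convs, one residual block per dilation rate."""
    inp = layers.Input(shape=(c, 1))
    x = layers.Conv1D(filters, kernel_size, padding="causal")(inp)
    for d in dilations:
        skip = x
        x = layers.Conv1D(filters, kernel_size, padding="causal",
                          dilation_rate=d, activation="relu")(x)
        x = layers.SpatialDropout1D(0.05)(x)
        x = layers.Add()([x, skip])             # residual connection
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)  # fully-connected layer
    out = layers.Dense(1)(x)                    # regression output layer
    return tf.keras.Model(inp, out)
```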
The abscissa in fig. 3 shows the different network models, and the ordinate shows the prediction root mean square error. As the figure shows, the proposed method achieves a clear improvement over the original TCN network: 5.8% on the RMSE metric and 4.2% on MAE. The TCN also shows an obvious advantage in handling spectrum data prediction; compared with the long short-term memory network (LSTM) and CNN-LSTM, the optimized TCN reduces RMSE by 19.5% and 18.8% respectively, and reduces MAE by 14.6% compared with LSTM. This indicates that the method is well suited to spectrum prediction and that the self-migration-based TCN model can further improve prediction accuracy.
Tables 1 and 2 compare, for different frequency bands, the root mean square error and the mean absolute error of the proposed knowledge-distillation spectrum prediction with the prior art. Compared with the other algorithms, the proposed algorithm clearly reduces the parameters to be trained, with a maximum of 133,132 parameters and a minimum of 22,200 parameters. Compared with the dual-branch neural network alone, the model adds only 2,816 trainable parameters, yet its prediction performance improves markedly. We found that the proposed dual-branch neural network has a relatively low RMSE but a higher MAE; the most likely reason is the presence of extreme outliers between predicted and true values, since MAE is more sensitive to outliers. When knowledge distillation is added, that is, in the method proposed by this application, the model attains higher accuracy and more stable performance. Although TCN-KD does not match the teacher TCN's prediction effect, the error gap is small while the parameter count is much lower. The lower prediction error of TCN-KD compared with the other networks in the tables suggests that knowledge distillation helps TCN-KD improve predictive performance while keeping the model simple.
Table 1 RMSE comparison for each band
TABLE 2 MAE comparison for each band
Fig. 4 compares the network training speed of the present application with the prior art; the abscissa is the number of training epochs and the ordinate is the loss function value. The curve marked with x is the convergence curve of the proposed method, the diamond-marked curve is the training convergence curve of CNN-LSTM, the asterisk-marked curve is that of the gated recurrent unit, and the rectangle-marked curve is that of the long short-term memory network. Comparing the training convergence curves of the four methods under small-sample conditions shows that the proposed method converges noticeably faster than the other algorithms: it converges in only about 10 training epochs, while the long short-term memory network needs 16 epochs.
Fig. 5 compares the prediction curves of the present application and the prior art under small-sample conditions. The abscissa of fig. 5 represents time slots, and the ordinate represents power spectral density values. Because the data fluctuate strongly, only the true values, the gated recurrent unit, and the proposed algorithm are shown. With the help of knowledge distillation, the proposed algorithm can capture the more complex new boundaries of the spectrum data and matches the actual fluctuations more closely, while the gated recurrent unit shows larger errors.
The application also provides a knowledge distillation-based spectrum prediction system, which comprises a fusion center, a processing module and a prediction module; the processing module is used for preprocessing the frequency spectrum data received by the fusion center and determining the step length of the input data according to the time domain autocorrelation of the acquired frequency spectrum data; the prediction module is used for constructing a spectrum prediction network model based on a knowledge distillation strategy, training the spectrum prediction network model by utilizing training set data and judging whether training is completed or not; if yes, inputting training set data into the frequency spectrum prediction network model after training is completed, obtaining output of a prediction result and ending a training flow; if not, after the iteration times of training are added by one, training the frequency spectrum prediction network model by utilizing the training set data.
The application also proposes a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the knowledge distillation based spectrum prediction method in the above embodiments when executing the computer program.
The application also provides a computer storage medium, wherein the storage medium stores a computer program, and the computer program realizes the spectrum prediction method based on knowledge distillation in the embodiment when being executed by a processor.
The application and its embodiments have been described above in an illustrative rather than restrictive manner, and the structures shown in the accompanying drawings are only examples of the embodiments; the actual structure is not limited thereto. Therefore, if a person of ordinary skill in the art, informed by this disclosure, devises structural modes and embodiments similar to this technical scheme without creative effort and without departing from the gist of the application, they shall fall within the scope of protection of the application.

Claims (13)

1. A knowledge distillation-based spectrum prediction method, comprising the steps of:
s1, preprocessing received spectrum data;
s2, determining the step length of input data according to the time domain autocorrelation of the acquired frequency spectrum data;
s3, constructing a spectrum prediction network model based on a knowledge distillation strategy;
s4, training the frequency spectrum prediction network model by using training set data;
s5, judging whether training is completed or not; if yes, inputting training set data into the frequency spectrum prediction network model after training is completed, obtaining output of a prediction result and ending a training flow; if not, after the iteration number of training is added by one, returning to S4.
2. The knowledge-based distillation spectrum prediction method according to claim 1, wherein preprocessing the received spectrum data comprises the steps of:
s1.1, according to the received frequency spectrum data, converting the power spectrum density value of the collected signal into a power value capable of being weighted and averaged;
s1.2, taking continuous H data in a time dimension for weighted average, and calculating to obtain final spectrum data.
3. The knowledge-based distillation spectrum prediction method according to claim 2, wherein in S1.1, the power spectral density values of the collected signal, recorded in dBm, are converted into linear power values that can be weighted-averaged using $\mathrm{mW} = 10^{\mathrm{dBm}/10}$;
in S1.2, H is set to 100, and the final spectrum data are obtained by converting the averaged values back according to the following formula:
$\mathrm{dBm} = 10\log_{10}(\mathrm{mW})$
4. The knowledge-based distillation spectrum prediction method according to claim 1, wherein in S2, the autocorrelation coefficient of the spectrum signal is calculated using the following formula:
$\rho(k) = \mathrm{cov}(x_t, x_{t+k}) / (\sigma_{x_t}\,\sigma_{x_{t+k}})$
wherein $x_t$ denotes the spectrum data at time $t$, $\mathrm{cov}(\cdot)$ denotes covariance, and $\sigma$ denotes the sample standard deviation;
and the sliding window size whose autocorrelation coefficient is greater than 0.8 is selected as the step length $c$ of the input data.
5. The knowledge distillation based spectrum prediction method according to claim 4, wherein: when the model is adapted to time-series prediction, a sliding window of length $c+1$ is used to divide the spectrum data $f_n$: the first $c$ values $s_t = \{x_{t-c+1}, x_{t-c+2}, \ldots, x_t\}$ serve as the input data, and the $(c+1)$-th value serves as the data to be predicted, $y_t = x_{t+1}$; the data of the $n$-th frequency point are thereby converted into $m$ input/label pairs $\{(s_t, y_t)\}$, where $m = T - c$ and $T$ is the total length of the spectrum data $f_n$.
6. The knowledge-based distillation spectrum prediction method according to claim 1, wherein: and S3, performing self-migration optimization on the time convolution network to serve as a teacher model, constructing a double-branch neural network to serve as a student model, and constructing a spectrum prediction network model based on a knowledge distillation strategy according to the teacher model and the student model.
7. The knowledge distillation based spectrum prediction method according to claim 6 wherein the method of self-migration optimization of a time convolution network as a teacher model is: the collected frequency spectrum data is used for pre-training the time convolution network to obtain the time convolution network with high accuracy and huge parameter quantity; and freezing part of the layers of the convolutional neural network to be used as a network to be trained, and training the unfrozen layers of the network again.
8. The knowledge distillation based spectrum prediction method according to claim 7, wherein the method for constructing a dual-branch neural network as a student model by combining an encoder structure is:
by constructing coding-reconstruction branches f rec And coding-prediction branch f pre Forming a double-branch neural network, wherein each branch has a corresponding loss function, and training to obtain optimized network parameters by a method of minimizing the loss function;
the loss function calculation formula of the coding-reconstruction branch is as follows:
wherein θ 1 θ, the parameter of the encoder 2 For the reconstructor parameters s t Output for characteristics of middle layer of teacher network s t Is a true value label, f rec (s t ;θ 12 ) The final output of the encode-reconstruct branch;
the loss function calculation formula of the coding-prediction branch is:
wherein θ 1 θ, the parameter of the encoder 3 For predictor parameters, y t Is the true spectral value, f pred (s t ;θ 13 ) To predict future spectral values.
9. The knowledge distillation based spectrum prediction method according to claim 8, wherein the method for constructing a knowledge distillation policy based spectrum prediction network model according to the teacher model and the student model is:
the soft-target loss function is constructed to reduce the difference between the output of the student model and the output of the teacher model, and is calculated as:
$L_{soft} = \| f_S(s_t') - f_T(s_t) \|^2$
wherein $f_T(\cdot)$ is the teacher model, $f_T(s_t)$ is the soft target, i.e. the teacher's output, $f_S(\cdot)$ is the student model, and $f_S(s_t')$ is the student's prediction taking the intermediate-layer output of the time convolution network as its input;
the hard-target loss function is constructed to reduce the difference between the student model and the ground truth, and is calculated as:
$L_{hard} = L_{rec} + L_{pre}$
10. The knowledge-based distillation spectrum prediction method according to claim 9, wherein in S4, the method for training the spectrum prediction network model with the training set data is: randomly initializing the trainable parameters of the dual-branch neural network; setting the number of iterations and the learning rate, and taking the RMSprop optimization algorithm as the network training optimizer; setting the iteration counter epoch to 1; feeding the training data into the spectrum prediction network model in batches for training, and back-propagating the training error of each batch to optimize the network parameters; the error loss being calculated with the constructed total loss function:
$L_{total} = \alpha L_{soft} + (1-\alpha) L_{hard}$
where $\alpha$ is a coefficient that balances the weights of the soft-target and hard-target losses;
when all batches of the training data have been back-propagated, setting epoch = epoch + 1 and continuing back-propagation until epoch reaches the maximum number of iterations.
11. A knowledge distillation-based spectrum prediction system, characterized by: the system comprises a fusion center, a processing module and a prediction module; the processing module is used for preprocessing the frequency spectrum data received by the fusion center and determining the step length of the input data according to the time domain autocorrelation of the acquired frequency spectrum data;
the prediction module is used for constructing a spectrum prediction network model based on a knowledge distillation strategy, training the spectrum prediction network model by utilizing training set data and judging whether training is completed or not; if yes, inputting training set data into the frequency spectrum prediction network model after training is completed, obtaining output of a prediction result and ending a training flow; if not, after the iteration times of training are added by one, training the frequency spectrum prediction network model by utilizing the training set data.
12. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized by: the processor, when executing the computer program, implements a knowledge distillation based spectrum prediction method as claimed in any one of claims 1-10.
13. A computer storage medium storing a computer program, characterized in that: the computer program, when executed by a processor, implements a knowledge distillation based spectrum prediction method as claimed in any of claims 1-10.
CN202310808568.5A 2023-07-04 2023-07-04 Knowledge distillation-based lightweight spectrum prediction method Pending CN116709409A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310808568.5A CN116709409A (en) 2023-07-04 2023-07-04 Knowledge distillation-based lightweight spectrum prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310808568.5A CN116709409A (en) 2023-07-04 2023-07-04 Knowledge distillation-based lightweight spectrum prediction method

Publications (1)

Publication Number Publication Date
CN116709409A true CN116709409A (en) 2023-09-05

Family

ID=87837404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310808568.5A Pending CN116709409A (en) 2023-07-04 2023-07-04 Knowledge distillation-based lightweight spectrum prediction method

Country Status (1)

Country Link
CN (1) CN116709409A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117650834A (en) * 2024-01-29 2024-03-05 南京理工大学 Space-time flow prediction method of space-time integrated network based on knowledge distillation
CN117650834B (en) * 2024-01-29 2024-04-12 南京理工大学 Space-time flow prediction method of space-time integrated network based on knowledge distillation

Similar Documents

Publication Publication Date Title
CN107330514B (en) Air quality prediction method based on integrated extreme learning machine
CN107544904B (en) Software reliability prediction method based on deep CG-LSTM neural network
CN113723007B (en) Equipment residual life prediction method based on DRSN and sparrow search optimization
CN112085254B (en) Prediction method and model based on multi-fractal cooperative measurement gating circulation unit
CN108876044B (en) Online content popularity prediction method based on knowledge-enhanced neural network
US20220114455A1 (en) Pruning and/or quantizing machine learning predictors
CN111931983B (en) Precipitation prediction method and system
CN113840297B (en) Frequency spectrum prediction method based on radio frequency machine learning model drive
KR102239464B1 (en) Methods and apparatuses for forecasting power demand using deep structure
CN116709409A (en) Knowledge distillation-based lightweight spectrum prediction method
CN111144552A (en) Multi-index grain quality prediction method and device
CN113392961A (en) Method for extracting mesoscale eddy track stable sequence and predicting cyclic neural network
CN113935513A (en) CEEMDAN-based short-term power load prediction method
CN116562908A (en) Electric price prediction method based on double-layer VMD decomposition and SSA-LSTM
CN112949610A (en) Improved Elman neural network prediction method based on noise reduction algorithm
CN112988548A (en) Improved Elman neural network prediction method based on noise reduction algorithm
He et al. A cooperative ensemble method for multistep wind speed probabilistic forecasting
CN113869614B (en) Pedestrian flow early prediction method based on space-time graph convolution
CN115759343A (en) E-LSTM-based user electric quantity prediction method and device
CN114298290A (en) Neural network coding method and coder based on self-supervision learning
CN112616160B (en) Intelligent short-wave frequency cross-frequency-band real-time prediction method and system
CN112183814A (en) Short-term wind speed prediction method
CN115222024B (en) Short-term photovoltaic power generation prediction method and system based on depth feature selection network
CN115034518A (en) Method and system for predicting multi-element load of cooling, heating and power
CN116916330A (en) Cross-domain collaborative spectrum intelligent sensing algorithm based on SheffleNet-R & CNN neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination