CN110311744A

CN110311744A - A Channel Environment Adaptive Spectrum Sensing Method Based on Catboost Algorithm

Info

Publication number: CN110311744A
Application number: CN201910618558.9A
Authority: CN
Inventors: 邢焕来; 王成玮; 罗寿西; 詹大为; 戴朋林
Original assignee: Southwest Jiaotong University
Current assignee: Southwest Jiaotong University
Priority date: 2019-07-10
Filing date: 2019-07-10
Publication date: 2019-10-08

Abstract

The invention discloses a channel environment adaptive spectrum sensing method based on the Catboost algorithm. The specific steps are: 1. each secondary user collects the energy values in the current channel environment and sends them to a secondary user acting as the fusion center; 2. the primary users intermittently send their channel occupancy status to the fusion center; 3. the fusion center organizes the information sent by the primary and secondary users into a data set and further constructs a feature vector set; 4. the fusion center trains a model with the Catboost algorithm; 5. the secondary users continue to send energy values to the fusion center, where they serve as test vectors input into the trained model; 6. after obtaining the result, the fusion center informs all secondary users whether the channel resource is available, and the secondary users act according to the fusion center's decision. At a false alarm rate of 0.1, the invention increases the detection rate by 10% compared with SVM, and the misclassification rate and misclassification risk are also significantly reduced.

Description

A Channel Environment Adaptive Spectrum Sensing Method Based on Catboost Algorithm

Technical Field

The invention belongs to the technical field of wireless communication and artificial intelligence. Specifically, it relates to a channel environment adaptive spectrum sensing method based on the Catboost algorithm.

Background Art

A wireless sensor network is a self-organizing, data-centric wireless network composed of a large number of micro sensor nodes with computing and communication capabilities. Owing to its low overhead and low power consumption, it has been applied in fields such as industry and agriculture. However, with the development of wireless communication technology, spectrum scarcity has become the biggest challenge facing wireless sensor networks. Most wireless sensor networks currently operate in the 2.4 GHz band; in the industrial domain, WirelessHART, WIA-PA and ISA100.11a are all based on the IEEE 802.15.4 physical layer. Widely used short-range technologies such as ZigBee, Bluetooth and Wi-Fi also operate in this band, so its heavy use causes channel congestion and unavoidable interference. Cognitive radio network technology was therefore proposed to address spectrum scarcity.
Cognitive radio is an intelligent wireless communication technology that can sense and learn from its surrounding electromagnetic environment and adjust its own operating parameters as that environment changes. Spectrum sensing is its basic means of detecting primary user signals. O. P. Awe et al. proposed a spectrum sensing method that applies the SVM algorithm to eigenvalues estimated from the sample covariance matrix, where the sensing user equipment has multiple antennas; the same authors also proposed spectrum sensing with Kalman-filter channel estimation in slow-fading channels. There is a large body of research on this problem. W. Zhang et al. focused on multipath fading, shadow fading and the hidden terminal problem, mainly through various cooperative spectrum sensing schemes and optimized sensing methods. Umebayashi et al. proposed an optimized threshold setting to improve detection performance, but it requires prior information about previous states. B. L. Mark et al. used different estimation algorithms to locate the primary user transmitter, determine the spatial relationship between primary and secondary users, and thereby improve the spatial utilization of spectrum resources. C. Liu et al. proposed transforming the primary user signal detection method by exploiting central-symmetry features to improve spatial utilization. In recent years, machine learning algorithms have increasingly been applied to spectrum sensing in cognitive radio networks, and some research results have been obtained.
For example, an SVM classifier built on energy detection outperforms other machine learning algorithms on the spectrum sensing classification problem and is popular and practical because of its high classification accuracy. Although these works have improved the detection rate of spectrum sensing to some extent, there is still room for improvement: first, the detection rate can be increased further; second, the misclassification rate and misclassification risk need to be reduced further. The present invention addresses the problem against this background.

Summary of the Invention

To solve the above problems, the present invention provides a channel environment adaptive spectrum sensing method based on the Catboost algorithm, which comprises the following steps:

Step 1: The front-end energy acquisition device of each secondary user collects the energy values in the current channel environment and sends the energy values within the sensing period to a secondary user acting as the fusion center. Energy detection is used here to characterize the channel environment: when the primary user goes online or offline, the energy values collected by the secondary users' front-end devices have different statistical characteristics. Energy detection requires no prior information, does not need a dedicated receiver for each signal type as matched filtering does, and avoids the heavy computation of cyclostationary (cyclic spectrum) detection.

Step 2: The primary users intermittently send their channel resource occupancy to the fusion center.

Step 3: The fusion center organizes the information sent by the primary and secondary users into a data set and further constructs a feature vector set. Because a machine learning algorithm (Catboost) is combined with energy detection as the basic sensing method, the energy values collected by the secondary users are the natural choice of features.

Step 4: The fusion center trains the model with the Catboost algorithm.

Step 5: The secondary users continue to send energy values to the fusion center, where they serve as test vectors input into the trained model.

Step 6: After obtaining the result, the fusion center informs all secondary users whether the channel resource is available, and the secondary users act according to the fusion center's decision.
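The six steps can be summarized as a small fusion-center pipeline. The sketch below is a minimal illustration under stated assumptions, not the patented implementation: the function names, the "busy"/"available" messages and the `MeanEnergyThreshold` classifier are hypothetical stand-ins, and each training label is derived with the rule that the channel is busy whenever any primary user reports itself online.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def build_training_set(energy_reports: Sequence[Sequence[float]],
                       pu_states: Sequence[Sequence[int]]) -> Tuple[List[List[float]], List[int]]:
    """Steps 1-3: pair each sensing period's SU energy vector with a label.

    energy_reports[t] -- energies Y_1..Y_Q reported by the SUs in period t
    pu_states[t]      -- states S_1..S_P reported by the PUs (1 online, 0 offline)
    Label 1 (channel busy) if any PU is online, else 0 (channel available).
    """
    X = [[float(v) for v in e] for e in energy_reports]
    y = [1 if any(s) else 0 for s in pu_states]
    return X, y


class MeanEnergyThreshold:
    """Hypothetical stand-in for the Catboost model (step 4): thresholds the mean SU energy."""

    def fit(self, X, y):
        busy = [mean(x) for x, label in zip(X, y) if label == 1]
        idle = [mean(x) for x, label in zip(X, y) if label == 0]
        self.threshold_ = (mean(busy) + mean(idle)) / 2  # midpoint between class means
        return self

    def predict(self, X):
        return [1 if mean(x) > self.threshold_ else 0 for x in X]


def fusion_center_decision(model, energy_vector: Sequence[float]) -> str:
    """Steps 5-6: classify one test vector and return the message broadcast to all SUs."""
    busy = model.predict([list(energy_vector)])[0]
    return "busy" if busy else "available"
```

In the described method the stand-in would be replaced by a trained Catboost model; `catboost.CatBoostClassifier` exposes `fit(X, y)` and `predict(X)` with the same shape, so `fusion_center_decision` would not need to change.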

In step 1 above, both the secondary user and the primary user equipment include a signal transmitter and receiver, and the secondary user additionally includes front-end sensing equipment. The invention focuses on the spectrum sensing stage, i.e. deciding whether the primary user is transmitting; the model is therefore simplified to the secondary users' sensing devices detecting the signal emitted by the primary user's transmitter. The invention uses more realistic unbalanced samples with a ratio of 7:1, whereas positive and negative samples were roughly balanced in previous machine-learning-based spectrum sensing work.

Step 3 above specifically comprises:

3.1: Data setting and energy normalization:

Energy detection is used as the basic means of spectrum sensing. The system contains P primary users (PUs), indexed p = 1, 2, ..., P, and Q secondary users (SUs), indexed q = 1, 2, ..., Q. To keep the model general, P ≥ 2, and Q can be relatively large. The PUs and SUs are assumed to share the frequency band without causing interference. A PU has only two working states: online, S_p = 1, or offline, S_p = 0. When a PU is online it occupies the spectrum and the SUs cannot use it; when it goes offline it releases the spectrum, which the SUs may then use. As long as any PU occupies the spectrum, the SUs are not allowed to use it. g_p denotes the geographic location of PU p, and g_q that of SU q.

The energy detector of each SU takes wτ complex baseband signal samples within the sensing period τ, where w is the bandwidth. R_q(i) denotes the i-th signal sample received by SU q:

$$R_q(i) = \begin{cases} N_q(i), & H_0 \\ \sum_{p=1}^{P} S_p h_{p,q} W_p(i) + N_q(i), & H_1 \end{cases} \qquad (1)$$

Here H₀ means that no PU is active in the channel, so the SU sensing device receives only thermal noise, denoted N_q(i); H₁ is the case where at least one PU is online, W_p(i) is the transmitted signal sample of PU p, h_{p,q} is the channel gain between PU p and SU q, and S_p is the working state of PU p. The SU must make the correct decision within the sensing period. Since the invention is an energy detection spectrum sensing method based on machine learning, feature extraction is the first step. Y_q denotes the normalized energy level received by SU q:

$$Y_q = \frac{2}{\eta}\sum_{i=1}^{w\tau} \left|R_q(i)\right|^2 \qquad (2)$$

where η is the noise power spectral density, defined as η = E[|N_q(i)|²]. The energy vector therefore contains the energy levels received by all SUs:

$$Y = (Y_1, \dots, Y_Q)^T \qquad (3)$$
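The energy vector can be simulated directly from the received-signal model described above: pure noise N_q(i) under H₀, and the sum of S_p h_{p,q} W_p(i) plus noise under H₁. The sketch below assumes the normalization Y_q = (2/η) Σ_{i=1}^{wτ} |R_q(i)|² with η the per-sample noise variance, under which a noise-only Y_q has r = 2wτ degrees of freedom as stated in eq. (4). The PU waveform is assumed here to be complex Gaussian with a given power, which the source does not specify, and `su_energy` is a hypothetical name.

```python
import numpy as np


def su_energy(pu_states, h, w_tau, eta, pu_power, rng):
    """Return the energy vector (Y_1, ..., Y_Q) for one sensing period.

    pu_states: PU states S_p in {0, 1}, shape (P,)
    h:         complex channel gains h_{p,q}, shape (P, Q)
    w_tau:     number of samples per sensing period (w * tau)
    eta:       per-sample noise variance E[|N_q(i)|^2]
    pu_power:  assumed PU signal powers E[|W_p(i)|^2], shape (P,)
    """
    s = np.asarray(pu_states, dtype=float)
    h = np.asarray(h, dtype=complex)
    power = np.asarray(pu_power, dtype=float)
    P, Q = h.shape
    # Circularly symmetric complex Gaussian noise N_q(i) with E|N_q(i)|^2 = eta.
    noise = np.sqrt(eta / 2) * (rng.standard_normal((w_tau, Q))
                                + 1j * rng.standard_normal((w_tau, Q)))
    # Hypothetical PU waveforms W_p(i): complex Gaussian with power power[p].
    W = np.sqrt(power / 2) * (rng.standard_normal((w_tau, P))
                              + 1j * rng.standard_normal((w_tau, P)))
    # Received samples: sum_p S_p h_{p,q} W_p(i) + N_q(i); noise only (H0) if all S_p = 0.
    R = (W * s) @ h + noise  # shape (w_tau, Q)
    # Normalized energy per SU: (2 / eta) * sum_i |R_q(i)|^2.
    return (2.0 / eta) * np.sum(np.abs(R) ** 2, axis=0)
```

With all PUs offline, averaging many outputs should give roughly 2wτ, the mean of a central chi-square with r = 2wτ degrees of freedom; an online PU shifts the mean upward by the normalized signal energy.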

3.2: After the energy vector is obtained, its distribution is analyzed further.

Because of the working mode of the PUs, each energy value Y_q follows a noncentral chi-square distribution with the following degrees of freedom and noncentrality parameter:

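$$r = 2w\tau \qquad (4)$$

$$\zeta_q = \frac{2}{\eta}\sum_{i=1}^{w\tau}\Big|\sum_{p=1}^{P} S_p h_{p,q} W_p(i)\Big|^2 \approx \frac{2w\tau}{\eta}\sum_{p=1}^{P} S_p \rho_p l_{p,q} \qquad (5)$$

Here ρ_p = E[|W_p(i)|²] is the fixed transmit power of PU p (the approximation assumes mutually independent, zero-mean PU signals), and l_{p,q} = |h_{p,q}|² is the power attenuation, calculated as: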

$$l_{p,q} = PL\left(\lVert g_q - g_p \rVert\right)\, \nu_{p,q}\, \psi_{p,q} \qquad (6)$$

Here ‖·‖ denotes the Euclidean distance, and PL(dist) = dist^(−θ) is the path loss as a function of the distance dist and the path-loss exponent θ; ν_{p,q} and ψ_{p,q} represent multipath fading and shadow fading, respectively. The PUs and SUs comply with the IEEE 802.22 standard. In addition, the fading coefficients ν_{p,q} and ψ_{p,q} are quasi-static over the sensing period and are set to 1.
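Eq. (6) with the fading terms fixed to 1 reduces to a pure distance-based path loss, which can be computed for all PU/SU pairs at once. In this sketch `power_attenuation` is a hypothetical name, and the 400 m SU grid spacing is an assumption (the source only states that the 49 SUs are evenly spread on a 7×7 grid); the PU coordinates are those of the embodiment.

```python
import numpy as np


def power_attenuation(pu_pos, su_pos, theta=4.0, nu=1.0, psi=1.0):
    """l_{p,q} = PL(||g_q - g_p||) * nu * psi with PL(d) = d ** (-theta).

    pu_pos: (P, 2) PU coordinates g_p in metres
    su_pos: (Q, 2) SU coordinates g_q in metres
    nu, psi: multipath and shadow fading, fixed to 1 (quasi-static) as in the text
    """
    pu_pos = np.asarray(pu_pos, dtype=float)
    su_pos = np.asarray(su_pos, dtype=float)
    # Pairwise Euclidean distances, shape (P, Q).
    d = np.linalg.norm(su_pos[None, :, :] - pu_pos[:, None, :], axis=-1)
    return d ** (-theta) * nu * psi


# Hypothetical 7x7 SU grid with 400 m spacing; PU positions from the embodiment.
grid = np.linspace(-1200.0, 1200.0, 7)
SU_POS = np.array([(x, y) for y in grid for x in grid])
PU_POS = np.array([(-1100.0, -1000.0), (750.0, 890.0), (1500.0, -1000.0)])
L = power_attenuation(PU_POS, SU_POS)  # shape (3, 49)
```

The grid spacing was chosen so that no SU coincides with a PU; a zero distance would make d^(−θ) infinite.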

The distribution of the energy level was described above. When the number of samples wτ is sufficiently large, the energy values approximately follow a Gaussian distribution (by the central limit theorem); the energy vector can therefore be treated as drawn from a multivariate Gaussian distribution with the following mean and variance:

$$\mu_{Y_q} = r + \zeta_q \qquad (7)$$

$$\sigma^2_{Y_q} = 2\left(r + 2\zeta_q\right) \qquad (8)$$

The mean vector and covariance matrix of the energy vector are therefore:

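$$\mu_Y = \left(\mu_{Y_1}, \dots, \mu_{Y_Q}\right)^T \qquad (9)$$

$$\Sigma_Y = \operatorname{diag}\left(\sigma^2_{Y_1}, \dots, \sigma^2_{Y_Q}\right) \qquad (10)$$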
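The moments in eqs. (7) and (8) are the standard mean df + nonc and variance 2(df + 2·nonc) of a noncentral chi-square distribution, which makes them easy to check numerically. The sketch below compares them with NumPy's `Generator.noncentral_chisquare` and then draws energy vectors from the Gaussian approximation. Treating the SUs as independent given the PU states (a diagonal covariance) is an assumption of this sketch, and both function names are hypothetical.

```python
import numpy as np


def energy_moments(r, zeta):
    """Mean and variance of Y_q from eqs. (7) and (8)."""
    return r + zeta, 2.0 * (r + 2.0 * zeta)


def sample_energy_vectors(r, zetas, n, rng):
    """Draw n energy vectors from the Gaussian approximation.

    Assumes independent SUs given the PU states, i.e. a diagonal covariance
    built from the per-SU variances of eq. (8).
    """
    mu, var = energy_moments(r, np.asarray(zetas, dtype=float))
    return rng.normal(mu, np.sqrt(var), size=(n, len(mu)))
```

For r = 1000 and ζ_q = 500 this gives a mean of 1500 and a variance of 4000; exact noncentral chi-square draws with the same parameters have matching sample moments.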

The Catboost procedure is as follows. During each sensing period, the secondary users send the channel energy values they sense to the fusion center as the feature energy vector, and the primary users intermittently send information on whether they occupy the spectrum to the fusion center as the label; this completes the construction of the training data set. The fusion center then trains the model with the Catboost algorithm. Catboost, proposed by Yandex, improves the handling of categorical features, processing them during training rather than in a separate data preprocessing stage. Another advantage is that it uses a new scheme for computing leaf values when choosing the tree structure, which helps reduce overfitting.
Catboost runs on both CPU and GPU; its GPU implementation is reported to be faster than the GPU implementation of the currently popular XGBoost, and the same holds for its CPU implementation. Like other gradient boosting methods, Catboost builds each new tree to fit the residual of the current model. However, traditional gradient boosting suffers from biased pointwise gradient estimates, which easily lead to overfitting: the gradients are evaluated on the same data points that were used to build the model, so the distribution of the fitted gradients in feature space is shifted relative to the true gradient distribution. Many GBDT methods (such as XGBoost) build the next tree in two steps: choosing the tree structure and setting the leaf values. To choose the best tree structure, the algorithm enumerates candidate splits, builds trees with them, sets the leaf values, scores the trees and selects the best split. In both stages, the leaf values are computed to fit the gradients. Catboost performs the second stage in the traditional way but uses an improved method in the first stage. Let F^k denote the model consisting of the first k trees, and g^k(X_h, Z_h) the gradient on the h-th training sample after k trees have been built. Making g^k(X_h, Z_h) unbiased would require training F^k without X_h, which is infeasible. The following trick resolves this: for each sample X_h, a separate model M_h is trained that is never updated with a gradient estimate for X_h itself; M_h is used to estimate the gradient on X_h, and this estimate is used to score the candidate tree.
Catboost generates s random permutations of the training set and uses several permutations to estimate the residual gradients, which strengthens the robustness of the algorithm and helps avoid overfitting. For each permutation σ, n models M_k are trained. Building a new tree then requires storing and recomputing O(n²) values for each permutation σ: for each model M_k, the values M_k(X_1), ..., M_k(X_k) must be updated, so the complexity is O(sn²). In practice an important trick reduces the cost of building each tree to O(sn): for each permutation, instead of storing and updating the O(n²) values M_k(X_j), Catboost maintains values M'_k(X_j), k = 1, ..., ⌈log₂ n⌉, j < 2^(k+1), where M'_k(X_j) is the approximation for sample j based on the first 2^k samples. The number of stored predictions M'_k(X_j) is then below 4n, and the gradient on sample X_h used for choosing the tree structure is estimated from the approximation M'_k(X_h) with k = ⌈log₂ h⌉.
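The core idea of this ordered gradient estimation can be illustrated with a deliberately simplified sketch. It is not Catboost's implementation: the function name is hypothetical, the supporting models are constant (mean) predictors instead of trees, only one permutation is used, and the prefix convention (position j is scored by the model fitted on the first 2^(⌈log₂ j⌉−1) < j samples) is chosen so that no sample ever sees its own target. It shows the two properties described above: each residual comes from a model that never used that sample's target, and only about log₂ n supporting models, fitted on prefixes of length 2^k, are kept per permutation.

```python
import math

import numpy as np


def ordered_residuals(y, perm, prior=0.0):
    """Simplified ordered residual estimates for one permutation.

    y:     targets (for squared loss the negative gradient is the residual)
    perm:  permutation of range(len(y)), len(y) >= 1
    prior: prediction for the first sample, which has no history
    """
    y_perm = np.asarray(y, dtype=float)[perm]
    n = len(y_perm)
    # Supporting models M'_k: constant predictors fitted on the first 2**k samples.
    n_models = max(1, math.ceil(math.log2(n)))
    prefix_mean = [y_perm[: 2 ** k].mean() for k in range(n_models)]
    residual_perm = np.empty(n)
    for j in range(1, n + 1):  # 1-based position in the permutation
        # Position j uses the prefix of length 2**(ceil(log2 j) - 1), which excludes j.
        pred = prior if j == 1 else prefix_mean[math.ceil(math.log2(j)) - 1]
        residual_perm[j - 1] = y_perm[j - 1] - pred
    residuals = np.empty(n)
    residuals[perm] = residual_perm  # map back to the original sample order
    return residuals
```

Because each prediction excludes the sample being scored, perturbing one target changes only that sample's own residual by the same amount, which is the unbiasedness property the ordered scheme is designed to provide.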

Beneficial effects of the present invention compared with the prior art:

At a false alarm rate of 0.1, as required by IEEE 802.22, the invention increases the detection rate by 10% compared with SVM, its overall classification performance is better than that of SVM, and the misclassification rate and misclassification risk are significantly reduced. Experiments further show that for different primary user transmit powers, i.e. different signal-to-noise ratios at the secondary users, the invention generally outperforms the SVM algorithm. Because the machine learning model can be retrained to adapt to the current channel in different channel environments, the method is highly practical, performs well even at low signal-to-noise ratio, and is robust. This is of great significance for spectrum sensing applications.

Description of Drawings

Fig. 1 is a flow diagram of the steps of the machine-learning-based spectrum sensing method of the present invention.

Fig. 2 is the structural model of the 7×7 cooperative spectrum sensing system.

Fig. 3 shows the ROC curves of the Catboost algorithm and the SVM algorithm (linear kernel) in the 7×7 model.

Detailed Description of Embodiments

The implementation of the present invention is further described below with reference to the accompanying drawings.

The system framework of the channel environment adaptive spectrum sensing method based on the Catboost algorithm of the present invention is shown in Fig. 1.

The geographic-location-based cooperative spectrum sensing model of the present invention is described below with reference to Fig. 2, and Fig. 3 is used to compare the invention with the SVM algorithm, the best-performing previous method, in terms of ROC curve, detection rate, misclassification risk and misclassification rate under different PU transmit powers. The simulations were run with Python 3.6.2 on a 64-bit PC with 16 GB of RAM and a six-core i7 (3.2 GHz).

The performance metrics compared are as follows:

a) ROC curve (receiver operating characteristic curve): the average curve over 200 independent experimental runs; this metric reflects the overall classification performance of the algorithm.

b) Detection probability: the probability that the unlicensed (secondary) users correctly detect a licensed (primary) user when it is online, averaged over 200 experimental runs.

c) Misclassification risk: the probability that the classifier outputs the label "offline" when a licensed user is online, averaged over 200 experimental runs.

d) Misclassification error rate: the probability that the classifier makes a wrong decision, i.e. classifies an offline licensed user as online or an online licensed user as offline, averaged over 200 experimental runs.
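Metrics b) to d), together with the false alarm rate that fixes the ROC operating point, can be computed from one run's labels and predictions as sketched below (the 200-run averaging is omitted). The function names are hypothetical; label 1 means the licensed user is online. `detection_at_false_alarm` reads the detection probability off the empirical ROC at the best threshold whose false alarm rate does not exceed the target, for example 0.1.

```python
import numpy as np


def sensing_metrics(y_true, y_pred):
    """Metrics b)-d) plus the false alarm rate; labels: 1 = licensed user online, 0 = offline."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    online = y_true == 1
    return {
        "detection_probability": np.mean(y_pred[online] == 1),   # b)
        "misclassification_risk": np.mean(y_pred[online] == 0),  # c)
        "misclassification_rate": np.mean(y_pred != y_true),     # d)
        "false_alarm_rate": np.mean(y_pred[~online] == 1),       # offline judged online
    }


def detection_at_false_alarm(y_true, scores, target_pfa=0.1):
    """Highest detection probability over thresholds whose false alarm rate <= target_pfa."""
    y_true = np.asarray(y_true, dtype=int)
    scores = np.asarray(scores, dtype=float)
    best = 0.0
    for t in np.unique(scores):
        pred = scores >= t
        if np.mean(pred[y_true == 0]) <= target_pfa:
            best = max(best, float(np.mean(pred[y_true == 1])))
    return best
```

Under the definitions as written, c) equals 1 − b) on the same test set, while d) also counts false alarms.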

Embodiment

To verify the usability and feasibility of the Catboost-based method for spectrum sensing in cognitive radio networks, simulation experiments were carried out and the performance was compared with that of the SVM algorithm. The simulation parameters are: sensing period τ = 100 μs, bandwidth w = 5 MHz, noise power spectral density −174 dBm/Hz, transmit power of each PU 200 mW, path-loss exponent 4, multipath and shadow fading coefficients both 1, and probability of each PU being online 0.5. The SVM uses a linear kernel, because earlier work showed that the linear kernel performs well on this problem. There are 160 training vectors and 640 test vectors, with a positive-to-negative sample ratio of 7:1. In the 7×7 cooperative spectrum sensing model of Fig. 2, 49 SUs are evenly distributed on a 7×7 grid, and 3 PUs are located at (−1100 m, −1000 m), (750 m, 890 m) and (1500 m, −1000 m). In the ROC curves of Fig. 3, SVM is drawn as a solid line and Catboost as a dashed line: at a false alarm rate of 0.1, the detection rate of Catboost is 10% higher than that of SVM, and its overall classification performance is better than that of SVM.
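Several of these parameters are linked by simple arithmetic. If the three PUs are online independently with probability 0.5 each (independence is an assumption, the text only gives the per-PU probability), the channel is idle only when all three are offline, with probability 0.5³ = 0.125, so busy and idle samples occur in the ratio 0.875 : 0.125 = 7:1, which matches the stated class ratio. The sketch below, with hypothetical function names, also evaluates r = 2wτ from eq. (4) and converts the dBm figures to watts.

```python
def simulation_setup(n_pu=3, p_online=0.5, w=5e6, tau=100e-6, train=160, test=640):
    """Derived quantities for the embodiment, assuming independent PU activity."""
    p_idle = (1 - p_online) ** n_pu  # all PUs offline -> H0, channel available
    p_busy = 1 - p_idle              # at least one PU online -> H1
    return {
        "busy_to_idle_ratio": p_busy / p_idle,
        "r": 2 * w * tau,                        # degrees of freedom, eq. (4)
        "expected_idle_train": train * p_idle,   # H0 vectors expected in the training set
        "expected_idle_test": test * p_idle,     # H0 vectors expected in the test set
    }


def dbm_to_watt(dbm):
    """Convert a level in dBm (or dBm/Hz) to W (or W/Hz)."""
    return 10 ** (dbm / 10) / 1000
```

`simulation_setup()` gives a busy-to-idle ratio of 7, r ≈ 1000, and about 20 idle vectors among the 160 training vectors and 80 among the 640 test vectors; `dbm_to_watt(-174.0)` is about 3.98e-21 W/Hz.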

Compared with the SVM algorithm under different PU transmit powers, the invention achieves a higher detection rate and a lower misclassification risk and misclassification rate, as shown in Table 1:

Table 1 Performance of the Catboost and SVM algorithms in the 7×7 model under different primary user transmit powers

As the signal-to-noise ratio increases, the metrics of both algorithms improve, and Catboost converges at high signal-to-noise ratio. These results demonstrate the feasibility and usability of the invention, which can be used to solve the spectrum sensing problem in cognitive radio networks.

Claims

1. A channel environment adaptive spectrum sensing method based on the Catboost algorithm, characterized by comprising the following steps:

Step 1: the front-end energy acquisition device of each secondary user collects the energy values in the current channel environment and sends the energy values within the sensing period to a secondary user acting as the fusion center;

Step 2: the primary users intermittently send their channel resource occupancy to the fusion center;

Step 3: the fusion center organizes the information sent by the primary and secondary users into a data set and further constructs a feature vector set;

Step 4: the fusion center trains a model with the Catboost algorithm;

Step 5: the secondary users continue to send energy values to the fusion center as test vectors, which are input into the trained model;

Step 6: after obtaining the result, the fusion center informs all secondary users whether channel resources are available, and the secondary users act according to the fusion center's decision.

2. The channel environment adaptive spectrum sensing method based on the Catboost algorithm according to claim 1, characterized in that unbalanced samples with a ratio of 7:1 are used in step 1, whereas positive and negative samples are balanced in previous machine-learning-based spectrum sensing work.

3. The channel environment adaptive spectrum sensing method based on the Catboost algorithm according to claim 1, characterized in that step 3 specifically comprises:

3.1: data setting and energy normalization:

energy detection is used as the basic means of spectrum sensing; the system contains P primary users (PUs), indexed p = 1, 2, ..., P, and Q secondary users (SUs), indexed q = 1, 2, ..., Q; the PUs and SUs share the frequency band without causing interference; a PU has two working states, online, S_p = 1, or offline, S_p = 0; when a PU is online it occupies the spectrum and the SUs cannot use it; when a PU is offline it releases the spectrum, which the SUs may then use; as long as any PU occupies the spectrum, the SUs are not allowed to use it; g_p denotes the geographic location of PU p, and g_q that of SU q;

the energy detector of each SU takes wτ complex baseband signal samples within the sensing period τ, where w is the bandwidth; R_q(i) denotes the i-th signal sample received by SU q:

$$R_q(i) = \begin{cases} N_q(i), & H_0 \\ \sum_{p=1}^{P} S_p h_{p,q} W_p(i) + N_q(i), & H_1 \end{cases} \qquad (1)$$

here H₀ means that no PU is active in the channel, so the SU sensing device receives only thermal noise, denoted N_q(i); H₁ is the case where at least one PU is online, W_p(i) is the transmitted signal sample of PU p, h_{p,q} is the channel gain between PU p and SU q, and S_p is the working state of PU p; Y_q denotes the normalized energy level received by SU q:

$$Y_q = \frac{2}{\eta}\sum_{i=1}^{w\tau} \left|R_q(i)\right|^2 \qquad (2)$$

where η is the noise power spectral density, defined as η = E[|N_q(i)|²]; the energy vector therefore contains the energy levels received by all SUs:

$$Y = (Y_1, \dots, Y_Q)^T \qquad (3)$$

3.2: after the energy vector is obtained, its distribution is analyzed further;

because of the working mode of the PUs, each energy value Y_q follows a noncentral chi-square distribution with the following degrees of freedom and noncentrality parameter:

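$$r = 2w\tau \qquad (4)$$

$$\zeta_q = \frac{2}{\eta}\sum_{i=1}^{w\tau}\Big|\sum_{p=1}^{P} S_p h_{p,q} W_p(i)\Big|^2 \approx \frac{2w\tau}{\eta}\sum_{p=1}^{P} S_p \rho_p l_{p,q} \qquad (5)$$

here ρ_p = E[|W_p(i)|²] is the fixed transmit power of PU p, and l_{p,q} = |h_{p,q}|² is the power attenuation, calculated as: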

$$l_{p,q} = PL\left(\lVert g_q - g_p \rVert\right)\, \nu_{p,q}\, \psi_{p,q} \qquad (6)$$

here ‖·‖ denotes the Euclidean distance, and PL(dist) = dist^(−θ) is the path loss as a function of the distance dist and the path-loss exponent θ; ν_{p,q} and ψ_{p,q} represent multipath fading and shadow fading, respectively; the PUs and SUs comply with the IEEE 802.22 standard; in addition, the fading coefficients ν_{p,q} and ψ_{p,q} are quasi-static over the sensing period and are set to 1;

when the number of samples wτ is sufficiently large, the energy values approximately follow a Gaussian distribution; the energy vector can therefore be drawn from a multivariate Gaussian distribution with the following mean and variance:

$$\mu_{Y_q} = r + \zeta_q \qquad (7)$$

$$\sigma^2_{Y_q} = 2\left(r + 2\zeta_q\right) \qquad (8)$$

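the mean vector and covariance matrix of the energy vector are therefore:

$$\mu_Y = \left(\mu_{Y_1}, \dots, \mu_{Y_Q}\right)^T \qquad (9)$$

$$\Sigma_Y = \operatorname{diag}\left(\sigma^2_{Y_1}, \dots, \sigma^2_{Y_Q}\right) \qquad (10)$$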

4. The channel environment adaptive spectrum sensing method based on the Catboost algorithm according to claim 1, characterized in that training the model with the Catboost algorithm specifically comprises:

the second stage of Catboost is the same as in the conventional method, while the first stage uses an improved method: F^k denotes the model built from the first k trees, and g^k(X_h, Z_h) the gradient on the h-th training sample after k trees have been built; to make g^k(X_h, Z_h) unbiased, for each sample X_h a model M_h is trained that is never updated with a gradient estimate for X_h; M_h is used to estimate the gradient on X_h, and this estimate is used to score the candidate tree; Catboost generates s random permutations of the training set, samples several permutations to obtain the residual gradients so as to strengthen the robustness of the algorithm, trains models with different permutations, and uses multiple permutations to avoid overfitting;

for each permutation σ, n models M_k are trained; building a new tree requires storing and recomputing O(n²) values to fit permutation σ, since for each model M_k the values M_k(X_1), ..., M_k(X_k) must be updated, giving a computational complexity of O(sn²); the time complexity of building each tree is reduced to O(sn) as follows: for each permutation, instead of storing and updating the O(n²) values M_k(X_j), the values M'_k(X_j), k = 1, ..., ⌈log₂ n⌉, j < 2^(k+1), are maintained, where M'_k(X_j) is the approximation for sample j based on the first 2^k samples; the number of stored predictions M'_k(X_j) is then below 4n, and the gradient on X_h estimated from these approximations is used to select the tree structure.