CN118337514A - Method and device for detecting intrusion of automobile CAN (controller area network) network, electronic equipment and storage medium

Info

Publication number: CN118337514A
Application number: CN202410600070.4A
Authority: CN
Inventors: 赵剑; 汪想; 刘蓬勃
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2024-05-15
Filing date: 2024-05-15
Publication date: 2024-07-12

Abstract

Embodiments of the disclosure provide an automobile CAN (controller area network) intrusion detection method and device, electronic equipment and a storage medium, and relate to the technical field of automobile information security. The method comprises the following steps: acquiring traffic data packets on the CAN bus of a target vehicle during normal driving, and establishing a normal traffic data set from the traffic data packets; counting the ID of each traffic data packet in the normal traffic data set, and creating an original data set according to the ID of each traffic data packet, the attack type and the normal traffic data set; extracting the ID and data field information in the original data set as classification features, and establishing a feature data set from the classification features; constructing a CAN network Stacking intrusion detection model of the target vehicle according to the feature data set; and inputting the ID and data field information in the attack data set into the Stacking intrusion detection model, and predicting from the output result whether the CAN network of the target vehicle has been intruded. The method provided by the disclosure can effectively improve the accuracy and stability of automobile CAN network intrusion detection.

Description

Automobile CAN network intrusion detection method and device, electronic device and storage medium

Technical Field

The present disclosure relates to the technical field of information security for intelligent connected vehicles, and in particular to an automobile CAN network intrusion detection method and device, an electronic device and a storage medium.

Background Art

With the rapid development of network and automotive technology, modern vehicles offer ever more intelligent functions and exchange information with the outside world ever more frequently, so the risk of intrusion into in-vehicle networks keeps rising. Among in-vehicle networks, the CAN bus is widely used for its multi-master operation, high reliability and low cost, but network security protection was not considered when the CAN bus was designed, which poses a great challenge to in-vehicle network security.

At present, CAN bus information security technologies fall mainly into three kinds: data encryption, identity authentication and intrusion detection. The first two use encryption and authentication methods to protect CAN network data, isolate it from external systems, and keep out messages that do not comply with the communication protocol. Intrusion detection instead designs algorithms that perform security detection based on data features. The related art is as follows:

Application publication number CN115314311A discloses a vehicle-mounted intrusion detection method and system based on CAN bus data frames. It preprocesses messages with one-hot encoding, builds a GAN-based intrusion detection model comprising a generator network and a discriminator network, and uses the trained discriminator to monitor CAN network data transmission in real time to protect vehicle safety.

Application publication number CN113612786B discloses an intrusion detection system and method for a vehicle bus. The method includes the following steps: obtaining a CAN message and classifying the CAN frames in the CAN message; verifying the frame format of each CAN frame; verifying the CRC code of each CAN frame; inputting the CAN frame into a constructed generative adversarial network model for anomaly detection; determining in turn whether the CAN message has suffered a known attack or an unknown attack; and raising an alarm if it has. The system described in that invention includes a CAN frame recognition module, a frame format verification module, a CRC verification module and a generative adversarial neural network detection module.

Application publication number CN109067773A discloses a vehicle-mounted CAN network intrusion detection method and system based on a neural network. The invention uses the transmission frequency of CAN network data packets as the input of a BP neural network, uses principal component analysis (PCA) to reduce the dimensionality of the data, detects the transmission frequency of the various CAN data packets, optimizes the BP neural network with a genetic algorithm, uses correlated data on engine speed, intake air volume, vehicle speed and throttle as the input of the BP neural network, and determines whether the current network is abnormal and raises an alarm.

Application publication number CN115051852A discloses a vehicle-mounted CAN bus intrusion detection algorithm based on deep learning. The method includes the following steps: separating each CAN ID in the data set from its corresponding flag label, converting the separated CAN IDs into decimal floating-point numbers, segmenting the data set with a step size of 64, using GAF encoding to convert each segmented CAN ID sequence into a two-dimensional image, and dividing the data into a training set and a test set to train the model.

Application publication number CN113162902B discloses a low-latency, secure vehicle-mounted intrusion detection method based on deep learning. The method includes the following steps: using one-hot vector encoding to encode the arbitration bits of CAN traffic into a 2-D image; an encoder extracts CAN image features through a generative adversarial network and introduces a random phase θ and an imaginary number b to hide and obfuscate the real features; a processor in the cloud uses a convolutional neural network and an attention mechanism to extract deep features; and a decoder decodes the deep features and uses a shallow network to identify abnormal traffic.

Application publication number CN113824684A discloses a vehicle network intrusion detection method and system based on transfer learning. The invention extracts 29 consecutive CAN IDs and converts the ID sequence into a feature matrix as input. A DenseNet-based detection model extracts the time-series features of the feature matrix, and a GAN-based detection model extracts the time-series features of the feature matrix and determines whether they match the characteristics of an unknown attack; if so, an alarm is raised and the sample is stored as an unknown attack sample. When the number of stored samples reaches a certain level, PCA is used for dimensionality reduction and the Meanshift method is used to classify the reduced samples, yielding a pre-classified unknown attack data set and completing the update of the intrusion detection system.

Application publication number CN114157469A discloses a method and system for detecting variant attacks on vehicle networks based on a domain-adversarial neural network. The invention uses the USB-CANTOOL software to obtain normal data from a real vehicle, selects the IDs and data segments for the injected attacks, divides the collected data into a source domain data set, a target domain data set and a test data set, extracts the data segments of 25 consecutive CAN messages, concatenates the features obtained by three modules with different convolution kernel sizes to output the final features, takes the features of known attacks as input, and determines and outputs the attack type.

"Research on Intrusion Detection System of Vehicle CAN Network Based on Neural Network" proposes a neural-network-based intrusion detection method. By analyzing the bus characteristics of the CAN network and the characteristics of each ECU, it uses a neural network model to deal with tampering, replay and injection attacks.

"Research and Implementation of Security Gateway Technology for Intelligent Connected Vehicles" proposes a bus defense mechanism based on a security gateway, which uses a hybrid message authentication code and a two-way challenge authentication strategy to determine whether an abnormality exists.

"Research and Implementation of Defense Technology for In-Vehicle Networks of Connected Cars" proposes a dynamic data encryption method for in-vehicle Ethernet based on the idea of moving target defense, together with an intrusion detection scheme based on knowledge of the in-vehicle CAN network communication matrix, and uses the characteristics of in-vehicle communication data to protect the vehicle network.

"Research on Security Mechanisms for Information Security Issues in Internet of Vehicles" proposes a lightweight identity-based certificateless public key authentication system and a vehicle group key management mechanism to effectively detect and defend against attack messages.

"Research and Implementation of Intrusion Detection System for Connected Vehicles" proposes a CAN-network-based intrusion detection system built on byte-level and bit-level intrusion detection rules, including modules for data acquisition, data preprocessing, the intrusion detection engine, recording and alarm, and rule update.

"Research on Intrusion Detection of In-vehicle Networks Based on Association Rules" proposes a period-based intrusion detection scheme for CAN networks. It analyzes the correlations and characteristics of message data in the automotive network from a data mining perspective, thereby discovering the correlations and characteristics between ECUs in the data and determining whether an attack is present.

"CAN Network Anomaly Detection Method Based on Renyi Information Entropy" proposes an anomaly detection model based on Renyi information entropy and Renyi divergence. It monitors anomalies in data field information from the perspective of ID characteristics, entropy values or message periodicity to ensure the security of the automotive CAN network.

"Research on Vehicle CAN Network Anomaly Detection Based on LSTM Network" proposes a vehicle CAN network anomaly detection model based on an LSTM network. This method can effectively detect replay attacks and frame forgery.

"Anomaly Detection of In-Vehicle CAN Network Messages Based on AdaBoost Algorithm" proposes an anomaly detection model for in-vehicle CAN network messages based on the AdaBoost algorithm. It uses a CART decision tree as the weak base classifier, divides the 64 bits of the message data field into 8 bytes, and inputs them into the model separately to determine whether the message is abnormal.

"A Vehicle Network Anomaly Detection Method Based on Support Vector Machine" proposes an automotive CAN network intrusion detection algorithm based on a support vector machine. The information entropy of each byte of the data field is used as the input of the support vector machine to determine whether the message is abnormal.

"CAN-FD Network Anomaly Intrusion Detection Based on Support Vector Machine" proposes a support-vector-machine-based scheme for detecting CAN-FD messages. Time information, message identifier (ID) information and 48 items of data field information are used as model input to train a classification model, which is then used to detect anomalies on the CAN bus.

Given the specific conditions of intelligent connected vehicles and in-vehicle networks, the above technologies have several shortcomings. Encryption requires adjusting the CAN communication protocol and changing the format of the CAN frame, which hinders practical deployment. Data encryption and decryption and identity authentication increase the computing burden on the ECUs, impair the real-time performance of in-vehicle network communication, and can cause a single message to occupy the CAN network for too long. For intrusion detection systems, which face a large and varied set of algorithms, how to choose a suitable machine learning algorithm and how to improve accuracy remain difficult problems. Each algorithm has its own principle and its own sensitivity to data, so for the same classification problem the training error and generalization error of different models may differ, which makes prediction and decision-making difficult. Stacking ensemble learning can integrate multiple sub-learners and use the outputs of the group of learners to compensate for individual errors, giving better decision performance and generalization ability than a single model. So far, no research applying Stacking ensemble learning to automotive CAN network intrusion detection has been reported.

Summary of the Invention

The automobile CAN network intrusion detection method provided by the present disclosure can effectively improve the accuracy and stability of automobile CAN network intrusion detection.

According to a first aspect of the embodiments of the present disclosure, an automobile CAN network intrusion detection method is provided, the method comprising:

acquiring traffic data packets on the CAN bus of a target vehicle during normal driving, and establishing a normal traffic data set from the traffic data packets;

counting the ID of each traffic data packet in the normal traffic data set, and creating an original data set according to the ID of each traffic data packet, the attack type and the normal traffic data set;

extracting the ID and data field information in the original data set as classification features, and establishing a feature data set from the classification features;

constructing a CAN network Stacking intrusion detection model of the target vehicle according to the feature data set;

inputting the ID and data field information in the attack data set into the Stacking intrusion detection model, and predicting from the output result whether the CAN network of the target vehicle has been intruded.

In one embodiment, constructing the CAN network Stacking intrusion detection model of the target vehicle according to the feature data set includes:

randomly extracting part of the feature data in the feature data set at a preset ratio as a training set S, with the remaining feature data as a test set P;

dividing the training set into n+1 subsets S1, S2, ..., Sn+1, and selecting the first i subsets in turn to train base classifiers, wherein the base classifiers include a support vector machine base classifier, a random forest base classifier, a k-nearest neighbor base classifier and a multilayer perceptron base classifier;

using the trained base classifiers to predict the (i+1)-th subset and output the prediction results, and repeating this n times to obtain all prediction results of the base classifiers;

combining all the prediction results to obtain a target training set N;

training a gradient decision tree classifier with the target training set N to obtain the CAN network Stacking intrusion detection model of the target vehicle.
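The construction of the target training set N described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the contiguous (unshuffled) way the n+1 subsets are cut, and the representation of a base classifier as a "trainer" function that returns a scoring function are all hypothetical.

```python
def split_into_subsets(X, y, n):
    """Split the training set into n + 1 contiguous subsets S1..S(n+1)."""
    if n < 1 or len(X) < n + 1:
        raise ValueError("need n >= 1 and at least n + 1 samples")
    size, extra = divmod(len(X), n + 1)
    subsets, start = [], 0
    for k in range(n + 1):
        stop = start + size + (1 if k < extra else 0)
        subsets.append((X[start:stop], y[start:stop]))
        start = stop
    return subsets


def build_meta_training_set(X, y, n, trainers):
    """Build the target training set N from held-out base predictions.

    `trainers` is a list of hypothetical functions; each takes (X, y) and
    returns a function that maps one sample to a score.
    """
    subsets = split_into_subsets(X, y, n)
    meta_X, meta_y = [], []
    for i in range(1, n + 1):
        # Train every base classifier on the first i subsets.
        train_X = [x for sx, _ in subsets[:i] for x in sx]
        train_y = [t for _, sy in subsets[:i] for t in sy]
        models = [train(train_X, train_y) for train in trainers]
        # Predict subset i + 1, which none of these models has seen.
        held_X, held_y = subsets[i]
        for x, target in zip(held_X, held_y):
            meta_X.append([model(x) for model in models])
            meta_y.append(target)
    return meta_X, meta_y
```

Each row of `meta_X` holds one prediction per base classifier, and `meta_y` holds the matching labels; the pair plays the role of N and would then be used to fit the second-layer classifier. Note that under this scheme subset S1 is only ever used for training, so N has fewer rows than S.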

In one embodiment, inputting the ID and data field information in the feature data set into the Stacking intrusion detection model, and predicting from the output result whether the CAN network of the target vehicle has been intruded, includes:

evaluating the test set P with the CAN network Stacking intrusion detection model of the target vehicle to obtain an intrusion detection result;

inputting the ID and data field information in the feature data set into the support vector machine base classifier, the random forest base classifier, the k-nearest neighbor base classifier and the multilayer perceptron base classifier respectively, to obtain test values P1, P2, P3 and P4;

inputting the test values P1, P2, P3 and P4 as CAN message features into the CAN network Stacking intrusion detection model of the target vehicle, and estimating the abnormal state of the corresponding CAN message from the output result;

predicting whether the CAN network of the target vehicle has been intruded according to the abnormal state of the corresponding CAN message.
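The prediction path described above (base outputs P1 to P4 fed to the second-layer model) can be illustrated with a short sketch. It assumes a binary normal/abnormal decision, a 0.5 threshold, and the rule that a single abnormal message is enough to report an intrusion; none of these is stated in the disclosure, and the function names are hypothetical.

```python
def stack_predict(sample, base_models, meta_model, threshold=0.5):
    """Feed the base outputs P1..P4 to the meta-classifier and flag the message.

    `base_models` are functions mapping a feature vector to a score, and
    `meta_model` maps the list of base scores to a final score (assumed
    interfaces, for illustration only).
    """
    meta_features = [model(sample) for model in base_models]
    score = meta_model(meta_features)
    return score >= threshold, meta_features


def intrusion_detected(samples, base_models, meta_model, threshold=0.5):
    """Report an intrusion if any message in the batch is flagged abnormal."""
    return any(
        stack_predict(s, base_models, meta_model, threshold)[0] for s in samples
    )
```

In practice the aggregation rule (any message, a count over a time window, and so on) is a design choice that trades detection latency against false alarms.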

In one embodiment, before using the trained base classifiers to predict the (i+1)-th subset and output the prediction results, and repeating this n times to obtain all prediction results of the base classifiers, the method further includes:

inputting the data in the test set P into the support vector machine base classifier, the random forest base classifier, the k-nearest neighbor base classifier and the multilayer perceptron base classifier respectively;

evaluating the support vector machine base classifier, the random forest base classifier, the k-nearest neighbor base classifier and the multilayer perceptron base classifier respectively according to the output results and evaluation indexes.

In one embodiment, before inputting the test values P1, P2, P3 and P4 as CAN message features into the CAN network Stacking intrusion detection model of the target vehicle and estimating the abnormal state of the corresponding CAN message from the output result, the method further includes:

inputting the data in the test set P into the CAN network Stacking intrusion detection model of the target vehicle;

evaluating the CAN network Stacking intrusion detection model of the target vehicle according to the output results and the evaluation indexes.

In one embodiment, the evaluation indexes include at least one of the ROC curve, the AUC (area under the ROC curve) and the accuracy.
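As a small illustration of two of these indexes, accuracy and AUC can be computed as follows. The sketch assumes binary labels (1 for attack, 0 for normal) and uses the pairwise-ranking definition of AUC; a multi-class evaluation over the individual attack types would need, for example, a one-vs-rest extension that is not shown here.

```python
def accuracy(y_true, y_pred):
    """Fraction of messages whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch; a rank-based computation would be preferable for a full 10-minute capture.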

According to a second aspect of the embodiments of the present disclosure, an automobile CAN network intrusion detection device is provided, the device comprising:

an acquisition module, which acquires traffic data packets on the CAN bus of a target vehicle during normal driving, and establishes a normal traffic data set from the traffic data packets;

a statistics module, which counts the ID of each traffic data packet in the normal traffic data set, and creates an original data set according to the ID of each traffic data packet, the attack type and the normal traffic data set;

an extraction module, which extracts the ID and data field information in the original data set as classification features, and establishes a feature data set from the classification features;

a construction module, which constructs a CAN network Stacking intrusion detection model of the target vehicle according to the attack data set and the feature data set;

a prediction module, which inputs the ID and data field information in the feature data set into the Stacking intrusion detection model, and predicts from the output result whether the CAN network of the target vehicle has been intruded.

In one embodiment, the construction module includes:

an extraction submodule, which randomly extracts part of the feature data in the feature data set at a preset ratio as a training set S, with the remaining feature data as a test set P;

a division submodule, which divides the training set into n+1 subsets S1, S2, ..., Sn+1, and selects the first i subsets in turn to train base classifiers, wherein the base classifiers include a support vector machine base classifier, a random forest base classifier, a k-nearest neighbor base classifier and a multilayer perceptron base classifier;

a prediction submodule, which uses the trained base classifiers to predict the (i+1)-th subset and output the prediction results, and repeats this n times to obtain all prediction results of the base classifiers;

a combination submodule, which combines all the prediction results to obtain a target training set N;

a training submodule, which trains a gradient decision tree classifier with the target training set N to obtain the CAN network Stacking intrusion detection model of the target vehicle.

According to a third aspect of the embodiments of the present application, a computer device is provided, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of any of the above methods when executing the computer program.

According to a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored, wherein the steps of any of the above methods are implemented when the computer program is executed by a processor.

The automobile CAN network intrusion detection method provided by the present disclosure establishes a Stacking intrusion detection model based on the ID and data field features of real vehicle CAN messages. The Stacking intrusion detection model combines the prediction results of four base classifiers, and the parameters of the four base classifiers are optimized with a tree-structured Parzen estimator and cross-validation, thereby effectively improving the accuracy and stability of automobile CAN network intrusion detection.

Brief Description of the Drawings

FIG. 1 is a flow chart of the automobile CAN network intrusion detection method provided by an embodiment of the present disclosure.

FIG. 2 is a flow chart of the automobile CAN network intrusion detection method provided by an embodiment of the present disclosure.

FIG. 3 is a logic diagram of the automobile CAN network intrusion detection method provided by an embodiment of the present disclosure.

FIG. 4 is an architecture diagram of the automobile CAN network intrusion detection device provided by an embodiment of the present disclosure.

FIG. 5 is an architecture diagram of the automobile CAN network intrusion detection device provided by an embodiment of the present disclosure.

FIG. 6 is a schematic structural diagram of the computer device provided by an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices consistent with some aspects of the present disclosure as detailed in the appended claims.

FIG. 1 is a flow chart of an automobile CAN network intrusion detection method provided by an embodiment of the present disclosure.

As shown in FIG. 1, the method includes:

Step 101: acquire traffic data packets on the CAN bus of the target vehicle during normal driving, and establish a normal traffic data set from the traffic data packets.

In this step, real-time communication traffic data packets on the CAN bus of the target vehicle during normal driving are collected through the on-board OBD-II port. The collection time is 10 minutes, and the data packets are saved to the normal traffic data set.

Step 102: count the ID of each traffic data packet in the normal traffic data set, and create an original data set according to the ID of each traffic data packet, the attack type and the normal traffic data set.

In this step, a denial (denial-of-service) attack data set is constructed by injecting high-priority data frames into the normal traffic data set; a replay attack data set is constructed by injecting repeated messages into the normal traffic data set; an injection attack data set is constructed by injecting illegal messages into the normal traffic data set; and a discard attack data set is constructed by randomly discarding normal messages from the normal traffic data set. The result is an original data set containing the normal traffic data set, the denial attack data set, the replay attack data set, the injection attack data set and the discard attack data set.

Exemplarily, counting the data packets of each ID in the normal traffic data set shows that the electronic control units in the vehicle use 55 IDs. A denial attack data set is constructed by injecting high-priority data frames with ID = 0x000 into the normal traffic data set; a replay attack data set is constructed by injecting repeated messages with ID = 0x0BA, ID = 0x2C1 and ID = 0x2C4 into the normal traffic data set; an injection attack data set is constructed by injecting illegal messages with ID = 0x001, ID = 0x010 and ID = 0x100 into the normal traffic data set; and a discard attack data set is constructed by randomly discarding normal messages with ID = 0x2D0, ID = 0x2D5 and ID = 0x3B3 from the normal traffic data set. The result is an original data set containing the normal data set, the denial attack data set, the replay attack data set, the injection attack data set and the discard attack data set.
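The four attack constructions in this example can be sketched as follows. The IDs are the ones listed above; everything else is an assumption made for illustration: frames are represented as hypothetical `(timestamp, can_id, data)` tuples, the injection rate (`every`), the drop probability, the all-zero and random payloads, and the way labels are attached are not specified in the disclosure.

```python
import random

# Hypothetical class labels attached to each output frame.
NORMAL, DOS, REPLAY, INJECTION, DISCARD = (
    "normal", "dos", "replay", "injection", "discard")


def inject_dos(frames, every=2):
    """Insert a high-priority frame (ID 0x000) after every `every` frames."""
    out = []
    for i, (t, cid, data) in enumerate(frames, start=1):
        out.append((t, cid, data, NORMAL))
        if i % every == 0:
            out.append((t, 0x000, bytes(8), DOS))
    return out


def inject_replay(frames, replay_ids=(0x0BA, 0x2C1, 0x2C4)):
    """Re-send an identical copy of every frame whose ID is in `replay_ids`."""
    out = []
    for t, cid, data in frames:
        out.append((t, cid, data, NORMAL))
        if cid in replay_ids:
            out.append((t, cid, data, REPLAY))
    return out


def inject_illegal(frames, illegal_ids=(0x001, 0x010, 0x100), every=2, seed=0):
    """Insert frames with IDs absent from normal traffic and random payloads."""
    rng = random.Random(seed)
    out = []
    for i, (t, cid, data) in enumerate(frames, start=1):
        out.append((t, cid, data, NORMAL))
        if i % every == 0:
            payload = bytes(rng.randrange(256) for _ in range(8))
            out.append((t, rng.choice(illegal_ids), payload, INJECTION))
    return out


def drop_frames(frames, drop_ids=(0x2D0, 0x2D5, 0x3B3), drop_prob=0.5, seed=0):
    """Randomly discard frames whose ID is in `drop_ids`.

    Assumption: surviving frames of the targeted IDs are labelled DISCARD,
    since the source does not say how a missing frame is labelled.
    """
    rng = random.Random(seed)
    return [
        (t, cid, data, DISCARD if cid in drop_ids else NORMAL)
        for t, cid, data in frames
        if not (cid in drop_ids and rng.random() < drop_prob)
    ]
```

Concatenating the labelled outputs of these functions with the labelled normal capture would give a data set with the five categories named in this step.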

Step 103: extract the ID and data field information in the original data set as classification features, and establish a feature data set from the classification features.

In this step, the ID and the data field information in the original data set are each extracted as classification features to obtain the feature data set.
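One plausible encoding of these features is a nine-element vector: the numeric ID followed by the eight data field bytes. The sketch below assumes that layout, assumes classic CAN frames (at most 8 data bytes), and zero-pads shorter payloads; the disclosure does not state the exact encoding, and the function names are hypothetical.

```python
def frame_to_features(can_id, data):
    """Return [ID, b0, ..., b7]; payloads shorter than 8 bytes are zero-padded."""
    if len(data) > 8:
        raise ValueError("classic CAN payloads carry at most 8 bytes")
    padded = bytes(data) + bytes(8 - len(data))
    return [can_id, *padded]


def build_feature_dataset(labelled_frames):
    """Split (can_id, data, label) records into a feature matrix X and labels y."""
    X = [frame_to_features(cid, data) for cid, data, _ in labelled_frames]
    y = [label for _, _, label in labelled_frames]
    return X, y
```

Distance-based learners such as k-nearest neighbor and support vector machines are sensitive to feature scale, so in practice the ID column would probably need scaling or another encoding before training.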

Step 104: construct a CAN network Stacking intrusion detection model of the target vehicle according to the feature data set.

In this step, the Stacking intrusion detection model is formed by stacking two layers of models. For the first-layer base classifiers, this example uses a support vector machine base classifier, a random forest base classifier, a k-nearest neighbor base classifier and a multilayer perceptron base classifier. For the second-layer meta-classifier, this example uses a gradient boosting decision tree: the predicted probabilities of the four base classifiers are stacked to form a new feature set, on which the meta-classifier is trained to obtain the final ensemble model.

As shown in FIG. 2, constructing the CAN network Stacking intrusion detection model of the target vehicle according to the feature data set includes:

Step 201: randomly extract part of the feature data in the feature data set according to a preset ratio as a training set S, and use the remaining feature data as a test set P;

For example, in this embodiment 60% of the feature data in the feature data set may be randomly extracted as the training set S, and the remaining 40% used as the test set P.
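A minimal sketch of such a random 60/40 split, using only the standard library. The function name and the fixed seed are assumptions added for reproducibility.

```python
import random

def split_dataset(rows, train_ratio=0.6, seed=0):
    """Shuffle the rows and cut them into a training set S and a test set P."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(len(rows) * train_ratio))
    return rows[:cut], rows[cut:]

S, P = split_dataset(range(100))
```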

Step 202: divide the training set into n+1 subsets S1, S2, ..., Sn+1, and select the first i subsets in turn to train the base classifiers; wherein the base classifiers include a support vector machine base classifier, a random forest base classifier, a k-nearest neighbor base classifier and a multilayer perceptron base classifier; wherein n and i are both natural numbers, n≥1 and i≥1.

In this step, a support vector machine base classifier prediction model, a random forest base classifier prediction model, a k-nearest neighbor base classifier prediction model and a multilayer perceptron base classifier prediction model need to be established separately, as follows:

The support vector machine (SVM) prediction model is established as follows: the data in the training set S are fed into the SVM model for training, and the optimal hyperparameters of the SVM model are found by a search using a tree-structured Parzen estimator and cross-validation. Among the important parameters, the penalty parameter c is 0.1, the kernel function (kernel) is "precomputed", and the kernel function parameter (gamma) is "auto". The SVM model is then built on the training set data with the optimal parameter combination found by the search.

The random forest (RF) prediction model is established as follows: the data in the training set S are fed into the random forest model for training, and the optimal hyperparameters of the random forest model are selected using a tree-structured Parzen estimator and cross-validation. The number of base decision trees (n_estimators) is 100, the maximum depth of each base decision tree (max_depth) is 100, and the Boolean option (bootstrap) is True, meaning that bootstrap sampling is used to generate the training data for each decision tree. The random forest model is then built on the training set data with the optimal parameter combination found by the search.

The k-nearest neighbor (KNN) prediction model is established as follows: the data in the training set S are fed into the k-nearest neighbor algorithm for training, and the optimal hyperparameters are selected using a tree-structured Parzen estimator and cross-validation. The number of neighbor samples (n_neighbors) is 10; the voting weight of the neighbor samples (weights) is "uniform", meaning that all neighbor samples vote with equal weight; the neighbor search algorithm (algorithm) is "kd_tree"; and the minimum number of samples contained in a tree leaf node (leaf_size) is 30. The k-nearest neighbor model is then built on the training set data with the optimal parameter combination found by the search.

The multilayer perceptron (MLP) prediction model is established as follows: the data in the training set S are fed into the multilayer perceptron for training, and the optimal hyperparameters are again selected using a tree-structured Parzen estimator and cross-validation. The number of hidden layers (hidden_layer_sizes) is 5, the hidden layers contain 32, 64 and 128 nodes respectively, and the optimization method (solver) is "adam". The multilayer perceptron model is then built on the training set data with the optimal parameter combination found by the search.

Step 203: use the trained base classifiers to predict the (i+1)-th subset and output the prediction results, and repeat this n times to obtain all the prediction results of the base classifiers;

In this step, the trained support vector machine, random forest, k-nearest neighbor and multilayer perceptron base classifier prediction models are each used to predict the (i+1)-th data subset and output the prediction results. Repeating this n times yields all the prediction results of the four base classifier prediction models.
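One reading of Steps 202 and 203 is an expanding-window scheme: for i = 1..n, train on subsets S1..Si and predict subset Si+1. The sketch below only generates the training and held-out rows for each round. The helper names and the contiguous, near-equal partitioning are assumptions, since the text does not say how the subsets are formed.

```python
def partition(rows, parts):
    """Split rows into `parts` contiguous subsets of near-equal size."""
    base, extra = divmod(len(rows), parts)
    subsets, start = [], 0
    for k in range(parts):
        end = start + base + (1 if k < extra else 0)
        subsets.append(rows[start:end])
        start = end
    return subsets

def expanding_folds(rows, n):
    """Yield (train_rows, holdout_rows) for i = 1..n over n+1 subsets."""
    subsets = partition(rows, n + 1)
    for i in range(1, n + 1):
        train = [r for s in subsets[:i] for r in s]
        yield train, subsets[i]

folds = list(expanding_folds(list(range(10)), n=4))
```

Under this reading, subset S1 is only ever used for training, so the held-out predictions cover S2 through Sn+1.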

Step 204: combine all the prediction results to obtain a target training set N;

In this step, all the prediction results are output as new feature sets. The m base classifiers are validated m times to obtain m new features, that is, m new feature sets, which are then combined into one training set N.
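The combination in this step can be pictured as placing each base classifier's predicted class probabilities side by side, one row per sample. This is a small sketch with two hypothetical classifiers and two classes; the function name and the example probabilities are made up for illustration.

```python
def stack_predictions(per_model_probs):
    """Concatenate each model's class-probability row, sample by sample."""
    n_samples = len(per_model_probs[0])
    if any(len(p) != n_samples for p in per_model_probs):
        raise ValueError("all base classifiers must predict the same samples")
    return [[v for model in per_model_probs for v in model[i]]
            for i in range(n_samples)]

svm_probs = [[0.9, 0.1], [0.2, 0.8]]   # illustrative values only
rf_probs = [[0.7, 0.3], [0.1, 0.9]]    # illustrative values only
N = stack_predictions([svm_probs, rf_probs])
```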

Step 205: use the target training set N to train a gradient decision tree classifier to obtain the CAN network Stacking intrusion detection model of the target vehicle.

It can be understood that, before the target training set N is used to train the gradient decision tree classifier in this step, the gradient decision tree classifier model must first be established.

The gradient decision tree classifier model is established as follows: the training set data are fed into the gradient boosting decision tree for training, and the optimal hyperparameters are again selected using a tree-structured Parzen estimator and cross-validation. The maximum number of iterations (n_estimators) is 100, the learning rate (learning_rate) is 1, the subsampling ratio (subsample) is 0.5, and the loss function (loss) is "deviance". The gradient decision tree classifier model is then built on the training set data with the optimal parameter combination found by the search.
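The parameter names above match those of scikit-learn's `GradientBoostingClassifier`, although the text does not name a library. Assuming scikit-learn, the meta classifier could be sketched as below on a synthetic stand-in for the target training set N (the data, the two-class setup and the random seed are assumptions). The `loss` argument is omitted so that the library default, the logistic deviance, is used; newer scikit-learn releases call this loss "log_loss".

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for N: 4 base classifiers x 2 class probabilities = 8 columns.
y = rng.integers(0, 2, size=200)
N = np.clip(y[:, None] * 0.6 + rng.normal(0.2, 0.15, size=(200, 8)), 0.0, 1.0)

meta = GradientBoostingClassifier(
    n_estimators=100,    # maximum number of boosting iterations
    learning_rate=1.0,   # value reported in the text
    subsample=0.5,       # each tree sees half of the rows
    random_state=0,      # assumption, for reproducibility
)
meta.fit(N, y)
pred = meta.predict(N)
```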

Step 105: input the ID and data field information in the attack data set into the Stacking intrusion detection model, and predict whether a network intrusion exists on the CAN of the target vehicle according to the output result.

Whether a network intrusion exists on the CAN of the target vehicle is predicted according to the abnormal state of the corresponding CAN message.

Optionally, before the trained base classifiers are used to predict the (i+1)-th subset and output the prediction results, repeated n times to obtain all the prediction results of the base classifiers, the method further includes:

Optionally, before the test value P1, the test value P2, the test value P3 and the test value P4 are input as CAN message features into the CAN network Stacking intrusion detection model of the target vehicle and the abnormal state of the corresponding CAN message is estimated according to the output result, the method further includes:

inputting the data in the test set P into the CAN network Stacking intrusion detection model of the target vehicle;

wherein the evaluation index includes at least one of the ROC curve, the AUC (area under the curve) and the accuracy.
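For reference, accuracy and AUC can be computed as in the sketch below. It is simplified to a binary normal/attack labeling, which is an assumption; the data set described here has several attack classes, for which AUC would need a multi-class extension. The function names and the toy scores are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]          # predicted probability of "attack"
acc = accuracy(y_true, [int(s >= 0.5) for s in scores])
auc = binary_auc(y_true, scores)
```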

In this embodiment, the support vector machine base classifier, the random forest base classifier, the k-nearest neighbor base classifier, the multilayer perceptron base classifier and the Stacking intrusion detection model are each evaluated according to the output results and the evaluation indexes. The results are shown in Table 1 below.

Table 1

According to the data in Table 1, the random forest base classifier (RF) has the highest accuracy among the base classifiers, with a test accuracy of 88.51% and an area under the curve (AUC) of 0.98. The Stacking intrusion detection model obtained after ensembling reaches a prediction accuracy of 92.18%, which is better than any single model. This also indicates that the Stacking intrusion detection model in this embodiment combines the strengths of the four classification models, is more stable and reliable, offers more comprehensive and balanced accuracy and coverage for automobile CAN network intrusion detection, and has higher credibility and predictive power.

FIG. 3 is a logic diagram of an automobile CAN network intrusion detection method provided by an embodiment of the present disclosure.

FIG. 4 is an architecture diagram of an automobile CAN network intrusion detection device provided by an embodiment of the present disclosure. As shown in FIG. 4, the device includes an acquisition module 401, a statistics module 402, an extraction module 403, a construction module 404 and a prediction module 405. The acquisition module 401 is used to acquire the traffic data packets on the CAN bus of the target vehicle during normal driving and to establish a normal traffic data set according to the traffic data packets. The statistics module 402 is used to count the ID of each traffic data packet in the normal traffic data set and to create an original data set according to the ID of each traffic data packet, the attack type and the normal traffic data set. The extraction module 403 is used to extract the ID and data field information in the original data set as classification features and to establish a feature data set according to the classification features. The construction module 404 is used to construct the CAN network Stacking intrusion detection model of the target vehicle according to the attack data set and the feature data set. The prediction module 405 is used to input the ID and data field information in the feature data set into the Stacking intrusion detection model and to predict whether a network intrusion exists on the CAN of the target vehicle according to the output result.

FIG. 5 is an architecture diagram of an automobile CAN network intrusion detection device provided by an embodiment of the present disclosure. As shown in FIG. 5, the device includes an acquisition module 501, a statistics module 502, an extraction module 503, a construction module 504 and a prediction module 505. The construction module 504 includes an extraction submodule 5041, a division submodule 5042, a prediction submodule 5043, a combination submodule 5044 and a training submodule 5045. The extraction submodule 5041 is used to randomly extract part of the feature data in the feature data set according to a preset ratio as a training set S, with the remaining feature data serving as a test set P. The division submodule 5042 is used to divide the training set into n+1 subsets S1, S2, ..., Sn+1 and to select the first i subsets in turn to train the base classifiers, wherein the base classifiers include a support vector machine base classifier, a random forest base classifier, a k-nearest neighbor base classifier and a multilayer perceptron base classifier. The prediction submodule 5043 is used to predict the (i+1)-th subset with the trained base classifiers and output the prediction results, repeating this n times to obtain all the prediction results of the base classifiers. The combination submodule 5044 is used to combine all the prediction results to obtain the target training set N. The training submodule 5045 is used to train a gradient decision tree classifier with the target training set N to obtain the CAN network Stacking intrusion detection model of the target vehicle.

The present disclosure also provides a computer device, whose internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store data. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements the automobile CAN network intrusion detection method described above. That is, the computer device includes a memory and a processor, the memory stores a computer program, and the processor implements any step of the above automobile CAN network intrusion detection method when executing the computer program.

The present disclosure also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program can implement any step of the above automobile CAN network intrusion detection method.

Those of ordinary skill in the art can understand that all or part of the steps of the above method embodiments can be completed by hardware under the control of program instructions. The program may be stored in advance in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.

Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing what is disclosed herein. The present disclosure is intended to cover any variations, uses or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims

1. An intrusion detection method for an automobile CAN network, comprising:

acquiring traffic data packets on a CAN bus of a target vehicle during normal driving, and establishing a normal traffic data set according to the traffic data packets;

counting the ID of each traffic data packet in the normal traffic data set, and creating an original data set according to the ID of each traffic data packet, the attack type and the normal traffic data set;

extracting ID and data field information in the original data set as classification features, and establishing a feature data set according to the classification features;

constructing a CAN network Stacking intrusion detection model of the target vehicle according to the feature data set;

and inputting the ID and the data field information in the attack data set into the Stacking intrusion detection model, and predicting whether a network intrusion exists on the CAN of the target vehicle according to an output result.

2. The method of claim 1, wherein constructing the CAN network Stacking intrusion detection model of the target vehicle according to the feature data set comprises:

randomly extracting part of the feature data in the feature data set according to a preset ratio as a training set S, with the remaining feature data serving as a test set P;

dividing the training set into n+1 subsets S1, S2, ..., Sn+1, and sequentially selecting the first i subsets to train base classifiers; wherein the base classifiers comprise a support vector machine base classifier, a random forest base classifier, a k-nearest neighbor base classifier and a multilayer perceptron base classifier;

predicting the (i+1)-th subset by using the trained base classifiers, outputting a prediction result, and repeating this n times to obtain all the prediction results of the base classifiers;

combining all the prediction results to obtain a target training set N;

and training a gradient decision tree classifier by using the target training set N to obtain the CAN network Stacking intrusion detection model of the target vehicle.

3. The method of claim 2, wherein inputting the ID and data field information in the feature data set into the Stacking intrusion detection model and predicting whether a network intrusion exists on the CAN of the target vehicle according to the output result comprises:

estimating the test set P by using the CAN network Stacking intrusion detection model of the target vehicle to obtain an intrusion detection result;

respectively inputting the ID and the data field information in the feature data set into the support vector machine base classifier, the random forest base classifier, the k-nearest neighbor base classifier and the multilayer perceptron base classifier to obtain a test value P1, a test value P2, a test value P3 and a test value P4;

inputting the test value P1, the test value P2, the test value P3 and the test value P4 as CAN message features into the CAN network Stacking intrusion detection model of the target vehicle, and estimating an abnormal state of a corresponding CAN message according to an output result;

and predicting whether a network intrusion exists on the CAN of the target vehicle according to the abnormal state of the corresponding CAN message.

4. The method of claim 3, wherein before predicting the (i+1)-th subset by using the trained base classifiers, outputting the prediction result, and repeating this n times to obtain all the prediction results of the base classifiers, the method further comprises:

respectively inputting the data in the test set P into the support vector machine base classifier, the random forest base classifier, the k-nearest neighbor base classifier and the multilayer perceptron base classifier;

and respectively evaluating the support vector machine base classifier, the random forest base classifier, the k-nearest neighbor base classifier and the multilayer perceptron base classifier according to the output result and the evaluation index.

5. The method of claim 4, wherein before inputting the test value P1, the test value P2, the test value P3 and the test value P4 as CAN message features into the CAN network Stacking intrusion detection model of the target vehicle and estimating the abnormal state of the corresponding CAN message according to the output result, the method further comprises:

inputting the data in the test set P into the CAN network Stacking intrusion detection model of the target vehicle;

and evaluating the CAN network Stacking intrusion detection model of the target vehicle according to the output result and the evaluation index.

6. The method of claim 4 or 5, wherein the evaluation index comprises at least one of the ROC curve, the AUC (area under the curve) and the accuracy.

7. An automobile CAN network intrusion detection device, the device comprising:

an acquisition module, used for acquiring traffic data packets on the CAN bus of a target vehicle during normal driving, and establishing a normal traffic data set according to the traffic data packets;

a statistics module, used for counting the ID of each traffic data packet in the normal traffic data set, and creating an original data set according to the ID of each traffic data packet, the attack type and the normal traffic data set;

an extraction module, used for extracting ID and data field information in the original data set as classification features, and establishing a feature data set according to the classification features;

a construction module, used for constructing a CAN network Stacking intrusion detection model of the target vehicle according to the attack data set and the feature data set;

and a prediction module, used for inputting the ID and the data field information in the feature data set into the Stacking intrusion detection model, and predicting whether a network intrusion exists on the CAN of the target vehicle according to an output result.

8. The device of claim 7, wherein the construction module comprises:

an extraction submodule, used for randomly extracting part of the feature data in the feature data set according to a preset ratio as a training set S, with the remaining feature data serving as a test set P;

a division submodule, used for dividing the training set into n+1 subsets S1, S2, ..., Sn+1, and sequentially selecting the first i subsets to train base classifiers; wherein the base classifiers comprise a support vector machine base classifier, a random forest base classifier, a k-nearest neighbor base classifier and a multilayer perceptron base classifier;

a prediction submodule, used for predicting the (i+1)-th subset by using the trained base classifiers, outputting a prediction result, and repeating this n times to obtain all the prediction results of the base classifiers;

a combination submodule, used for combining all the prediction results to obtain a target training set N;

and a training submodule, used for training a gradient decision tree classifier by using the target training set N to obtain the CAN network Stacking intrusion detection model of the target vehicle.

9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 8 when executing the computer program.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.