WO2024178581A1 - Data processing method and apparatus, storage medium, and program product - Google Patents

Info

Publication number
WO2024178581A1
Authority
WO
WIPO (PCT)
Prior art keywords
attack
intrusion detection
samples
type
sample set
Prior art date
Application number
PCT/CN2023/078600
Other languages
English (en)
Chinese (zh)
Inventor
侯硕
岳青伦
李廷森
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2023/078600
Publication of WO2024178581A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40: Network security protocols

Definitions

  • the present application relates to the field of network security technology and provides a data processing method, device, storage medium and program product.
  • In order to improve the network security of automobiles, the industry has built intrusion detection samples for some typical intrusion scenarios and uses them to run attack tests on automobiles before they leave the factory, so as to evaluate their anti-attack performance. Although this ensures that only automobiles with good anti-attack performance leave the factory, the samples cover only some typical intrusion scenarios, so they are not comprehensive, which limits the accuracy of the network security assessment of automobiles.
  • the present application provides a data processing method, device, storage medium and program product for constructing a more comprehensive intrusion detection sample set to improve the accuracy of network security assessment of devices to be detected (such as vehicles).
  • the present application provides a data processing method applicable to a data processing device, which may be any device with processing capabilities, such as a server or a server cluster composed of multiple servers.
  • the method includes: the data processing device obtains a first type of attack sample by attacking a device to be detected, obtains a second type of attack sample by applying noise to the first type of attack sample, and then constructs an intrusion detection sample set based on the first type of attack sample and the second type of attack sample.
  • the first type of attack samples are obtained by attacking the device to be detected, and belong to attack samples under known intrusion scenarios
  • the second type of attack samples are obtained by adding noise to the first type of attack samples, and can be considered as attack samples under unknown intrusion scenarios obtained by performing some deformation on the attack samples under known intrusion scenarios.
  • an intrusion detection sample set is constructed by combining the first type of attack samples and the second type of attack samples, so that the intrusion detection sample set can cover both attack samples under known intrusion scenarios and attack samples under unknown intrusion scenarios, so that the intrusion detection samples in the intrusion detection sample set are more comprehensive.
  • using a more comprehensive intrusion detection sample set to evaluate the device to be detected can also improve the accuracy of network security assessment of the device to be detected.
  • the first type of attack samples may include real attack samples and simulated attack samples, wherein the real attack samples are obtained by manually attacking the device to be detected, and the simulated attack samples are obtained by attacking the device to be detected by an attack tool.
  • the human attack method can be used to construct attack behaviors that are highly concealed or rely on business logic, so as to obtain attack samples that are easy to mark in a real attack environment, while the attack tool attack method can obtain attack samples that are not easy to mark in a real attack environment by simulating the generation of attack traffic.
  • By using both the human attack method and the attack tool method to construct the first type of attack samples, these samples can fully cover the attack types that may exist in a real attack environment, thereby improving the comprehensiveness of the first type of attack samples.
  • the real attack samples can correspond to one or more of the following attack types: identifier (ID) non-existence attack, replay attack, tampering attack, data length error attack, signal out of defined range attack, context error attack, ID source non-specified electronic control unit (ECU) attack, identical ID attack, controller area network (CAN) scanning attack, unified diagnostic services (UDS) execution of sensitive operation attack, message authentication error attack, ECU identity spoofing attack, man-in-the-middle attack, ECU authentication error attack, brute force attack, application layer protocol error attack, unknown pop connection attack, unknown push connection attack.
  • the simulated attack samples can correspond to one or more of the following attack types: ID fuzzing attack, data fuzzing attack, CAN denial-of-service (DoS) attack, Ethernet (ETH) DoS attack, malformed packet injection attack, port scanning attack.
  • a data processing device obtains a first type of attack sample by attacking a device to be detected, including: the data processing device traverses each of a plurality of preset attack types, and when traversing each attack type: executes an attack behavior corresponding to the attack type on the device to be detected, and obtains traffic data generated by the device to be detected for the attack behavior, and then, when it is determined that the traffic data is attack traffic, marks the traffic data as a first type of attack sample.
  • the preset multiple attack types can be, for example, the multiple attack types that are easy to mark in the real attack environment and the multiple attack types that are not easy to mark in the real attack environment given in the aforementioned design.
  • the first type of attack samples can fully cover various known attack types, thereby improving the richness and comprehensiveness of the first type of attack samples.
  • After the data processing device obtains the traffic data generated by the device to be detected for any attack behavior, if it determines that the traffic data is not attack traffic but normal traffic, it can mark the traffic data as a non-attack sample. Then, after traversing all attack types, it constructs an intrusion detection sample set based on the first type of attack samples, the second type of attack samples and the non-attack samples.
  • the intrusion detection sample set includes both attack samples (including the aforementioned first type attack samples and second type attack samples) and non-attack samples.
  • In this way, not only is the sample information in the intrusion detection sample set more complete, but when the set is used to perform an attack test on the device to be detected, the anti-attack effect of the device can also be determined more accurately, based on whether the device intercepts attack samples and lets non-attack samples pass.
  • unlabeled samples are unknown samples, that is, samples that cannot be identified as attack samples or non-attack samples under current technical means.
  • unlabeled samples are abnormal samples. For example, when the attack test of the data processing device causes the software and hardware system of the device to be detected to fail, the device to be detected itself may generate some abnormal data. These abnormal data neither meet the characteristics of normal traffic nor the characteristics of attack traffic, but will still be collected by the data processing device.
  • the data processing device can discard the unlabeled samples directly, so as to reduce the data volume of the intrusion detection sample set.
  • an intrusion detection sample set can be constructed together according to attack samples, unlabeled samples and non-attack samples, so as to add all samples that actually exist when attacking the device to be detected to the intrusion detection sample set, thereby improving the sample richness in the intrusion detection sample set and facilitating the subsequent marking of unlabeled samples or performing other operations through other analyses.
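The traversal and labeling steps described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: `collect_samples`, `run_attack`, `classify` and the record layout are hypothetical names, and the two stub functions stand in for a real attack executor and a real traffic classifier.

```python
from typing import Callable, Iterable, List, Optional


def collect_samples(
    attack_types: Iterable[str],
    run_attack: Callable[[str], Iterable[str]],
    classify: Callable[[str], Optional[bool]],
    keep_unlabeled: bool = True,
) -> List[dict]:
    """Traverse the preset attack types and label the captured traffic."""
    samples = []
    for attack_type in attack_types:
        # Execute the attack behavior and read back the traffic it produced.
        for record in run_attack(attack_type):
            verdict = classify(record)  # True: attack, False: normal, None: undetermined
            if verdict is True:
                label = "attack"  # first type of attack sample
            elif verdict is False:
                label = "non_attack"
            elif keep_unlabeled:
                label = "unlabeled"
            else:
                continue  # discard unlabeled data to keep the set small
            samples.append({"during_attack": attack_type, "data": record, "label": label})
    return samples


# Stubs (assumptions) so the sketch runs without a vehicle or test bench.
def fake_run_attack(attack_type: str) -> List[str]:
    return [f"{attack_type}-frame", "heartbeat", "garbled"]


def fake_classify(record: str) -> Optional[bool]:
    if record == "heartbeat":
        return False
    if record == "garbled":
        return None
    return True


demo = collect_samples(["replay", "tampering"], fake_run_attack, fake_classify, keep_unlabeled=False)
```

Setting `keep_unlabeled=True` corresponds to the alternative in which unlabeled samples are kept in the set for later analysis.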
  • the attack samples and non-attack samples in the intrusion detection sample set occupy the same proportion, for example, the attack samples and non-attack samples each account for 50% of all samples. In this way, by equally dividing the attack samples and non-attack samples, the attack samples and non-attack samples in the intrusion detection sample set can be balanced, which is convenient for subsequent extraction of the same proportion of data for attack testing of the device to be detected, thereby improving the credibility of the attack test results.
  • the samples with a larger proportion can be trimmed so that the proportion of attack samples after trimming is the same as the proportion of non-attack samples.
  • the samples with a larger proportion are usually non-attack samples, also called context data. In this way, by trimming non-attack samples, the balance of attack samples and non-attack samples in the intrusion detection sample set can be maintained.
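As an illustration of the trimming step, the sketch below downsamples whichever class is larger until attack and non-attack samples are equal in number. The text only says the larger class is "trimmed"; random downsampling with a fixed seed, the function name `balance` and the `label` field are assumptions.

```python
import random
from typing import List


def balance(samples: List[dict], seed: int = 0) -> List[dict]:
    """Trim the larger class so attack and non-attack samples are equal in number."""
    attacks = [s for s in samples if s["label"] == "attack"]
    normals = [s for s in samples if s["label"] == "non_attack"]
    others = [s for s in samples if s["label"] not in ("attack", "non_attack")]
    target = min(len(attacks), len(normals))
    rng = random.Random(seed)  # fixed seed so the trimming is reproducible
    if len(attacks) > target:
        attacks = rng.sample(attacks, target)
    if len(normals) > target:
        normals = rng.sample(normals, target)
    return attacks + normals + others


raw = (
    [{"label": "attack", "id": i} for i in range(3)]
    + [{"label": "non_attack", "id": i} for i in range(10)]
    + [{"label": "unlabeled", "id": 0}]
)
balanced = balance(raw)
```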
  • a data processing device obtains a second type of attack sample by applying noise to a first type of attack sample, including: the data processing device first applies noise to the first type of attack sample to obtain a perturbation sample, then inputs the perturbation sample into an attack recognition model, and obtains a recognition result output by the attack recognition model; when the recognition result indicates that it is impossible to determine whether the perturbation sample is an attack sample, the perturbation sample is determined to be a second type of attack sample; otherwise, the perturbation sample is adjusted according to the recognition result, and the adjusted perturbation sample is input into the attack recognition model again, and the above process is repeated until the recognition result corresponding to the adjusted perturbation sample indicates that it is impossible to determine whether the adjusted perturbation sample is an attack sample, and the adjusted perturbation sample is determined to be a second type of attack sample.
  • the second type of attack sample is an unrecognizable sample obtained by adding noise on the basis of the first type of attack sample of known attack type. It can be considered as an attack sample whose attack type cannot be determined under current technical means, that is, an attack sample of unknown attack type. In this way, by adding the attack sample to the intrusion detection sample set, the phenomenon of misreporting these attack samples of unknown attack type as non-attack samples can be avoided in real attack test scenarios.
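The perturb, recognize and adjust loop can be sketched as below. Several details are assumptions made for illustration only: the recognition model is represented by a function returning an attack score in [0, 1]; "cannot determine" is taken to mean a score inside a band around 0.5; and the adjustment rule (add more noise while the sample is still recognized as an attack, pull back toward the original when it is recognized as non-attack) is one possible reading of "adjusted according to the recognition result". The toy model is not the attack recognition model of the application.

```python
import math
import random
from typing import Callable, List, Optional


def make_second_type_sample(
    sample: List[float],
    model: Callable[[List[float]], float],
    lower: float = 0.4,
    upper: float = 0.6,
    sigma: float = 0.5,
    max_iters: int = 200,
    seed: int = 0,
) -> Optional[List[float]]:
    """Perturb a known attack sample until the model can no longer decide."""
    rng = random.Random(seed)
    candidate = [x + rng.gauss(0.0, sigma) for x in sample]  # initial perturbation sample
    for _ in range(max_iters):
        score = model(candidate)
        if lower <= score <= upper:
            return candidate  # undecidable: treat as a second type of attack sample
        if score > upper:
            # Still recognized as an attack: add further noise.
            candidate = [c + rng.gauss(0.0, sigma) for c in candidate]
        else:
            # Recognized as non-attack: move halfway back toward the original.
            candidate = [(c + x) / 2.0 for c, x in zip(candidate, sample)]
    return None  # no undecidable sample found within the iteration budget


def toy_model(features: List[float]) -> float:
    """Toy recognizer (assumption): logistic score of the feature sum."""
    return 0.5 * (1.0 + math.tanh(sum(features) / 2.0))


result = make_second_type_sample([2.0, 1.5], toy_model)
```

The iteration cap is a practical safeguard added here; the text itself describes repeating the process until an undecidable sample is obtained.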
  • After the data processing device constructs an intrusion detection sample set based on the first type of attack samples and the second type of attack samples, it can also extract features from the intrusion detection sample set to obtain an offline detection sample set. The offline detection sample set is input into the intrusion detection model, and the detection performance of the model is evaluated according to the detection results it outputs. For example, the more offline detection samples belonging to attack samples that the intrusion detection model detects as attack samples, and the more offline detection samples belonging to non-attack samples that it detects as non-attack samples, the better the detection effect of the intrusion detection model.
  • the data processing device can also adjust the parameters of the intrusion detection model, and use the adjusted intrusion detection model to re-detect the offline detection sample set, and repeat the above process until an intrusion detection model with a good detection effect is obtained.
  • the intrusion detection sample set can support the evaluation of the detection effect of the intrusion detection model in an offline state (referred to as offline evaluation), so as to continuously optimize the intrusion detection model according to the detection effect, obtain an intrusion detection model with better detection effect, and improve the implementation effect of the intrusion detection model on the device to be detected.
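A minimal sketch of offline evaluation follows. It measures the share of attack samples detected as attacks and the share of non-attack samples detected as non-attacks, then tries candidate parameter values until both shares reach a target. The single-threshold detector, the candidate list and the target of 0.6 are assumptions chosen to keep the example small; a real intrusion detection model would have many more parameters.

```python
from typing import Callable, List, Optional, Tuple


def evaluate(detector: Callable[[float], bool], samples: List[dict]) -> Tuple[float, float]:
    """Return (share of attacks detected, share of non-attacks passed)."""
    attacks = [s for s in samples if s["label"] == "attack"]
    normals = [s for s in samples if s["label"] == "non_attack"]
    detected = sum(1 for s in attacks if detector(s["features"]))
    passed = sum(1 for s in normals if not detector(s["features"]))
    attack_rate = detected / len(attacks) if attacks else 0.0
    normal_rate = passed / len(normals) if normals else 0.0
    return attack_rate, normal_rate


def tune_threshold(samples: List[dict], candidates: List[float], target: float) -> Optional[float]:
    """Adjust the (single) model parameter until both rates reach the target."""
    for threshold in candidates:
        attack_rate, normal_rate = evaluate(lambda f, t=threshold: f >= t, samples)
        if attack_rate >= target and normal_rate >= target:
            return threshold
    return None


# Toy offline detection sample set: one numeric feature per sample (assumption).
dataset = (
    [{"label": "attack", "features": v} for v in (0.9, 0.8, 0.4)]
    + [{"label": "non_attack", "features": v} for v in (0.1, 0.2, 0.5)]
)
best = tune_threshold(dataset, [0.3, 0.45, 0.6], target=0.6)
```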
  • the data processing device extracts features from the intrusion detection sample set to obtain an offline detection sample set as follows: the data processing device first determines the message type of each intrusion detection sample in the intrusion detection sample set; then, for intrusion detection samples whose message type is a transmission control protocol (TCP) message, it performs feature extraction on all intrusion detection samples belonging to the same TCP connection to obtain one offline detection sample; and for intrusion detection samples whose message type is a CAN with flexible data rate (CAN (FD)) message or a user datagram protocol (UDP) message, it performs feature extraction on each CAN (FD) message or UDP message individually to obtain one offline detection sample.
  • the features extracted by offline evaluation may include one or more of the following features: timestamp, frequency feature, protocol type, content feature, packet loss rate, number of error packets, connection duration, connection initiator, and connection receiver.
  • for intrusion detection samples belonging to TCP messages, the extracted features may include all the features listed here, while for intrusion detection samples belonging to CAN (FD) messages or UDP messages, they may include timestamp, frequency feature, protocol type, content feature, packet loss rate, and number of error packets.
  • the data processing device can also extract all the aforementioned features for every intrusion detection sample. In that case, when a certain feature does not exist for a certain intrusion detection sample, that feature of the sample is set to a preset character.
  • the preset character can be, for example, a number, letter, symbol, or a combination of one or more of them.
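The per-connection versus per-message extraction can be sketched as follows. The packet dictionaries, field names, the `"-"` placeholder and the rule that the sender of the earliest packet is the connection initiator are all assumptions, and only a subset of the listed features is computed (timestamp, protocol type, a simple content feature, connection duration, initiator and receiver).

```python
from typing import Dict, List

PLACEHOLDER = "-"  # preset character for features that do not exist (assumption)


def extract_offline_samples(packets: List[dict]) -> List[dict]:
    """One row per TCP connection, one row per CAN (FD) or UDP message."""
    connections: Dict[frozenset, List[dict]] = {}
    rows = []
    for p in packets:
        if p["protocol"] == "TCP":
            # Direction-independent key so both sides land in the same connection.
            key = frozenset([(p["src"], p["sport"]), (p["dst"], p["dport"])])
            connections.setdefault(key, []).append(p)
        else:
            rows.append({
                "timestamp": p["ts"],
                "protocol_type": p["protocol"],
                "content_length": len(p["payload"]),
                "connection_duration": PLACEHOLDER,
                "connection_initiator": PLACEHOLDER,
                "connection_receiver": PLACEHOLDER,
            })
    for group in connections.values():
        group.sort(key=lambda p: p["ts"])
        first, last = group[0], group[-1]
        rows.append({
            "timestamp": first["ts"],
            "protocol_type": "TCP",
            "content_length": sum(len(p["payload"]) for p in group),
            "connection_duration": last["ts"] - first["ts"],
            "connection_initiator": f'{first["src"]}:{first["sport"]}',
            "connection_receiver": f'{first["dst"]}:{first["dport"]}',
        })
    return rows


packets = [
    {"protocol": "TCP", "src": "10.0.0.1", "sport": 5000, "dst": "10.0.0.2", "dport": 80, "ts": 1.0, "payload": b"SYN"},
    {"protocol": "TCP", "src": "10.0.0.2", "sport": 80, "dst": "10.0.0.1", "dport": 5000, "ts": 1.5, "payload": b"ACK"},
    {"protocol": "TCP", "src": "10.0.0.1", "sport": 5000, "dst": "10.0.0.2", "dport": 80, "ts": 3.0, "payload": b"DATA"},
    {"protocol": "CAN", "ts": 2.0, "payload": b"\x01\x02"},
]
rows = extract_offline_samples(packets)
```

Filling missing features with one placeholder keeps every row the same width, which is convenient when the rows are later fed to a model as a table.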
  • After the data processing device constructs an intrusion detection sample set based on the first type of attack samples and the second type of attack samples, it can also convert the format of the intrusion detection sample set to obtain an online detection sample set that matches the format of the test tool, and then input the online detection sample set into the device to be detected through the test tool.
  • the online detection sample set is used to evaluate the detection performance of the device to be detected that is deployed with an intrusion detection model.
  • the intrusion detection sample set can support the online evaluation of the detection effect of the device to be detected that is deployed with the intrusion detection model (referred to as online evaluation), so as to determine the anti-attack performance of the device to be detected based on the detection effect, and ensure that only the device to be detected with good anti-attack effect is shipped out of the factory.
  • the test tool can be CANoe, PCAN, Technica or another tool that can perform online evaluation
  • after format conversion, the intrusion detection sample set can be in .PCAP, .ASC, .BLF or another format corresponding to another online test tool.
  • After the data processing device constructs an intrusion detection sample set based on the first type of attack samples and the second type of attack samples, it can also determine the evaluation value corresponding to the intrusion detection sample set based on the values of the intrusion detection sample set under various preset indicators, and adjust the intrusion detection sample set when the evaluation value is lower than the preset threshold.
  • the preset indicators can be set according to the characteristics of the system architecture to which the device to be detected belongs.
  • the intrusion detection sample set can be effectively optimized according to the evaluation results, so that the optimized intrusion detection sample set is more suitable for the system architecture to which the device to be detected belongs.
  • the preset indicators may include one or more of the following indicators: data redundancy indicator, attack coverage indicator, protocol coverage indicator, business coverage indicator, data marking indicator, balance indicator, feature independence indicator, and ease of use indicator.
  • data redundancy indicator, attack coverage indicator, protocol coverage indicator, business coverage indicator, data marking indicator and balance indicator are quantitative indicators
  • feature independence indicator and ease of use indicator are qualitative indicators.
  • the value ranges of various preset indicators can be configured to be the same range, such as [0,1].
  • the evaluation value corresponding to the intrusion detection sample set can be the weighted average of the values of the intrusion detection sample set under each preset indicator, wherein the weights corresponding to each preset indicator can be the same or different.
  • each quantitative preset indicator can be configured to correspond to a first weight
  • each qualitative preset indicator can correspond to a second weight
  • the first weight is greater than the second weight.
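The weighted-average evaluation can be written as $E = \sum_i w_i v_i / \sum_i w_i$, where $v_i \in [0, 1]$ is the value under indicator $i$ and $w_i$ is its weight. A minimal sketch follows; the indicator keys, the weights 2.0 and 1.0, and the threshold 0.7 are assumptions chosen only so that the first weight exceeds the second.

```python
from typing import Dict

QUANTITATIVE = {
    "data_redundancy", "attack_coverage", "protocol_coverage",
    "business_coverage", "data_marking", "balance",
}
QUALITATIVE = {"feature_independence", "ease_of_use"}


def evaluation_value(values: Dict[str, float], w_quant: float = 2.0, w_qual: float = 1.0) -> float:
    """Weighted average of indicator values, each expected in [0, 1]."""
    total = 0.0
    weight_sum = 0.0
    for name, value in values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
        if name in QUANTITATIVE:
            weight = w_quant  # first weight
        elif name in QUALITATIVE:
            weight = w_qual  # second weight, smaller than the first
        else:
            raise KeyError(f"unknown indicator: {name}")
        total += weight * value
        weight_sum += weight
    return total / weight_sum


def needs_adjustment(values: Dict[str, float], threshold: float = 0.7) -> bool:
    """True when the sample set scores below the preset threshold."""
    return evaluation_value(values) < threshold


score = evaluation_value({"attack_coverage": 1.0, "ease_of_use": 0.0})
```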
  • the first type of attack sample can be obtained by attacking any of the following areas: the entire device to be detected; one or more physical areas of the device to be detected; or one or more functional areas of the device to be detected.
  • the one or more physical areas may include one or more of the vehicle trunk area, the left front body area, the right front body area, the left rear body area, or the right rear body area
  • the one or more functional areas may include one or more of the vehicle central control gateway area, the body control area, the cockpit control area, the power control area, the chassis control area, or the infotainment area.
  • the data processing device can construct an intrusion detection sample set for the entire device to be detected, or construct an intrusion detection sample set for one or more physical areas in the device to be detected, or construct an intrusion detection sample set for one or more functional areas in the device to be detected, or, it can also construct an intrusion detection sample set for a combination of one or more physical areas and one or more functional areas. It can be seen that the method for constructing the intrusion detection sample set can be applicable to various different construction scenarios, which helps to improve the flexibility, versatility and ease of use of the intrusion detection sample set.
  • the present application provides a data processing device, which can be any device with processing capabilities, such as a server or a server cluster composed of servers.
  • the data processing device includes: an attack unit, which is used to obtain a first type of attack sample by attacking a device to be detected; a perturbation unit, which is used to obtain a second type of attack sample by applying noise to the first type of attack sample; and a construction unit, which is used to construct an intrusion detection sample set based on the first type of attack sample and the second type of attack sample.
  • the first type of attack samples may include real attack samples and simulated attack samples.
  • the real attack samples are obtained by manually attacking the device to be detected, and the simulated attack samples are obtained by attacking the device to be detected with an attack tool.
  • the real attack samples can correspond to one or more of the following attack types: ID non-existence attack, replay attack, tampering attack, data length error attack, signal out of defined range attack, context error attack, ID source non-specified ECU attack, identical ID attack, CAN scanning attack, UDS sensitive operation attack, message authentication error attack, ECU identity spoofing attack, man-in-the-middle attack, ECU authentication error attack, brute force attack, application layer protocol error attack, unknown out-of-stack connection attack, unknown in-stack connection attack.
  • the simulated attack samples may correspond to one or more of the following attack types: ID Fuzz attack, data Fuzz attack, CAN Dos attack, ETH Dos attack, malformed packet injection attack, and port scanning attack.
  • the attack unit is specifically used to: traverse each attack type among a plurality of preset attack types, and when traversing each attack type: execute the attack behavior corresponding to the attack type on the device to be detected, and obtain the traffic data generated by the device to be detected for the attack behavior; if the traffic data is attack traffic, mark the traffic data as a first type attack sample.
  • the attack unit after obtaining the traffic data generated by the device to be detected in response to the attack behavior, the attack unit is also used to: if the traffic data is normal traffic, mark the traffic data as a non-attack sample; correspondingly, the construction unit is specifically used to: construct an intrusion detection sample set based on the first type of attack samples, the second type of attack samples and the non-attack samples.
  • the perturbation unit is specifically used to: apply noise to the first type of attack sample to obtain a perturbation sample, input the perturbation sample into the attack recognition model, obtain the recognition result output by the attack recognition model, adjust the perturbation sample according to the recognition result, until the recognition result corresponding to the adjusted perturbation sample indicates that it is impossible to determine whether the perturbation sample is an attack sample, and then determine the adjusted perturbation sample as a second type of attack sample.
  • the recognition result output by the attack recognition model is used to indicate whether the perturbation sample is an attack sample.
  • the data processing device may further include a feature extraction unit, which is used to: extract features from the intrusion detection sample set to obtain an offline detection sample set, and the offline detection sample set is used to evaluate the detection effect of the intrusion detection model.
  • the feature extraction unit is specifically used to: determine the message type of each intrusion detection sample in the intrusion detection sample set, for each intrusion detection sample whose message type is a TCP message, obtain an offline detection sample by performing feature extraction on all intrusion detection samples belonging to the same TCP connection, and for each intrusion detection sample whose message type is a CAN (FD) message or a UDP message, obtain an offline detection sample by performing feature extraction on the intrusion detection sample of each CAN (FD) message or UDP message.
  • the features obtained by the aforementioned feature extraction include one or more of the following features: timestamp, frequency feature, protocol type, content feature, packet loss rate, number of error packets, connection duration, connection initiator, and connection receiver.
  • the data processing device may also include a format conversion unit, which is used to: convert the format of the intrusion detection sample set to obtain an online detection sample set that matches the format of the test tool, and input the online detection sample set into the device to be detected through the test tool, wherein the online detection sample set is used to evaluate the detection performance of the device to be detected that is deployed with an intrusion detection model.
  • the data processing device may further include an adjustment unit, which is used to determine an evaluation value corresponding to the intrusion detection sample set based on the value of the intrusion detection sample set under each preset indicator, and adjust the intrusion detection sample set when the evaluation value is lower than a preset threshold.
  • the preset indicators include one or more of the following indicators: data redundancy indicator, attack coverage indicator, protocol coverage indicator, business coverage indicator, data labeling indicator, balance indicator, feature independence indicator, and ease of use indicator.
  • the first type of attack sample may be obtained by attacking any of the following areas: the entire device to be detected; one or more physical areas of the device to be detected; or one or more functional areas of the device to be detected.
  • the present application provides a data processing device, including a processor, the processor is connected to a memory, the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the data processing device performs a method as described in any one of the designs of the first aspect above.
  • the present application provides a data processing device, including a processor and a memory, the memory storing computer program instructions, and the processor executing the computer program instructions to implement a method as described in any one of the designs of the first aspect above.
  • the present application provides a data processing device, including a processor, a memory and a transceiver, the memory stores computer program instructions, and the processor runs the computer program instructions to call the transceiver to implement a method as described in any one of the designs in the first aspect above.
  • the present application provides a chip, which may include a processor and an interface, wherein the processor is used to read instructions through the interface to execute a method as described in any one of the designs in the first aspect above.
  • the present application provides a data processing system, which may include a device to be detected and a data processing device, and the data processing device is used to execute the method described in any one of the designs in the first aspect above.
  • the present application provides a computer-readable storage medium storing a computer program. When the computer program is executed, the method described in any one of the designs of the first aspect above is implemented.
  • the present application provides a computer program product, which, when executed on a processor, implements a method as described in any one of the above-mentioned first aspects.
  • FIG. 1 exemplarily shows a possible system architecture diagram provided by an embodiment of the present application
  • FIG. 2 exemplarily shows a schematic diagram of a partition architecture of a vehicle provided in an embodiment of the present application
  • FIG. 3 exemplarily shows a flow chart of a data processing method provided in an embodiment of the present application
  • FIG. 4 exemplarily shows a schematic diagram of a process for obtaining a first type of attack sample provided in an embodiment of the present application
  • FIG. 5 exemplarily shows a schematic diagram of an application scenario of an intrusion detection sample set provided in an embodiment of the present application
  • FIG. 6 exemplarily shows a flow chart of evaluating an intrusion detection sample set provided by an embodiment of the present application
  • FIG. 7 exemplarily shows a design architecture diagram of a development data processing solution provided in an embodiment of the present application
  • FIG. 8 exemplarily shows a schematic structural diagram of a data processing device provided in an embodiment of the present application.
  • FIG. 9 exemplarily shows a schematic structural diagram of another data processing device provided in an embodiment of the present application.
  • the data processing method disclosed in the present application can be used to construct an intrusion detection sample set, which can be used to train an intrusion detection model, and can also be used to evaluate the detection effect of the intrusion detection model or of the device to be detected on which the intrusion detection model is deployed.
  • the device to be detected can be any terminal device with communication capabilities, and in particular, it can be a terminal device that has certain requirements for network security.
  • the terminal device may include but is not limited to: intelligent transportation equipment, such as cars, ships, drones, trains, vans, trucks, flying cars, etc.; smart home devices, such as TVs, sweeping robots, smart desk lamps, audio systems, smart lighting systems, electrical control systems, home background music, home theater systems, intercom systems, video surveillance, etc.; and intelligent manufacturing equipment, such as robots, industrial equipment, industrial computers, smart logistics, smart factories, etc.
  • the terminal device can also be a computer device, such as a desktop, personal computer, server, etc.
  • the terminal device can also be a portable electronic device, such as a mobile phone, tablet computer, PDA, headset, speaker, wearable device (such as smart watch), vehicle-mounted device, virtual reality device, augmented reality device, etc.
  • portable electronic devices include but are not limited to devices equipped with Or a portable electronic device with other operating systems.
  • the portable electronic device may also be a laptop computer (Laptop) with a touch-sensitive surface (eg, a touch panel).
  • system and “network” in the embodiments of the present application can be used interchangeably.
  • “At least one” means one or more, and “plurality” means two or more.
  • “And/or” describes the association relationship of associated objects, indicating that three relationships may exist.
  • A and/or B can represent: A exists alone, A and B exist at the same time, and B exists alone, where A and B can be singular or plural.
  • the character “/” generally indicates that the associated objects before and after are in an “or” relationship.
  • “At least one of the following” or similar expressions refers to any combination of these items, including any combination of single items or plural items.
  • At least one of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, c can be single or multiple.
  • ordinal numbers such as “first” and “second” mentioned in the embodiments of the present application are used to distinguish multiple objects, and are not used to limit the priority or importance of multiple objects.
  • first type of attack sample and the second type of attack sample are only used to distinguish different types of attack samples, and do not indicate the difference in priority or importance of the two types of attack samples.
  • connection can be understood as electrical connection, and the connection between two electrical components can be a direct or indirect connection between the two electrical components.
  • A and B are connected, which can mean either that A and B are directly connected, or that A and B are indirectly connected through one or more other electrical components; for example, A and C are directly connected, C and B are directly connected, and A and B are thus connected through C.
  • connection can also be understood as coupling, such as electromagnetic coupling between two inductors. In short, the connection between A and B enables the transmission of electrical energy between A and B.
  • FIG. 1 exemplarily shows a possible system architecture diagram provided by an embodiment of the present application.
  • the system architecture includes a device to be detected 100 and a data processing device 200.
  • the device to be detected 100 may be a device with certain requirements for network security, such as a vehicle.
  • the data processing device 200 may be any device with data processing capabilities, such as a server or a server cluster composed of multiple servers, or may also be a chip or circuit, such as a chip or circuit arranged in a server or a server cluster.
  • the data processing device 200 may be a cloud server, and the cloud server may be connected to the device to be detected 100 by wireless.
  • a data processing device 200 may be connected to only one device to be detected 100 as shown in FIG. 1, or may be connected to multiple devices to be detected 100 at the same time.
  • the data processing device 200 in the embodiment of the present application may integrate all functions on an independent physical device, or may distribute the functions on multiple independent physical devices, which is not specifically limited in the embodiment of the present application.
  • the system architecture may also include one or more of a database 300, a model training device 400, and a test tool 500, or may also include other devices, such as routing devices, wireless relay devices, wireless backhaul devices, and operation management and maintenance devices.
  • the database 300 is used to store the intrusion detection sample set constructed by the data processing device 200.
  • the database 300 can be a storage unit independent of the data processing device 200, such as a database server, or an internal storage unit of the data processing device 200, such as a cache memory, a random access memory, a register, a main memory or a read-only memory.
  • the model training device 400 can be any device with algorithm development and verification capabilities, such as a model training server.
  • the IDS is deployed in the model training device 400.
  • the IDS is a test system released by the automotive open system architecture (AUTOSAR) for automotive network security. It is one of the most mainstream vehicle-mounted test systems at present, and can train a high-accuracy, high-timeliness and high-robustness intrusion detection model based on limited vehicle-mounted hardware resources and vehicle-mounted storage resources in combination with rules and machine learning.
  • the test tool 500 refers to a tool that can inject test traffic into the device to be detected 100, such as a CAN test tool, a TCP test tool or a UDP test tool.
  • the test tool 500 can present a user interface to the outside. By selecting the corresponding test commands (such as the traffic injection type, traffic injection quantity and traffic injection frequency) on the user interface, the user can drive the test tool 500 to automatically inject test traffic into the device to be detected 100 according to those test commands.
  • when the system architecture includes the device to be detected 100, the data processing device 200, the database 300, the model training device 400 and the test tool 500, the data processing device 200 can be connected to the device to be detected 100, the database 300, the model training device 400 and the test tool 500 respectively, and the device to be detected 100 can also be connected to the model training device 400 and the test tool 500.
  • the data processing device 200 can construct an intrusion detection sample set for the device to be detected 100, and store the intrusion detection sample set in the database 300.
  • the data processing device 200 can convert the intrusion detection sample set in the database 300 into an offline detection sample set, and can select a part of the offline detection samples to send to the model training device 400, and after the model training device 400 uses the part of the offline detection samples to train the intrusion detection model, the intrusion detection model is deployed in the device to be detected 100, so that the device to be detected 100 can use the intrusion detection model to identify attack traffic.
  • the data processing device 200 can also send another part of the converted offline detection samples to the model training device 400, and the model training device 400 uses the previously trained intrusion detection model to detect the part of the offline detection samples to obtain offline evaluation information, and the data processing device 200 evaluates the detection effect of the intrusion detection model trained by the model training device 400 according to the offline evaluation information.
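  • The offline use described above, in which one part of the offline detection samples trains the intrusion detection model and another part evaluates it, can be sketched as a simple partition. This is a minimal illustration only: the function name `split_offline_samples`, the 80/20 ratio and the fixed seed are assumptions and are not specified in the present application.

```python
import random

def split_offline_samples(samples, train_fraction=0.8, seed=0):
    """Partition offline detection samples into a training part and an evaluation part.

    The ratio and the seeded shuffle are illustrative assumptions.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train_part, eval_part = split_offline_samples(range(10))
```

  • Keeping the two parts disjoint means the offline evaluation information is produced from samples that the model has not seen during training.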
  • the data processing device 200 can convert the intrusion detection sample set in the database 300 into an online detection sample set, and then input part or all of the online detection samples into the device to be detected 100 through an online tool, and obtain the online evaluation information generated by the device to be detected 100 for the online detection samples, and evaluate the detection effect of the device to be detected 100 deployed with the intrusion detection model according to the online evaluation information.
  • the intrusion detection sample set will not only be used to train the intrusion detection model, but also be used as a test sample to test the quality of the intrusion detection model and the quality of the equipment to be detected with the intrusion detection model deployed.
  • the sample types in the intrusion detection sample set are sufficient, the detection effect of the intrusion detection model trained based on sufficient intrusion detection samples will be better, and then the anti-attack performance of the equipment to be detected with the intrusion detection model deployed will also be better.
  • the reliability of the evaluation results obtained by evaluating the intrusion detection model or the equipment to be detected based on sufficient intrusion detection samples will also be better. Therefore, how to construct an intrusion detection sample set with sufficient sample types is crucial to improving the detection effect of the intrusion detection model, improving the anti-attack effect of the equipment to be detected, improving the reliability of the intrusion detection model, and improving the reliability of the equipment to be detected.
  • when constructing intrusion detection sample sets, the industry only uses attack tools to simulate some typical intrusion scenarios to attack the device to be detected, so the intrusion detection sample sets contain only the intrusion detection samples corresponding to those typical intrusion scenarios. The sample types in such an intrusion detection sample set are very limited, which is not conducive to improving the detection effect of the intrusion detection model, improving the anti-attack effect of the device to be detected, improving the reliability of the intrusion detection model, or improving the reliability of the device to be detected, and is also not conducive to the implementation of intrusion detection technology on the device to be detected.
  • an embodiment of the present application provides a data processing method for constructing an intrusion detection sample set with richer sample types, so that it can cover both intrusion detection samples in known intrusion scenarios and intrusion detection samples in unknown intrusion scenarios, so as to improve the detection effect of the intrusion detection model trained using the intrusion detection sample set, improve the anti-attack effect of the device to be detected deployed with the intrusion detection model, improve the reliability of evaluating the intrusion detection model using the intrusion detection sample set, and improve the reliability of evaluating the device to be detected using the intrusion detection sample set.
  • FIG2 shows a schematic diagram of a partition architecture of a vehicle provided in the embodiment of the present application, wherein:
  • FIG2 (A) shows a physical partition architecture diagram of a vehicle, which divides the entire vehicle into a vehicle-mounted trunk area, a right front body area, a left front body area, a right rear body area, and a left rear body area according to different physical areas.
  • the right front body area, the left front body area, the right rear body area, and the left rear body area exist as branch areas of the vehicle-mounted trunk area.
  • a vehicle control unit (VCU) is deployed in the vehicle-mounted trunk area, and each branch area is deployed with its own control node (i.e., Z 1 , Z 2 , Z 3 , Z 4 ) and several ECUs connected to the control node. All ECUs in each branch area are connected to the control node via CAN (FD), and the control node in each branch area is connected to the VCU in the vehicle-mounted trunk area via ETH.
  • FIG2 (B) shows a functional partition architecture diagram of a vehicle.
  • the architecture divides the entire vehicle into a vehicle-mounted central control gateway functional area and K other functional areas according to different functions to be implemented.
  • the K other functional areas exist as branch areas of the vehicle-mounted central control gateway functional area, each of which implements different functions.
  • the K other functional areas may include one or more of the body control domain, cockpit control domain, power control domain, chassis control domain or infotainment domain, etc., and K is a positive integer.
  • a GateWay gateway is deployed in the vehicle-mounted central control gateway functional area, and each other functional area is deployed with its own domain controller (i.e., D 1 , D 2 , ..., D 4K ) and several ECUs connected to the domain controller. All ECUs in each other functional area are connected to the domain controller via CAN (FD), and the domain controller in each other functional area is connected to the GateWay in the vehicle-mounted central control gateway functional area via ETH.
  • an intrusion detection sample set can be constructed for the entire vehicle, an intrusion detection sample set can be constructed for one or more physical areas in the vehicle, an intrusion detection sample set can be constructed for one or more functional areas in the vehicle, or an intrusion detection sample set can be constructed for a combination of one or more physical areas and one or more functional areas, etc.
  • the physical area can be the vehicle trunk area, the right front body area, the left front body area, the right rear body area or the left rear body area
  • the functional area can be the vehicle central control gateway functional area, or other functional areas, without specific limitation.
  • the specific areas for which intrusion detection sample sets are constructed can be set according to the processing capacity of the data processing device and the actual needs of the user. For example, when the processing capacity of the data processing device is strong, the user can choose to construct an intrusion detection sample set for the entire vehicle, so as to use the efficient processing capacity to construct a relatively complete global intrusion detection sample set. The global intrusion detection sample set is applicable to both global attack scenarios and local attack scenarios, and has good versatility. Conversely, when the processing capacity of the data processing device is not strong, local intrusion detection sample sets can be constructed specifically for the areas involved in the actual business.
  • the local intrusion detection sample set is a subset of the global intrusion detection sample set. Although it can only be applied to local attack scenarios, the scale of the area it needs to process becomes smaller, so it can better reduce the difficulty of constructing the intrusion detection sample set and reduce the complexity of the intrusion detection sample set.
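  • The subset relationship between local and global intrusion detection sample sets can be illustrated as a filter over region tags. This is a sketch under stated assumptions: the sample layout (a dictionary with a `region` field) and the function name `local_sample_set` are hypothetical and are not defined in the present application.

```python
def local_sample_set(global_samples, regions):
    """Return the samples of a global set that belong to the selected regions.

    Each sample is assumed to carry a "region" tag naming the physical or
    functional area it was collected for.
    """
    wanted = set(regions)
    return [sample for sample in global_samples if sample["region"] in wanted]

global_set = [
    {"region": "left front body area", "payload": "frame-1"},
    {"region": "cockpit control domain", "payload": "frame-2"},
    {"region": "vehicle-mounted trunk area", "payload": "frame-3"},
]
local_set = local_sample_set(global_set, ["cockpit control domain"])
```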
  • the data processing device can be the data processing device 200 shown in FIG1, or it can be other communication nodes, communication devices or communication systems that can support the data processing device to implement the required functions, such as chips, chip systems, circuits or circuit systems, without specific limitation.
  • FIG3 exemplarily shows a flow chart of a data processing method provided in an embodiment of the present application, and the data processing method can be executed by a data processing device, such as the data processing device 200 shown in FIG1 .
  • the method includes:
  • Step 301 The data processing device obtains a first type of attack sample by attacking a device to be detected.
  • the data processing device can initiate an attack behavior for the entire device to be detected to obtain the first type of attack sample corresponding to the entire device to be detected.
  • the data processing device can initiate an attack behavior only for the one or more regions to obtain the first type of attack sample corresponding to the one or more regions.
  • the device to be detected is a vehicle, which may be a vehicle under a distributed electrical and electronic architecture, a domain centralized electrical and electronic architecture, a vehicle centralized electrical and electronic architecture, or any vehicle-mounted architecture that may appear in the future.
  • the data processing device can construct an intrusion detection sample set corresponding to each vehicle-mounted architecture by attacking vehicles belonging to each vehicle-mounted architecture, thereby facilitating users to select one or more intrusion detection sample sets corresponding to the vehicle-mounted architectures for use according to actual needs, thereby improving the user experience.
  • the data processing device can also attack multiple vehicles belonging to the vehicle-mounted architecture together, so that the first type of attack sample can cover multiple vehicles under the vehicle-mounted architecture, avoiding the problem of inaccurate sample collection due to problems with the vehicle when only one vehicle is attacked.
  • when attacking multiple vehicles belonging to a vehicle-mounted architecture, after obtaining a large number of first-type attack samples corresponding to the multiple vehicles, the data processing device can also filter these first-type attack samples by means of clustering or model recognition, for example, filtering out first-type attack samples that are significantly different from other first-type attack samples, and only retaining relatively similar first-type attack samples. In this way, by cleaning up obviously problematic first-type attack samples in advance, it is possible to avoid subsequent meaningless processing of those samples, and effectively save the computing resources of the data processing device.
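  • The filtering of first-type attack samples that differ significantly from the others can be sketched as follows. The present application mentions clustering or model recognition without fixing an algorithm; the sketch below substitutes a simpler distance-to-median rule as a stand-in. The numeric feature vectors, the function name `filter_outlier_samples` and the threshold `max_ratio` are all assumptions made for illustration.

```python
from statistics import median

def filter_outlier_samples(features, max_ratio=3.0):
    """Keep samples whose distance to the component-wise median is not extreme.

    `features` is a non-empty list of equal-length numeric vectors, one per
    sample. A sample is dropped when its Euclidean distance to the median
    vector exceeds `max_ratio` times the median of all such distances.
    """
    dims = range(len(features[0]))
    center = [median(vec[d] for vec in features) for d in dims]
    dists = [sum((vec[d] - center[d]) ** 2 for d in dims) ** 0.5 for vec in features]
    typical = median(dists) or 1e-9          # avoid a zero threshold
    return [vec for vec, dist in zip(features, dists) if dist <= max_ratio * typical]

samples = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [9.0, 9.0]]
kept = filter_outlier_samples(samples)
```

  • In this toy input the last vector lies far from the other four, so it is the one removed before any further processing.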
  • the first type of attack samples may include real attack samples and simulated attack samples.
  • the real attack samples are obtained by manually attacking the device to be detected, for example, they can be collected during the penetration test process.
  • in the penetration test process, the tester takes the perspective of a hacker and actually attacks the device to be detected, such as a vehicle, with the aim of breaking into it.
  • This attack method can construct highly concealed attack behaviors and attack behaviors that rely on business logic, and can obtain attack samples that are easy to mark in a real attack environment.
  • the simulated attack samples are obtained by attacking the device to be detected with an attack tool.
  • for example, they can be collected from the device to be detected after an attack tool is used to simulate the generation of attack traffic and automatically inject the attack traffic into the device to be detected.
  • This attack method can obtain attack samples that are not easy to mark in a real attack environment.
  • by combining the manual attack method and the attack tool attack method to construct the first type of attack samples, the first type of attack samples can fully cover the attack samples of various attack types that may exist in the real attack environment, thereby improving the comprehensiveness of the first type of attack samples.
  • the first type of attack sample can be obtained by attacking the device to be detected according to various known attack types in the intrusion scenarios known at this stage.
  • the various known attack types include attack types that are easy to collect and mark in a real attack environment (referred to as the first attack type) and attack types that are not easy to collect and mark in a real attack environment (referred to as the second attack type). Therefore, by manually attacking the device to be detected according to the first attack type, the above-mentioned real attack samples can be collected, and by using attack tools to simulate attacks on the device to be detected according to the second attack type, the above-mentioned simulated attack samples can be collected in a centralized manner.
  • each ECU in the vehicle communicates through a CAN (FD) bus and an ETH bus
  • Table 1 shows a corresponding relationship table of possible attack types in the device to be detected and the attack type corresponding to each attack sample.
  • the first attack type may include one or more of the following attack types:
  • ID non-existence attack refers to attacking the vehicle by changing the ID in the CAN message to a non-existent ID
  • Replay attack is to deceive the vehicle by sending CAN messages that the vehicle has received before;
  • Tampering attack refers to attacking the vehicle by tampering with the data carried in the CAN message
  • Data length error attack refers to attacking the vehicle by modifying the data length of the CAN message
  • Signal out-of-range attack means attacking the vehicle by modifying the signal value in the CAN message to a value greater than the maximum specified value or less than the minimum specified value;
  • Context error attack refers to attacking the vehicle by publishing a specific message or signal that is not suitable for a certain state on the CAN network, such as sending an acceleration signal when braking;
  • ID source non-specified ECU attack means attacking the vehicle by publishing CAN messages that should be published by the designated ECU through the non-specified ECU;
  • Same ID attack refers to attacking the vehicle by carrying the same ID in the CAN messages published by two ECUs;
  • CAN scanning attack refers to hacking into the vehicle by sending CAN scanning messages
  • UDS sensitive operation attack refers to attacking the vehicle by carrying sensitive information in UDS messages;
  • Message authentication error attack refers to modifying the message authentication process of the CAN bus so that the vehicle cannot successfully authenticate the message
  • ECU identity spoofing attack refers to deceiving the vehicle by changing the ECU identity in the ETH message
  • a man-in-the-middle attack is a method of attacking a vehicle by virtually placing a device controlled by an intruder between two ECUs connected via ETH.
  • ECU authentication error attack refers to modifying the ETH ECU authentication process so that the vehicle cannot successfully authenticate the ECU;
  • Brute force attack refers to attacking the vehicle by deciphering the ETH key in an exhaustive manner
  • Application layer protocol error attack refers to attacking the vehicle by modifying the application layer protocol so that the vehicle cannot obtain the correct application layer protocol for message interaction;
  • Unknown outbound connection attack refers to attacking the vehicle by giving a fake outbound connection
  • Unknown push connection attack refers to attacking the vehicle by giving a false push connection.
  • ID non-existence attack, replay attack, tampering attack, data length error attack, signal out-of-range attack, context error attack, ID source non-specified ECU attack, same ID attack, CAN scanning attack, UDS sensitive operation attack and message authentication error attack belong to the attack types existing in CAN (FD) communication mode
  • ECU identity spoofing attack, man-in-the-middle attack, ECU authentication error attack, brute force attack, application layer protocol error attack, unknown outbound connection attack and unknown push connection attack belong to the attack types existing in ETH communication mode.
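  • Several of the CAN (FD) attack types listed above amount to small mutations of an otherwise valid message. The following sketch illustrates three of them (ID non-existence, tampering and data length error) on a simplified frame. The `CanFrame` type, its fields, the 11-bit ID range and the helper names are hypothetical; the sketch does not represent the manually written attack code referred to in the present application and does not transmit anything on a real bus.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class CanFrame:
    can_id: int   # identifier; an 11-bit standard ID is assumed here
    data: bytes   # payload bytes

def id_nonexistence(frame, known_ids):
    """Rewrite the ID to the lowest 11-bit ID that is not in `known_ids`."""
    unused = next(i for i in range(0x800) if i not in known_ids)
    return replace(frame, can_id=unused)

def tamper(frame, index, value):
    """Overwrite one payload byte while keeping the ID and length unchanged."""
    payload = bytearray(frame.data)
    payload[index] = value
    return replace(frame, data=bytes(payload))

def data_length_error(frame):
    """Drop the last payload byte so the length differs from the expected one."""
    return replace(frame, data=frame.data[:-1])

original = CanFrame(can_id=0x100, data=bytes([1, 2, 3, 4]))
mutated = [
    id_nonexistence(original, known_ids={0x000, 0x001, 0x100}),
    tamper(original, index=0, value=0xFF),
    data_length_error(original),
]
```

  • Keeping the frame immutable and returning a modified copy leaves the original message available, which is also what a replay attack would resend unchanged.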
  • the second attack type may include one or more of the following attack types:
  • ID fuzz attack refers to attacking vehicles by fuzzing the ID in the CAN message
  • Data fuzz attack refers to attacking vehicles by fuzzing the data in CAN messages
  • CAN DoS attack refers to attacking the vehicle by stopping the sending and receiving services of a certain part of the CAN bus
  • the DoS attack of ETH refers to attacking the vehicle by stopping the sending and receiving services of a part of the ETH bus;
  • Malformed packet injection attack refers to attacking vehicles by injecting malformed packets
  • a port scan attack is an attempt to break into a vehicle by sending port scan messages.
  • ID fuzz attack, data fuzz attack and CAN DoS attack belong to the attack types existing in the CAN (FD) communication mode
  • ETH DoS attack, malformed packet injection attack and port scanning attack belong to the attack types existing in the ETH communication mode.
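  • The correspondence described above between each attack type, its communication mode, and whether it belongs to the first or second attack type can be captured in a lookup table. The structure below is an illustrative sketch assembled from the lists in this section; the dictionary layout and the helper names `attack_types_for` and `generation_method` are assumptions.

```python
CAN, ETH = "CAN(FD)", "ETH"
FIRST, SECOND = "first", "second"

# attack type name -> (communication mode, attack type category)
ATTACK_TYPES = {
    "ID non-existence": (CAN, FIRST),
    "replay": (CAN, FIRST),
    "tampering": (CAN, FIRST),
    "data length error": (CAN, FIRST),
    "signal out-of-range": (CAN, FIRST),
    "context error": (CAN, FIRST),
    "ID source non-specified ECU": (CAN, FIRST),
    "same ID": (CAN, FIRST),
    "CAN scanning": (CAN, FIRST),
    "UDS sensitive operation": (CAN, FIRST),
    "message authentication error": (CAN, FIRST),
    "ECU identity spoofing": (ETH, FIRST),
    "man-in-the-middle": (ETH, FIRST),
    "ECU authentication error": (ETH, FIRST),
    "brute force": (ETH, FIRST),
    "application layer protocol error": (ETH, FIRST),
    "unknown outbound connection": (ETH, FIRST),
    "unknown push connection": (ETH, FIRST),
    "ID fuzz": (CAN, SECOND),
    "data fuzz": (CAN, SECOND),
    "CAN DoS": (CAN, SECOND),
    "ETH DoS": (ETH, SECOND),
    "malformed packet injection": (ETH, SECOND),
    "port scanning": (ETH, SECOND),
}

def attack_types_for(bus, category):
    """List the attack types of one category on one communication mode."""
    return [name for name, key in ATTACK_TYPES.items() if key == (bus, category)]

def generation_method(name):
    """First attack types use prewritten attack code, second ones an attack tool."""
    return "manual attack code" if ATTACK_TYPES[name][1] == FIRST else "attack tool"
```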
  • the data processing device may attack the device to be detected in a variety of ways to obtain the first type of attack sample.
  • FIG. 4 shows a specific flow diagram of obtaining the first type of attack sample provided in an embodiment of the present application.
  • the flow includes:
  • Step 401 The data processing device obtains multiple preset attack types.
  • the preset multiple attack types may exemplarily include all the attack types shown in Table 1 above, so that the first type attack samples can fully cover various known attack types, thereby improving the richness and comprehensiveness of the first type attack samples.
  • step 402 the data processing device determines whether there is an attack type that has not been traversed among the preset multiple attack types. If so, step 403 is executed; if not, step 409 is executed.
  • Step 403 the data processing device executes an attack behavior corresponding to an attack type that has not been traversed on the device to be detected, and obtains traffic data generated by the device to be detected for the attack behavior.
  • the required attack code can be manually written in advance, and the attack code and the corresponding first attack type can be mapped and stored in the data processing device.
  • the data processing device obtains an attack type that has not been traversed, if it is determined that the attack type belongs to the first attack type, it can directly obtain the attack code corresponding to the first attack type from the local, and automatically generate the corresponding attack behavior according to the attack code and then attack the device to be detected.
  • the data processing device can call the attack tool, and use the attack tool to generate the attack behavior corresponding to the second attack type and then attack the device to be detected.
  • the entire attack process can be automatically implemented by the data processing device, which helps to improve the unified management of the entire attack process, and there is no need to wait for manual on-site programming, which helps to save the delay of sample collection.
  • the data processing device can access the on-board diagnostics (OBD) interface of the device to be detected through a data line, and the OBD interface can be exemplarily an interface of a type II on-board diagnostic system, i.e., an OBD-II interface.
  • the data processing device can select an attack type from all attack types in a random manner, in a sequential manner, or in other ways, and then attack the device to be detected according to the attack type, and obtain the flow data generated by the device to be detected for the attack type through the OBD interface.
  • the data processing device can select an attack type from the attack types that have not been traversed to continue attacking the device to be detected, and repeat the above process until all attack types have been traversed.
  • the traffic collection operation of the entire attack process can be automatically implemented by a data processing device.
  • the collection duration can be configured in the data processing device. After the data processing device starts to execute the attack behavior, it can start timing. During the timed period, the data processing device continuously collects the traffic data output by the device to be detected from the OBD interface of the device to be detected, and terminates the collection once the configured collection duration is reached.
  • the collection duration can be several hours, several days, several weeks or even several months, and can be specifically configured by those skilled in the art according to actual needs. For example, when the number of preset attack types is large, the time required to attack the device to be detected is longer, and the collection time can be configured to be longer, such as several weeks. Conversely, when the number of preset attack types is small, the time required to attack the device to be detected is shorter, and the collection time can be configured to be shorter, such as several days.
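  • The timed collection described above can be sketched as a loop bounded by a configured duration. In this sketch, `read_record` is a hypothetical callable standing in for reading one traffic record from the OBD interface, and `clock` is injectable so the loop can be exercised without waiting; neither is defined in the present application.

```python
import time

def collect_traffic(read_record, duration_s, clock=time.monotonic):
    """Collect traffic records until the configured collection duration elapses.

    `read_record` returns the next record, or None when nothing is available.
    A real reader would typically block or sleep instead of spinning.
    """
    deadline = clock() + duration_s          # start timing when the attack begins
    records = []
    while clock() < deadline:
        record = read_record()
        if record is not None:
            records.append(record)
    return records

# Demonstration with a fake clock that advances one second per call.
ticks = iter(range(100))
collected = collect_traffic(lambda: "frame", duration_s=3, clock=lambda: next(ticks))
```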
  • step 404 the data processing device determines whether the traffic data is attack traffic. If so, step 405 is executed; if not, step 406 is executed.
  • step 405 the data processing device marks the traffic data as a first type attack sample, and then executes step 402 .
  • if the traffic data is collected under the first attack type, the data processing device can mark the attack traffic as a real attack sample; conversely, if the traffic data is collected under the second attack type, the data processing device can mark the attack traffic as a simulated attack sample.
  • Step 406 the data processing device determines whether the flow data is normal flow, if so, executes step 407 , if not, executes step 408 .
  • step 407 the data processing device marks the traffic data as a non-attack sample, and then executes step 402 .
  • non-attack samples are also called context data or context samples.
  • step 408 the data processing device determines that the traffic data is an unlabeled sample, and then executes step 402 .
  • the data processing device may not mark the traffic data, but treat it as an unmarked sample.
  • the unmarked sample is an abnormal sample.
  • if the attack test performed by the data processing device causes a hardware or software failure of the device to be detected, the device to be detected itself may generate some abnormal data. These abnormal data match the characteristics of neither normal traffic nor attack traffic, but will still be collected by the data processing device.
  • Step 409 the data processing device ends the attack process.
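  • Steps 401 to 409 above can be summarized as a traversal loop that labels each piece of collected traffic. The sketch below is one possible reading of the flow; `execute_attack` and `classify` are hypothetical callables standing in for the attack execution and the traffic judgment, whose internals this section does not specify.

```python
def run_attack_flow(attack_types, first_attack_types, execute_attack, classify):
    """Traverse the preset attack types and label the resulting traffic data.

    `execute_attack(attack_type)` returns the traffic records produced for one
    attack type; `classify(record)` returns "attack", "normal" or anything else.
    """
    labeled = {"real": [], "simulated": [], "non_attack": [], "unlabeled": []}
    pending = list(attack_types)                        # step 401
    while pending:                                      # step 402
        attack_type = pending.pop(0)
        for record in execute_attack(attack_type):      # step 403
            verdict = classify(record)
            if verdict == "attack":                     # steps 404 and 405
                kind = "real" if attack_type in first_attack_types else "simulated"
                labeled[kind].append(record)
            elif verdict == "normal":                   # steps 406 and 407
                labeled["non_attack"].append(record)
            else:                                       # step 408
                labeled["unlabeled"].append(record)
    return labeled                                      # step 409

# Toy stand-ins for the device under attack and the traffic judgment.
traffic = {"replay": ["atk-1", "ok-1"], "ID fuzz": ["atk-2", "??"]}

def toy_classify(record):
    if record.startswith("atk"):
        return "attack"
    return "normal" if record.startswith("ok") else "unknown"

result = run_attack_flow(["replay", "ID fuzz"], {"replay"}, traffic.get, toy_classify)
```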
  • the data processing device can obtain attack samples, such as real attack samples and simulated attack samples, as well as non-attack samples and even unlabeled samples by attacking the device to be detected.
  • This acquisition method can obtain multiple types of samples, which is convenient for improving the sample richness of the subsequent construction of the intrusion detection sample set.
  • FIG. 4 is only an exemplary introduction to a possible way to obtain the first type of attack sample, and the embodiments of the present application are not limited to only using this method to obtain the first type of attack sample.
  • the data processing device can also combine at least two of the preset multiple attack types, execute the attack behaviors corresponding to at least two attack types on the device to be detected at one time, and then obtain the flow data generated by the device to be detected, separate the flow data corresponding to each of the at least two attack types from the flow data, and obtain the first type of attack sample based on the at least two flow data. It should be understood that there are many possible acquisition methods, which will not be listed one by one here.
  • Step 302 The data processing device obtains a second type of attack sample by applying noise to the first type of attack sample.
  • the data processing device may convert the format of the first type of attack sample to obtain the first type of attack sample in text format or binary format, and store the first type of attack sample in text format or binary format in the original database.
  • the data processing device may traverse all first-type attack samples in the original database and, for each first-type attack sample: apply noise to the first-type attack sample to obtain a perturbation sample, input the perturbation sample into the attack recognition model, and obtain the recognition result output by the attack recognition model. When the recognition result indicates that it is impossible to determine whether the perturbation sample is an attack sample, the perturbation sample is determined to be a second-type attack sample. Otherwise, the perturbation sample is adjusted according to the recognition result and the adjusted perturbation sample is input into the attack recognition model again. This process is repeated until the recognition result for the adjusted perturbation sample indicates that it is impossible to determine whether the adjusted perturbation sample is an attack sample, at which point the adjusted perturbation sample is determined to be a second-type attack sample.
  • the attack recognition model may be any model with recognition capability, and specifically, may be a neural network model trained by an artificial intelligence (AI) algorithm, such as a generative adversarial network (GAN) algorithm.
  • the GAN algorithm may learn the features of some known attack samples and non-attack samples, and construct an attack recognition model based on the learned features, so that the attack recognition model can identify the probability that the input sample belongs to the attack sample.
  • the attack recognition model may include a generator and a discriminator, and any first-type attack sample input to the attack recognition model is first received by the generator.
  • the generator may generate a perturbation sample by applying an N-dimensional noise vector corresponding to the N-dimensional feature space to the first-type attack sample, and then input the perturbation sample into the discriminator. After the discriminator identifies the probability that the perturbation sample belongs to the attack sample, if the probability is greater than 50%, the generator is notified to adjust the N-dimensional noise vector in the first direction, and if the probability is less than 50%, the generator is notified to adjust the N-dimensional noise vector in the second direction.
  • the generator uses the adjusted N-dimensional noise vector to re-scramble the first type of attack sample to generate a new perturbation sample, and sends it to the discriminator, which re-identifies the probability that the new perturbation sample belongs to the attack sample. If the probability is 50%, the new perturbation sample can be used as a second type of attack sample. Otherwise, the above process is repeated until a perturbation sample with a probability of 50% being an attack sample is obtained.
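  • The adjust-and-retry loop between the generator and the discriminator can be sketched as follows. This is only a minimal illustration under stated assumptions, not the trained GAN of the embodiment: the discriminator is a hypothetical logistic function with fixed `weights` and `bias`, the "first direction" and "second direction" are taken to be against and along the weight vector, and the exact 50% condition is relaxed to a tolerance `tol`. All function and parameter names are illustrative.

```python
import math
import random


def discriminator(sample, weights, bias):
    """Toy stand-in for the discriminator: probability that `sample` is an attack."""
    logit = sum(w * x for w, x in zip(weights, sample)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def make_second_type_sample(attack_sample, weights, bias,
                            step=0.5, tol=1e-3, max_iters=1000, seed=0):
    """Perturb `attack_sample` until the discriminator outputs roughly 50%."""
    rng = random.Random(seed)
    # Initial N-dimensional noise vector, one component per feature.
    noise = [rng.uniform(-0.1, 0.1) for _ in attack_sample]
    norm_sq = sum(w * w for w in weights)
    for _ in range(max_iters):
        perturbed = [x + n for x, n in zip(attack_sample, noise)]
        p = discriminator(perturbed, weights, bias)
        if abs(p - 0.5) <= tol:
            # The discriminator can no longer decide: keep this sample.
            return perturbed, p
        # p > 0.5: move the noise against the weights (assumed "first direction").
        # p < 0.5: move the noise along the weights (assumed "second direction").
        direction = -1.0 if p > 0.5 else 1.0
        logit_gap = abs(math.log(p / (1.0 - p)))
        scale = direction * step * logit_gap / norm_sq
        noise = [n + scale * w for n, w in zip(noise, weights)]
    raise RuntimeError("no perturbation reached the 50% boundary")


sample, probability = make_second_type_sample(
    [0.9, 0.2, 0.7], weights=[2.0, -1.0, 1.5], bias=-0.5)
print(round(probability, 2))
```

  • With `step=0.5` each iteration halves the distance of the logit from the decision boundary, so the loop stops after a handful of rounds for moderate inputs. A real discriminator would be a trained network and the update rule would come from training rather than from this closed-form step.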
  • the attack identification model can generate samples whose sample types cannot be identified.
  • since the sample is obtained by perturbing an attack sample (i.e., a real attack sample or a simulated attack sample) obtained by a real attack on the device to be detected, it is highly likely to be an attack sample. Therefore, the sample can be regarded as an attack sample whose attack type cannot be determined by current technical means, and such an attack sample is easily misreported in a real identification process.
  • the data processing device can treat the sample as a second type of attack sample and subsequently store it in the intrusion detection sample set, so as to avoid the phenomenon of misreporting these attack samples of unknown attack types as non-attack samples in the real attack test scenario.
  • the second type of attack sample is generated by the confrontation between the generator and the discriminator in the AI algorithm
  • the second type of sample can also be called an AI adversarial sample, or can also have other names, which is not specifically limited in the embodiments of the present application.
  • Step 303 The data processing device constructs an intrusion detection sample set according to the first type attack samples and the second type attack samples.
  • after the data processing device obtains the first type of attack samples (including real attack samples and simulated attack samples) and non-attack samples according to step 301 above, and obtains the second type of attack samples according to step 302 above, these samples can be uniformly saved in .pcap format, and the intrusion detection sample set is then constructed based on all samples in .pcap format.
  • the .pcap format is a format that is connected to the existing IDS system. If it is connected to other systems, the data processing device can also save these samples in the format that other systems are connected to. The embodiment of the present application does not specifically limit this.
  • as for the unlabeled samples, the data processing device can directly discard them to reduce the amount of data in the intrusion detection sample set. Alternatively, after traversing all attack types, it can construct the intrusion detection sample set based on attack samples, unlabeled samples and non-attack samples, so that all samples that actually occur when attacking the device to be detected are added to the intrusion detection sample set. This improves the richness of samples in the intrusion detection sample set and makes it easier to label the unlabeled samples later through other analyses or to perform other operations on them. In some special cases, the unlabeled samples can also be marked as abnormal samples and added to the intrusion detection sample set, without specific limitation.
  • when attack samples and non-attack samples account for different proportions of all samples, the samples with the larger proportion can be trimmed so that the trimmed attack samples and non-attack samples account for the same proportion, for example 50% each.
  • samples with a larger proportion are usually non-attack samples, and may also be attack samples in some special cases.
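  • The trimming step can be sketched as follows. The text does not say how the samples to be removed are chosen, so random selection with a fixed seed is an assumption here, and the function name `balance_samples` is illustrative.

```python
import random


def balance_samples(attack, non_attack, seed=0):
    """Trim the larger group so that both groups end up the same size."""
    rng = random.Random(seed)
    target = min(len(attack), len(non_attack))
    # Assumption: the samples that are kept are chosen uniformly at random.
    trimmed_attack = rng.sample(attack, target)
    trimmed_non_attack = rng.sample(non_attack, target)
    return trimmed_attack, trimmed_non_attack


attack = [f"attack_{i}" for i in range(40)]
non_attack = [f"normal_{i}" for i in range(55)]
kept_attack, kept_non_attack = balance_samples(attack, non_attack)
print(len(kept_attack), len(kept_non_attack))  # 40 40
```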
  • the intrusion detection sample set can cover not only the first type of attack samples in known intrusion scenarios, but also the second type of attack samples in unknown intrusion scenarios, and non-attack samples.
  • the anti-attack effect of the device to be detected can be more accurately defined according to whether the device to be detected can intercept the attack samples and whether it can not intercept the non-attack samples.
  • the above content introduces the specific construction process of the intrusion detection sample set.
  • the following is a detailed introduction to the application of the constructed intrusion detection sample set.
  • FIG5 exemplarily shows a schematic diagram of an application scenario of an intrusion detection sample set provided by an embodiment of the present application.
  • the intrusion detection sample set can be applied to one or more scenarios in a model training scenario, an offline evaluation scenario, or an online evaluation scenario.
  • the solid line in the figure shows the application process of the model training scenario
  • the dotted line in the figure shows the application process of the offline evaluation scenario
  • the double-node line in the figure shows the application process of the online evaluation scenario.
  • model training refers to the use of an intrusion detection sample set to train an intrusion detection model in the early algorithm development and verification. Since the training samples required for the intrusion detection model have their own feature format, and the intrusion detection samples in the intrusion detection sample set may not match the feature format, before training the intrusion detection model, the data processing device may also first extract features from the intrusion detection sample set, and construct an offline detection sample set based on the extracted features. In this way, when it is necessary to train the intrusion detection model, the data processing device can directly select some offline detection samples from the offline detection sample set as training set data, input them to the model training device, and the model training device uses the training set data to train the intrusion detection model.
  • the training set data may include offline detection samples of attack types and offline detection samples of normal types
  • the offline detection samples of attack types are obtained by extracting features from the first type of attack samples and/or the second type of attack samples in the intrusion detection sample set
  • the offline detection samples of normal types are obtained by extracting features from non-attack samples in the intrusion detection sample set
  • the offline detection samples of attack types and normal types can also maintain consistency in quantity, so that a better intrusion detection model can be obtained based on balanced data training.
  • each offline detection sample in the offline detection sample set can be saved in a text format, such as .csv.
  • the .csv format is a format that is connected to the model training device in the existing IDS system. If it is connected to other types of model training devices, it can be saved in a format adapted to other systems, without specific limitation.
  • the data processing device when performing feature extraction, can analyze each intrusion detection sample in isolation, that is, extract features from each intrusion detection sample to obtain an offline detection sample, or can combine multiple intrusion detection samples for centralized analysis, such as extracting features from multiple intrusion detection samples with associated relationships to obtain an offline detection sample.
  • multiple intrusion detection samples with associated relationships can be, for example, multiple intrusion detection samples belonging to the same connection, multiple intrusion detection samples whose traffic data comes from the same ECU, or multiple intrusion detection samples whose traffic data is sent to the same ECU, etc.
  • the message types in the vehicle may generally include TCP messages, CAN (FD) messages, and UDP messages, wherein TCP messages are messages transmitted on the connection after a connection is established between at least two ECUs, while CAN (FD) messages and UDP messages are messages sent on the corresponding bus by broadcasting and then obtained by the required ECU node from the bus, and the contents of these three messages include both the source Internet Protocol (IP) address and the destination IP address.
  • TCP messages have the concept of connection
  • CAN (FD) messages and UDP messages do not have the concept of connection.
  • the data processing device can first determine the message type of each intrusion detection sample in the intrusion detection sample set, and then, for each intrusion detection sample whose message type is a TCP message, according to the source IP address and the destination IP address contained in the message content, obtain all intrusion detection samples belonging to the same TCP connection, and extract features from these intrusion detection samples to obtain an offline detection sample.
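  • Grouping TCP samples by connection can be sketched as follows. The dictionary keys `protocol`, `src_ip` and `dst_ip` are hypothetical field names, and treating both directions of a source/destination pair as the same connection is an assumption made for this illustration.

```python
from collections import defaultdict


def group_tcp_connections(samples):
    """Group TCP samples by the unordered (source IP, destination IP) pair."""
    connections = defaultdict(list)
    for sample in samples:
        if sample["protocol"] != "TCP":
            continue  # CAN (FD) and UDP messages have no concept of connection
        # Assumption: A -> B and B -> A traffic belongs to the same connection.
        key = frozenset((sample["src_ip"], sample["dst_ip"]))
        connections[key].append(sample)
    return dict(connections)


samples = [
    {"protocol": "TCP", "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"},
    {"protocol": "TCP", "src_ip": "10.0.0.2", "dst_ip": "10.0.0.1"},
    {"protocol": "UDP", "src_ip": "10.0.0.1", "dst_ip": "10.0.0.3"},
]
groups = group_tcp_connections(samples)
print(len(groups))  # 1
```

  • Each group can then be passed to feature extraction as one unit, yielding one offline detection sample per connection.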
  • the features extracted by the above feature extraction may include one or more of the following features: timestamp, frequency feature, protocol type, content feature, packet loss rate, number of error packets, connection duration, connection initiator, and connection receiver.
  • timestamp refers to the time when the message is collected, including date and time.
  • the frequency feature refers to the communication frequency of sending and receiving messages between the source IP and the destination IP.
  • the protocol type refers to the protocol adapted by the message, such as any of the protocol types in Table 3 above.
  • the content feature refers to the substantive content carried in the message, such as data or instructions.
  • the packet loss rate refers to the proportion of message loss when sending and receiving messages between the source IP and the destination IP.
  • the number of error packets refers to the number of error messages that occur when sending and receiving messages between the source IP and the destination IP.
  • the connection duration refers to the time interval between the establishment and disconnection of a connection.
  • the connection initiator refers to the ECU that requests to establish a connection.
  • the connection receiver refers to the ECU that receives the request sent by the connection initiator.
  • the data processing device can also extract, for each type of intrusion detection sample, those of the above features that are available for that type. Specifically, for intrusion detection samples belonging to TCP messages, since there is a concept of connection, all the above features can be extracted. For intrusion detection samples belonging to CAN (FD) messages or UDP messages, since there is no concept of connection, only the timestamp, frequency feature, protocol type, content feature, packet loss rate and number of error packets can be extracted, while the connection duration, connection initiator and connection receiver cannot. In this case, in order to keep the format of offline detection samples consistent, the data processing device can also set the features that cannot be extracted to preset characters, which can be, for example, numbers, letters, symbols, or a combination of one or more of them.
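  • The placeholder rule for connectionless messages can be sketched as follows. The column names, the placeholder character `-` and the helper names are all hypothetical; the sketch only shows how a fixed set of columns can be kept for every offline detection sample before writing it in a text format such as .csv.

```python
import csv
import io

FEATURES = [
    "timestamp", "frequency", "protocol_type", "content",
    "packet_loss_rate", "error_packets",
    "connection_duration", "connection_initiator", "connection_receiver",
]
CONNECTION_FEATURES = FEATURES[6:]
PLACEHOLDER = "-"  # hypothetical preset character for unavailable features


def to_offline_row(sample):
    """Map one intrusion detection sample to a row with a fixed set of columns."""
    is_tcp = sample.get("protocol_type") == "TCP"
    row = {}
    for name in FEATURES:
        if name in CONNECTION_FEATURES and not is_tcp:
            row[name] = PLACEHOLDER  # no connection concept for CAN (FD) / UDP
        else:
            row[name] = sample.get(name, PLACEHOLDER)
    return row


def write_csv(samples):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FEATURES)
    writer.writeheader()
    for sample in samples:
        writer.writerow(to_offline_row(sample))
    return buffer.getvalue()


udp_sample = {"timestamp": "2023-02-28T10:00:00", "protocol_type": "UDP",
              "frequency": 20, "content": "0102", "packet_loss_rate": 0.0,
              "error_packets": 0}
print(to_offline_row(udp_sample)["connection_duration"])  # -
```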
  • model training scenario by providing a variety of extractable features, users can choose one or more features according to the application layer requirements in the actual model training scenario to convert the intrusion detection sample set to the offline detection sample set, so as to adapt to different model training scenarios and improve the versatility of the intrusion detection sample set in the model training field.
  • offline evaluation refers to the use of an intrusion detection sample set to test the detection effect of an intrusion detection model trained in a model training scenario in early algorithm development and verification. Since an offline detection sample set has been extracted in the model training scenario, after the intrusion detection model is trained using part of the offline detection samples in the offline detection sample set, the data processing device can also select another part of the offline detection samples from the offline detection sample set as the test set data, input it to the model training device, and the model training device uses the test set data to test the intrusion detection model to obtain offline evaluation information, and then sends the offline evaluation information to the data processing device, which evaluates the detection effect of the intrusion detection model based on the offline evaluation information.
  • the test set data may also include offline detection samples of attack type and offline detection samples of normal type, and the offline detection samples of attack type and the offline detection samples of normal type may also maintain consistency in number.
  • the model training device obtains the detection result of the intrusion detection model for each offline detection sample
  • the total number of correctly identified offline detection samples is obtained by combining the number of offline detection samples of attack type identified as attack samples and the number of offline detection samples of normal type identified as non-attack samples
  • the total number of incorrectly identified offline detection samples is obtained by combining the number of offline detection samples of attack type identified as non-attack samples and the number of offline detection samples of normal type identified as attack samples
  • the total number of correctly identified offline detection samples and the total number of incorrectly identified offline detection samples are carried in the offline evaluation information and sent to the data processing device.
  • based on these two totals, the data processing device can determine whether the detection effect of the intrusion detection model is better or worse.
  • the model training device can also directly send the detection result of each offline detection sample as offline evaluation information to the data processing device, and the data processing device itself counts the total number of offline detection samples that are correctly identified and the total number of offline detection samples that are incorrectly identified to complete the offline evaluation.
  • there are many possible implementation methods, which will not be listed here one by one.
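  • The counting behind the offline evaluation information can be sketched as follows. The label strings `attack` and `normal` and the function name are illustrative assumptions. The text only specifies the two totals, so no further metric is computed here.

```python
def offline_evaluation(results):
    """Count correct and incorrect detections.

    `results` is an iterable of (true_type, detected_type) pairs, where each
    element is either "attack" or "normal".
    """
    results = list(results)
    # Correct: attack detected as attack, or normal detected as non-attack.
    correct = sum(1 for truth, detected in results if truth == detected)
    # Incorrect: attack detected as non-attack, or normal detected as attack.
    incorrect = len(results) - correct
    return {"correct": correct, "incorrect": incorrect}


results = [
    ("attack", "attack"), ("attack", "normal"),
    ("normal", "normal"), ("normal", "normal"), ("normal", "attack"),
]
print(offline_evaluation(results))  # {'correct': 3, 'incorrect': 2}
```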
  • the intrusion detection sample set can support the evaluation of the detection effect of the intrusion detection model in an offline state. In this way, it is convenient for the model training device to continuously optimize the intrusion detection model according to the detection effect, obtain an intrusion detection model with better detection effect, and provide a basis for the implementation of the intrusion detection model on the device to be detected.
  • online evaluation refers to the use of an intrusion detection sample set to test the detection effect of the device to be detected that is deployed with the intrusion detection model in the later algorithm deployment.
  • the intrusion detection model can have a good detection effect, but the detection effect is only measured on the basis of being separated from the device to be detected, and cannot represent the actual effect after being actually applied to the device to be detected.
  • therefore, the intrusion detection model also needs to be deployed on the device to be detected, so that, after a real attack is performed on the device to be detected with the intrusion detection model deployed, the actual detection effect of the intrusion detection model in the device to be detected can be determined based on the response of the device to be detected.
  • the data processing device can perform format conversion on the intrusion detection samples in the intrusion detection sample set to obtain online detection samples that match the format of the test tool, and then input the online detection samples into the device to be detected through the test tool, and obtain the online evaluation information generated by the intrusion detection model deployed in the device to be detected for the online detection samples, and evaluate the detection performance of the device to be detected deployed with the intrusion detection model according to the online evaluation information.
  • the format conversion can also be performed on a connection basis.
  • for intrusion detection samples belonging to the same connection, these intrusion detection samples are first aggregated to obtain a preliminary online detection sample, and then the format of the preliminary online detection sample is converted to a format adapted to the test tool in the current scenario to obtain an online detection sample.
  • the format of each intrusion detection sample can also be directly converted to a format adapted to the test tool in the current scenario to obtain a corresponding online detection sample.
  • the test tool can inject different online test samples into the vehicle in real time through the OBD interface of the vehicle. For each online test sample, if the vehicle identifies it as an attack sample, it can alarm the security operations center (SOC) in the cloud through the VCU. Then, after all the online test samples are injected, the SOC combines the number of all online test samples and the alarm record information of the vehicle during this period to determine the total number of online test samples that are correctly identified and the total number of online test samples that are incorrectly identified, and then generates online evaluation information based on these two total numbers and sends it to the data processing device.
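  • The way the SOC could derive the two totals from the alarm records can be sketched as follows. This assumes that each alarm record can be matched to the identifier of the injected online test sample that triggered it, which the text does not state. The identifiers and names used are hypothetical.

```python
def online_evaluation(injected, alarmed_ids):
    """Compare injected samples with the alarm records collected meanwhile.

    `injected` is a list of (sample_id, is_attack) pairs; `alarmed_ids`
    holds the ids of the samples for which the vehicle raised an alarm.
    """
    alarmed_ids = set(alarmed_ids)
    correct = 0
    for sample_id, is_attack in injected:
        raised_alarm = sample_id in alarmed_ids
        # Correct when an attack raised an alarm or a non-attack did not.
        if raised_alarm == is_attack:
            correct += 1
    return {"correct": correct, "incorrect": len(injected) - correct}


injected = [("s1", True), ("s2", True), ("s3", False), ("s4", False)]
print(online_evaluation(injected, alarmed_ids=["s1", "s3"]))
# {'correct': 2, 'incorrect': 2}
```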
  • the data processing device can determine that the detection effect of the device to be tested with the intrusion detection model deployed is better, otherwise, the detection effect is worse.
  • the test tools in the embodiments of the present application may specifically be CANoe, PCAN, Technica or other tools that can implement online testing.
  • the online detection sample after format conversion may be in .PCAP, .ASC or .BLF format, or in another format supported by tools that can implement online testing.
  • the intrusion detection sample set can support the online evaluation of the detection effect of the device to be tested that is deployed with the intrusion detection model. In this way, it is convenient for users to determine the anti-attack performance of the device to be tested based on the detection effect, ensuring that only the devices to be tested with good anti-attack effects are shipped out of the factory.
  • the above content describes how the data processing device constructs and applies the intrusion detection sample set.
  • the data processing device can also evaluate the quality of the constructed intrusion detection sample set. The following exemplarily introduces a specific evaluation process.
  • FIG. 6 is a schematic diagram of a process of evaluating an intrusion detection sample set provided in an embodiment of the present application.
  • the process includes:
  • Step 601 A data processing device obtains an intrusion detection sample set.
  • Step 602 The data processing device calculates the value of the intrusion detection sample set under each preset indicator, and calculates the evaluation value corresponding to the intrusion detection sample set according to the value of the intrusion detection sample set under each preset indicator.
  • each preset indicator can be set according to the characteristics of the system architecture to which the device to be tested belongs, and can exemplarily include quantitative indicators and qualitative indicators.
  • Quantitative indicators refer to evaluation indicators that can be defined by accurate quantities
  • qualitative indicators refer to evaluation indicators that cannot be directly quantified and need to be quantified through other means.
  • Table 2 shows a schematic table of possible preset indicators provided in an embodiment of the present application:
  • each preset indicator may include one or more of a data redundancy indicator, an attack coverage indicator, a protocol coverage indicator, a service coverage indicator, a balance indicator, a feature independence indicator, and an ease of use indicator.
  • the data redundancy indicator, the attack coverage indicator, the protocol coverage indicator, the service coverage indicator, and the balance indicator are quantitative indicators
  • the feature independence indicator and the ease of use indicator are qualitative indicators.
  • the value range of each preset indicator can also be consistent, for example, all are set to [0,1], that is, the value of each preset indicator can be any real number between 0 and 1, including 0 and 1.
  • the data redundancy index is used to indicate the non-redundancy degree of the intrusion detection sample set, and can be expressed as the ratio of the number of non-redundant intrusion detection samples in the intrusion detection sample set to the number of all intrusion detection samples. For example, when there are 100 intrusion detection samples in the intrusion detection sample set, if 5 of the intrusion detection samples correspond to the same ID, and the remaining 95 intrusion detection samples correspond to different IDs, then the value of the intrusion detection sample set under the data redundancy index is 95/100.
  • the data redundancy index can also be expressed in other forms, as long as it can ensure that it is negatively correlated with the ratio of the number of redundant intrusion detection samples to the number of all intrusion detection samples. In this way, when all intrusion detection samples are different, the value of the data redundancy index is the largest, the number of valid samples in the intrusion detection sample set is the largest, and the adequacy of the samples is the best. As the number of identical intrusion detection samples increases, the value of the data redundancy index gradually decreases, the number of valid samples in the intrusion detection sample set gradually decreases, and the adequacy of the samples gradually deteriorates. Until all intrusion detection samples are the same, the value of the data redundancy index is the smallest, the number of valid samples in the intrusion detection sample set is the smallest, and the adequacy of the samples is the worst.
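  • The ratio form of the data redundancy index can be sketched as follows. To reproduce the 95/100 example, every sample that shares its ID with another sample is counted as redundant. That reading of "non-redundant" is an assumption, and the function name is illustrative.

```python
from collections import Counter


def data_redundancy_index(sample_ids):
    """Share of samples whose ID occurs exactly once in the sample set."""
    if not sample_ids:
        raise ValueError("the sample set is empty")
    counts = Counter(sample_ids)
    non_redundant = sum(1 for sample_id in sample_ids if counts[sample_id] == 1)
    return non_redundant / len(sample_ids)


# 5 samples share one ID, the remaining 95 all have distinct IDs.
ids = ["dup"] * 5 + [f"id_{i}" for i in range(95)]
print(data_redundancy_index(ids))  # 0.95
```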
  • the attack coverage index is used to indicate the coverage of the attack types included in the intrusion detection sample set, and can be expressed as the ratio of the number of attack types covered by the intrusion detection sample set to the number of all attack types that may exist in the device to be detected. For example, assuming that all the possible attack types in the device to be detected are the 25 attack types shown in Table 1, and the first type of attack sample in the intrusion detection sample set is obtained by attacking the device to be detected according to 10 of the attack types, then the value of the intrusion detection sample set under the attack coverage index is 10/25.
  • the attack coverage index can also be expressed in other forms, as long as it can ensure a positive correlation with the ratio of the number of covered attack types to the number of all possible attack types.
  • in this way, when all possible attack types are covered, the value of the attack coverage index is the largest, the intrusion detection sample set contains the most sample types, and the sample diversity is the best.
  • as the number of covered attack types decreases, the value of the attack coverage index gradually decreases, the number of sample types in the intrusion detection sample set gradually decreases, and the sample diversity gradually deteriorates.
  • when no attack type is covered, the value of the attack coverage index is the smallest, the intrusion detection sample set contains the fewest sample types, and the sample diversity is the worst.
  • the protocol coverage index is used to indicate the coverage degree of the communication protocols included in the intrusion detection sample set, and can be expressed as the ratio of the number of communication protocols covered by the intrusion detection sample set to the number of all communication protocols that may exist in the device to be detected.
  • Table 3 is a schematic table of communication protocols that may exist in the field of Internet of Vehicles provided in an embodiment of the present application. It should be understood that with the development of Internet of Vehicles technology, new communication protocols may appear in the future, so the communication protocols in Table 3 can also be updated accordingly, and the embodiment of the present application does not specifically limit this.
  • the device to be detected is a vehicle
  • all possible communication protocols in the vehicle are the eight protocols shown in Table 3, and the communication protocols covered by the intrusion detection sample set include CAN (FD), DoCAN, DDS, MQTT and HTTP (S), then the value of the intrusion detection sample set under the protocol coverage index is 5/8.
  • the protocol coverage index can also be expressed in other forms, as long as it is positively correlated with the ratio of the number of covered communication protocols to the number of all possible communication protocols.
  • in this way, when all possible communication protocols are covered, the value of the protocol coverage index is the largest, and the samples in the intrusion detection sample set are obtained by attacking messages of all communication protocols.
  • in that case, the intrusion detection sample set has the most sample sources and the richness of the samples is the best.
  • as the number of covered communication protocols decreases, the value of the protocol coverage index gradually decreases, the sources of samples in the intrusion detection sample set gradually decrease, and the richness of samples gradually deteriorates.
  • when no communication protocol is covered, the value of the protocol coverage index is the smallest, the intrusion detection sample set has the fewest sample sources, and the richness of samples is the worst.
  • the service coverage index is used to indicate the coverage of the services included in the intrusion detection sample set, and can be expressed as the ratio of the number of services included in the intrusion detection sample set to the number of all services that may exist in the device to be detected. For example, assuming that all services that may exist in the device to be detected include remote control, log transmission, over-the-air (OTA) update, diagnostic service, video transmission and network management, and the services included in the intrusion detection sample set are remote control, OTA update and diagnostic service, then the value of the intrusion detection sample set under the service coverage index is 3/6.
  • the service coverage index can also be expressed in other forms, as long as it can ensure a positive correlation with the ratio of the number of covered services to the number of all possible services.
  • in this way, when all possible services are covered, the value of the service coverage index is the largest, and the service applicability of the intrusion detection sample set is the best.
  • as the number of covered services decreases, the value of the service coverage index gradually decreases, and the service applicability of the intrusion detection sample set gradually deteriorates; when no service is covered, the value of the service coverage index is the smallest, and the service applicability of the intrusion detection sample set is the worst.
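  • The attack, protocol and service coverage indexes share the same ratio form, which can be sketched with one helper. The function name `coverage_index` is illustrative, and the integer attack-type identifiers merely stand in for the 25 attack types of Table 1.

```python
def coverage_index(covered, possible):
    """Ratio of covered items to all items that may exist in the device."""
    possible = set(possible)
    if not possible:
        raise ValueError("the set of possible items is empty")
    # Items outside `possible` are ignored so the value stays within [0, 1].
    return len(set(covered) & possible) / len(possible)


all_services = ["remote control", "log transmission", "OTA update",
                "diagnostic service", "video transmission", "network management"]
covered_services = ["remote control", "OTA update", "diagnostic service"]
print(coverage_index(covered_services, all_services))  # 0.5

# Attack coverage: 10 of 25 attack types, using placeholder integer ids.
print(coverage_index(range(10), range(25)))  # 0.4
```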
  • the data labeling index is used to indicate the labeling degree of the intrusion detection samples in the intrusion detection sample set, and can be expressed exemplarily as the ratio of the number of labeled intrusion detection samples in the intrusion detection sample set to the number of all intrusion detection samples.
  • the labeled intrusion detection samples may include the real attack samples, simulated attack samples and non-attack samples obtained in the above step 301 and the second type of attack samples obtained in the above step 302, and the unlabeled intrusion detection samples include the unlabeled samples obtained in the above step 301.
  • for example, when there are 100 intrusion detection samples in the intrusion detection sample set, if 20 of them are first type attack samples, 20 are second type attack samples, 55 are non-attack samples, and 5 are unlabeled samples, then the number of labeled intrusion detection samples in the intrusion detection sample set is 95, so the value of the intrusion detection sample set under the data labeling index is 95/100.
  • the data labeling index can also be expressed in other forms, as long as it is positively correlated with the ratio of the number of labeled intrusion detection samples to the total number of intrusion detection samples. In this way, when all intrusion detection samples in the intrusion detection sample set are labeled, all intrusion detection samples have been clearly divided into non-attack samples and attack samples, the set contains no uncertain samples, and the samples in the intrusion detection sample set have the highest clarity.
  • as the number of labeled intrusion detection samples decreases, the number of uncertain samples contained in the intrusion detection sample set gradually increases, and the clarity of the samples gradually deteriorates; when all intrusion detection samples are unlabeled, the intrusion detection sample set contains the most uncertain samples, and the clarity of the samples is the worst.
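As a minimal sketch of the data labeling index, the snippet below reproduces the 100-sample example. Representing an unlabeled sample as `None` and the label strings themselves are illustrative assumptions.

```python
def labeling_index(labels):
    """Share of samples that carry a label; None marks an unlabeled sample."""
    if not labels:
        raise ValueError("sample set must not be empty")
    labeled = sum(1 for label in labels if label is not None)
    return labeled / len(labels)


# 20 first type attack samples, 20 second type attack samples,
# 55 non-attack samples and 5 unlabeled samples, as in the example above.
labels = (["attack_type_1"] * 20 + ["attack_type_2"] * 20
          + ["non_attack"] * 55 + [None] * 5)
print(labeling_index(labels))  # 0.95, i.e. 95/100
```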
  • the balance index is used to indicate the degree of balance between attack samples and non-attack samples in the intrusion detection sample set, and can be expressed as the ratio of (the total number of intrusion detection samples minus the difference between the number of attack samples and the number of non-attack samples) to the total number of intrusion detection samples. For example, when there are 100 intrusion detection samples in the intrusion detection sample set, if 45 of them are attack samples and 55 are non-attack samples, the difference between the number of attack samples and the number of non-attack samples is 10, so the value of the balance index of the intrusion detection sample set is (100-10)/100.
  • the balance index can also be expressed in other forms, as long as it can ensure a negative correlation with the difference in the number of attack samples and non-attack samples.
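  • in this way, when the number of attack samples equals the number of non-attack samples, the value of the balance index is the largest, the sample balance in the intrusion detection sample set is the best, and it is easier to take out equal amounts of attack samples and non-attack samples from the intrusion detection sample set for testing later.
  • as the difference between the number of attack samples and the number of non-attack samples increases, the value of the balance index gradually decreases, the sample balance in the intrusion detection sample set gradually deteriorates, and it becomes more difficult to take out equal amounts of attack samples and non-attack samples from the intrusion detection sample set for testing.
  • when the intrusion detection sample set contains only attack samples or only non-attack samples, the value of the balance index is the smallest, the sample balance in the intrusion detection sample set is the worst, and it is impossible to take out equal amounts of attack samples and non-attack samples from the intrusion detection sample set for testing.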
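The balance index can be written as (N - |A - B|) / N, where A and B are the numbers of attack and non-attack samples and N = A + B. The following Python sketch implements that reading; the function name is an assumption.

```python
def balance_index(num_attack, num_non_attack):
    """(total - |attack - non_attack|) / total, in the range 0.0 to 1.0."""
    total = num_attack + num_non_attack
    if total == 0:
        raise ValueError("sample set must not be empty")
    return (total - abs(num_attack - num_non_attack)) / total


print(balance_index(45, 55))   # 0.9, i.e. (100-10)/100
print(balance_index(50, 50))   # 1.0, perfectly balanced
print(balance_index(100, 0))   # 0.0, only one class present
```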
  • the feature independence index is used to indicate the independence of the features extracted when using the intrusion detection sample set for offline evaluation. The more independent features there are, the larger the value of the feature independence index is, and the fewer independent features there are, the smaller the value of the feature independence index is.
  • whether a feature is independent can be judged by technical personnel in this field based on experience, for example, it can be judged by at least two of engineers, experts or third-party organizations, so as to obtain a more accurate evaluation result by combining the experience of all parties.
  • the usability index is used to indicate the versatility of application scenarios that the intrusion detection sample set is compatible with.
  • the application scenario can be, for example, the format of the test tools that can be supported when the intrusion detection sample set is used for online evaluation.
  • the application scenarios that the intrusion detection sample set is compatible with can also be judged by technical personnel in this field based on experience, for example, by at least two of engineers, experts or third-party organizations, so as to obtain a more accurate evaluation result by combining the experience of all parties.
  • after calculating the value of the intrusion detection sample set under each preset indicator, the data processing device can also perform weighted averaging on these values according to the weight corresponding to each preset indicator, and use the calculated weighted average as the evaluation value of the intrusion detection sample set.
  • the weights corresponding to each preset indicator can be the same or different.
  • for example, each quantitative preset indicator can be configured to correspond to a first weight and each qualitative preset indicator to a second weight, where the first weight is greater than the second weight.
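The weighted averaging can be illustrated as follows. The concrete weights (2.0 for quantitative indicators, 1.0 for qualitative ones) and the qualitative scores are hypothetical values chosen only to satisfy the condition that the first weight is greater than the second; the application does not specify them.

```python
def evaluation_value(values, weights):
    """Weighted average of per-indicator values; both dicts are keyed by indicator name."""
    total_weight = sum(weights[name] for name in values)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(values[name] * weights[name] for name in values) / total_weight


FIRST_WEIGHT = 2.0   # hypothetical weight for quantitative indicators
SECOND_WEIGHT = 1.0  # hypothetical weight for qualitative indicators

values = {
    "service_coverage": 0.5,      # quantitative, from the 3/6 example
    "data_labeling": 0.95,        # quantitative, from the 95/100 example
    "balance": 0.9,               # quantitative, from the (100-10)/100 example
    "feature_independence": 0.8,  # qualitative, hypothetical expert score
    "usability": 0.6,             # qualitative, hypothetical expert score
}
weights = {
    "service_coverage": FIRST_WEIGHT,
    "data_labeling": FIRST_WEIGHT,
    "balance": FIRST_WEIGHT,
    "feature_independence": SECOND_WEIGHT,
    "usability": SECOND_WEIGHT,
}
print(round(evaluation_value(values, weights), 4))
```

With equal weights the function reduces to a plain arithmetic mean, which matches the statement that the weights may be the same or different.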
  • Step 603: the data processing device determines whether the evaluation value corresponding to the intrusion detection sample set is lower than a preset threshold. If so, step 604 is executed; if not, step 605 is executed.
  • Step 604: the data processing device adjusts the intrusion detection sample set, and then executes step 602 again.
  • the data processing device can adjust the intrusion detection sample set according to the value of the intrusion detection sample set under each preset indicator, so that the value of the adjusted intrusion detection sample set under one or more preset indicators becomes larger, thereby increasing the evaluation value corresponding to the intrusion detection sample set. For example, when the value of the intrusion detection sample set under the balance index is low, the data processing device can trim the more numerous class of samples so that the number of attack samples and the number of non-attack samples are close to each other, thereby increasing the value of the intrusion detection sample set under the balance index.
  • for another example, when the value of the intrusion detection sample set under the data redundancy index is low, the data processing device can delete redundant intrusion detection samples to reduce the ratio of the number of redundant intrusion detection samples to the number of all intrusion detection samples, so as to increase the value of the intrusion detection sample set under the data redundancy index.
  • Step 605: the data processing device stores the intrusion detection sample set.
  • the data processing device can store the intrusion detection sample set in a database in a text format.
  • the text format can be .pcap or other formats supported by IDS.
  • the intrusion detection sample set can be continuously optimized according to the evaluation results, so that the optimized intrusion detection sample set is more suitable for the system architecture to which the device to be detected belongs. Moreover, by combining quantitative indicators and qualitative indicators to comprehensively judge the construction quality of the intrusion detection sample set, the evaluation results can also be made more comprehensive and more convincing.
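The evaluate-and-adjust loop of steps 602 to 605 can be sketched as below. To keep the example small, the evaluation value is reduced to the balance index alone and the adjustment is the trimming of the more numerous class; the sample representation `(payload, is_attack)`, the step size and the round limit are all assumptions.

```python
import random


def balance_index(samples):
    """Balance index of a list of (payload, is_attack) pairs."""
    if not samples:
        raise ValueError("sample set must not be empty")
    attacks = sum(1 for _, is_attack in samples if is_attack)
    normals = len(samples) - attacks
    return (len(samples) - abs(attacks - normals)) / len(samples)


def trim_majority(samples, step, rng):
    """Step 604: randomly drop up to `step` samples from the more numerous class."""
    attacks = [s for s in samples if s[1]]
    normals = [s for s in samples if not s[1]]
    larger, smaller = (attacks, normals) if len(attacks) > len(normals) else (normals, attacks)
    surplus = len(larger) - len(smaller)
    keep = len(larger) - min(step, surplus)
    return smaller + rng.sample(larger, keep)


def optimise_sample_set(samples, threshold, step=5, max_rounds=100, seed=0):
    """Repeat evaluation (602/603) and adjustment (604) until the threshold is met."""
    rng = random.Random(seed)
    samples = list(samples)
    for _ in range(max_rounds):
        if balance_index(samples) >= threshold:   # step 603: not lower than threshold
            return samples                        # step 605: caller stores this set
        samples = trim_majority(samples, step, rng)
    raise RuntimeError("threshold not reached within max_rounds")


initial = [(f"atk-{i}", True) for i in range(30)] + [(f"ok-{i}", False) for i in range(70)]
final = optimise_sample_set(initial, threshold=0.9)
print(len(final), balance_index(final))
```

Trimming is only one possible adjustment; adding samples to the smaller class would raise the same index without discarding data.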
  • evaluating the quality of the intrusion detection sample set by preset indicators is only an optional evaluation method. In actual operation, there may be other evaluation methods, such as direct evaluation through human experience, or indirect evaluation through a third-party organization, or evaluation by comparing historical intrusion detection sample sets, etc.
  • the embodiments of the present application do not make specific limitations on this.
  • FIG. 7 exemplarily illustrates a design architecture diagram for developing the above data processing solution provided by an embodiment of the present application.
  • the design architecture diagram can, for example, be an interface presented to development testers or operation and maintenance personnel, who write the corresponding program code according to the functions that need to be implemented in the interface.
  • the design architecture includes a hardware tool layer, a data generation layer, a call interface layer, and an application scenario layer. The content of each layer is described in detail below.
  • the hardware tool layer is responsible for providing hardware interface tools, which are used to access the device to be detected and provide support for collecting the traffic data of the device to be detected.
  • the hardware tool layer can mainly include a signal-oriented CAN (FD) tool and a service-oriented ETH tool. These two tools can be connected to the OBD interface of the vehicle to support the data acquisition module in collecting the traffic data generated by the vehicle for the attack behavior.
  • the data generation layer is responsible for constructing and viewing the intrusion detection sample set, which mainly includes a data acquisition module, an AI generated sample module, a feature extraction module, a format conversion module, a sample set evaluation module and a data viewing module, and may also include a database.
  • the data acquisition module can collect the traffic data generated by the device to be detected for the attack behavior through the hardware tool layer, and by analyzing the traffic data, mark the first type of attack samples and non-attack samples, and store the first type of attack samples and non-attack samples in the database.
  • the AI generated sample module can add noise to the first type of attack samples marked by the data acquisition module, and after obtaining the second type of attack samples that are easily misreported through the AI adversarial algorithm, store the second type of attack samples in the database.
  • the first type of attack samples, the second type of attack samples and non-attack samples in the database constitute the intrusion detection sample set.
  • the feature extraction module can extract features from the intrusion detection sample set in the database to form an offline detection sample set
  • the format conversion module can convert the format of the intrusion detection sample set in the database to form an online detection sample set.
  • the sample set evaluation module can calculate the evaluation value of the intrusion detection sample set under each preset indicator, and when the evaluation value is lower than the preset threshold, adjust the intrusion detection sample set until the evaluation value corresponding to the intrusion detection sample set is adjusted to not lower than the preset threshold.
  • the data viewing module can display some or all intrusion detection samples to the development and testing personnel or operation and maintenance personnel according to their commands.
  • the calling interface layer is responsible for providing the application programming interface (API). Specifically, it can call the corresponding data from the data generation layer through the API and provide it to the upper-layer application according to the usage requirements of the upper-layer application.
  • the calling interface layer can provide part of the offline detection samples in the offline detection sample set in the data generation layer to the upper-layer application through the API to train the intrusion detection model, or provide another part of the offline detection samples in the offline detection sample set in the data generation layer to the upper-layer application through the API to realize the offline evaluation of the intrusion detection model, and can also provide the online detection samples in the data generation layer to the upper-layer application through the API to realize the online evaluation of the device to be detected that is deployed with the intrusion detection model.
  • the application scenario layer is responsible for interacting with external devices to apply the intrusion detection sample set to various possible application scenarios.
  • for example, the application scenario layer can: use the offline detection samples provided by the calling interface layer to train the intrusion detection model in IDS development; use the offline detection samples provided by the calling interface layer to evaluate the detection effect of the intrusion detection model in IDS testing; use the online detection samples provided by the calling interface layer to evaluate, in vehicle-cloud operation and maintenance, the detection effect of the device to be detected on which the intrusion detection model is deployed; and provide the intrusion detection sample set obtained through the calling interface layer to a third-party testing and certification agency, so that the device to be detected can be shipped after obtaining the certification of that agency.
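How the calling interface layer might serve the three kinds of data to upper-layer applications can be sketched as follows. The class names `SampleStore` and `CallInterface`, the method names and the 80/20 split between training and offline evaluation are hypothetical; the application only states that one part of the offline detection samples is used for training and another part for offline evaluation.

```python
import random
from dataclasses import dataclass


@dataclass
class SampleStore:
    """Stand-in for the database of the data generation layer."""
    offline_samples: list
    online_samples: list


class CallInterface:
    """Hypothetical API layer between the data generation layer and applications."""

    def __init__(self, store, train_fraction=0.8, seed=0):
        shuffled = list(store.offline_samples)
        random.Random(seed).shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        self._train = shuffled[:cut]          # part used to train the model
        self._offline_eval = shuffled[cut:]   # other part used for offline evaluation
        self._online = list(store.online_samples)

    def training_samples(self):
        return list(self._train)

    def offline_evaluation_samples(self):
        return list(self._offline_eval)

    def online_evaluation_samples(self):
        return list(self._online)


store = SampleStore(offline_samples=list(range(10)), online_samples=["frame-a", "frame-b"])
api = CallInterface(store)
print(len(api.training_samples()), len(api.offline_evaluation_samples()))  # 8 2
```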
  • the sample types in the intrusion detection sample set can be made richer and more comprehensive.
  • this construction method can also construct adaptive and rich intrusion detection samples for each type of vehicle model, effectively improving the accuracy of using intrusion detection samples to evaluate vehicle network security.
  • the data processing method provided by this application can also be extended to any information system that has a demand for network security.
  • for example, it can be applied in the field of smart home: by attacking smart home products and applying noise to the resulting samples, rich attack samples can be obtained and the attacker portrait in the smart home scenario can be studied.
  • for another example, it can be applied in the field of industrial control: by introducing attack samples into the digital twin model to study intrusion defense, the robustness of the industrial control system can be enhanced.
  • a vehicle or a component to be tested on a vehicle can be virtually constructed through a cloud server, and then the above data processing operation can be performed on the virtual vehicle or vehicle component to obtain an intrusion detection sample set.
  • the anti-attack capability of the vehicle can be known in advance before the actual assembly of the vehicle, so that the vehicle can be actually built only when it is determined that the vehicle can better defend against attacks, effectively saving manpower and material costs.
  • the above mainly introduces the solution provided by the present application from the perspective of the interaction between various network elements.
  • the above-mentioned network elements include hardware structures and/or software modules corresponding to the execution of various functions.
  • the present invention can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed in the form of hardware or computer software driving hardware depends on the specific application and design constraints of the technical solution. Professional and technical personnel can use different methods to implement the described functions for each specific application, but such implementation should not be considered to exceed the scope of the present invention.
  • FIG. 8 is a schematic diagram of the structure of a data processing device provided in an embodiment of the present application. The data processing device can be any device with processing capabilities, such as a server; it can also be a chip or circuit, such as a chip or circuit that can be set in a server; or it can be a server cluster composed of multiple servers.
  • the data processing device 800 may include a processor 801, a memory 802, and a transceiver 803, and may further include a bus system, and the processor 801, the memory 802, and the transceiver 803 may be connected through the bus system.
  • each step of the above method can be completed by an integrated logic circuit of hardware in the processor 801 or an instruction in the form of software.
  • the steps of the method disclosed in the embodiment of the present application can be directly embodied as a hardware processor, or a combination of hardware and software modules in the processor 801.
  • the software module can be located in a mature storage medium in the field such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or an electrically erasable programmable memory, a register, etc.
  • the storage medium is located in the memory 802, and the processor 801 reads the information in the memory 802 and completes the steps of the above method in combination with its hardware.
  • the processor 801 can be a chip.
  • the processor 801 can be a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a system on chip (SoC), a central processor unit (CPU), a network processor (NP), a digital signal processor (DSP), a microcontroller unit (MCU), a programmable logic device (PLD) or other integrated chips.
  • the memory 802 in the embodiment of the present application can be a volatile memory or a non-volatile memory, or can include both volatile and non-volatile memories.
  • the non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory can be a random access memory (RAM), which is used as an external cache.
  • many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM) and direct rambus RAM (DR RAM).
  • the memory 802 is used to store instructions
  • the processor 801 is used to execute the instructions stored in the memory 802 to implement the method corresponding to the data processing device in any one or more of the above Figures 3, 4, or 6.
  • the processor 801 attacks the device to be detected by calling the transceiver 803 to obtain a first type of attack sample, obtains a second type of attack sample by applying noise to the first type of attack sample, and then constructs an intrusion detection sample set based on the first type of attack sample and the second type of attack sample.
  • the first type of attack samples may include real attack samples and simulated attack samples.
  • the real attack samples are obtained by manually attacking the device to be detected, and the simulated attack samples are obtained by attacking the device to be detected with an attack tool.
  • the real attack samples may correspond to one or more of the following attack types: identifier (ID) non-existence attack, replay attack, tampering attack, data length error attack, signal out of defined range attack, context error attack, attack in which the ID source is not the specified electronic control unit (ECU), identical ID attack, controller area network (CAN) scan attack, attack executing sensitive operations through unified diagnostic services (UDS), message authentication error attack, ECU identity spoofing attack, man-in-the-middle attack, ECU authentication error attack, brute force attack, application layer protocol error attack, and unknown stack connection attack.
  • the simulated attack samples may correspond to one or more of the following attack types: ID fuzzing attack, data fuzzing attack, DoS attack on CAN, DoS attack on ETH, malformed packet injection attack, and port scanning attack.
  • the processor 801 is specifically used to: traverse each attack type of a preset plurality of attack types, and when traversing each attack type: call the transceiver 803 to execute the attack behavior corresponding to the attack type on the device to be detected, and obtain the traffic data generated by the device to be detected for the attack behavior, if the traffic data is attack traffic, mark the traffic data as a first type attack sample.
  • if the processor 801 determines that the traffic data is normal traffic, it marks the traffic data as a non-attack sample, and then constructs an intrusion detection sample set based on the first type of attack samples, the second type of attack samples and the non-attack samples.
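The traversal and labeling described above can be sketched as follows. The callables `run_attack` (executes an attack on the device and returns the captured traffic) and `is_attack_traffic` (decides whether a piece of traffic is attack traffic) are hypothetical placeholders; in practice they would be backed by the transceiver and by traffic analysis, which the sketch does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass
class LabeledSample:
    attack_type: str
    traffic: bytes
    is_attack: bool


def collect_samples(attack_types, run_attack, is_attack_traffic):
    """Traverse attack types, capture traffic and label it as attack or non-attack."""
    first_type, non_attack = [], []
    for attack_type in attack_types:
        for traffic in run_attack(attack_type):
            if is_attack_traffic(traffic):
                first_type.append(LabeledSample(attack_type, traffic, True))
            else:
                non_attack.append(LabeledSample(attack_type, traffic, False))
    return first_type, non_attack


# Stubs standing in for the real device and analysis logic (illustration only).
captured = {"replay attack": [b"ATK-1", b"OK-1"], "DoS attack on CAN": [b"ATK-2"]}
first_type, non_attack = collect_samples(
    attack_types=list(captured),
    run_attack=lambda attack_type: captured[attack_type],
    is_attack_traffic=lambda traffic: traffic.startswith(b"ATK"),
)
print(len(first_type), len(non_attack))  # 2 1
```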
  • the processor 801 is specifically configured to: apply noise to the first type of attack sample to obtain a disturbance sample, input the disturbance sample into the attack recognition model, obtain a recognition result output by the attack recognition model, and then adjust the disturbance sample according to the recognition result, until the recognition result corresponding to the adjusted disturbance sample indicates that it is impossible to determine whether the adjusted disturbance sample is an attack sample, and then determine the adjusted disturbance sample as a second type of attack sample.
  • the recognition result is used to indicate whether the disturbance sample is an attack sample.
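The perturb-and-query loop can be illustrated with a toy example. Everything specific here is an assumption made for the sketch: the stand-in model `toy_attack_model`, the interpretation of "impossible to determine" as a score within `margin` of 0.5, the Gaussian noise, and the rule of keeping an adjustment only if it moves the score toward the decision boundary. The application does not name a particular adversarial algorithm.

```python
import math
import random


def toy_attack_model(sample):
    """Stand-in recognition model: returns P(attack) from the mean feature value."""
    mean = sum(sample) / len(sample)
    return 1.0 / (1.0 + math.exp(-4.0 * (mean - 0.5)))


def make_second_type_sample(sample, model, margin=0.05, noise_scale=0.05,
                            max_steps=10_000, seed=0):
    """Add noise, then adjust until the model can no longer decide."""
    rng = random.Random(seed)
    # Initial disturbance sample: the first type attack sample plus noise.
    current = [x + rng.gauss(0.0, noise_scale) for x in sample]
    for _ in range(max_steps):
        score = model(current)
        if abs(score - 0.5) <= margin:
            return current  # recognition result is indeterminate
        candidate = [x + rng.gauss(0.0, noise_scale) for x in current]
        # Keep the adjustment only if it moves closer to the decision boundary.
        if abs(model(candidate) - 0.5) < abs(score - 0.5):
            current = candidate
    return None  # no indeterminate sample found within max_steps


first_type_sample = [0.9] * 8
second_type_sample = make_second_type_sample(first_type_sample, toy_attack_model)
print(toy_attack_model(first_type_sample) > 0.8)
print(second_type_sample is not None)
```

A gradient-based method would usually need far fewer model queries than this random search, but it requires access to the model's gradients.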
  • the processor 801 may also perform feature extraction on the intrusion detection sample set to obtain an offline detection sample set, which is used to evaluate the detection performance of the intrusion detection model.
  • the processor 801 is specifically configured to: determine the message type of each intrusion detection sample in the intrusion detection sample set; for intrusion detection samples whose message type is a TCP message, perform feature extraction on all intrusion detection samples belonging to the same TCP connection to obtain one offline detection sample; and for each intrusion detection sample whose message type is a CAN (FD) message or a UDP message, perform feature extraction on that single intrusion detection sample to obtain one offline detection sample.
  • the extracted features include one or more of the following features: timestamp, frequency feature, protocol type, content feature, packet loss rate, number of error packets, connection duration, connection initiator, and connection receiver.
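The grouping rule (one offline detection sample per TCP connection, one per CAN (FD) or UDP message) can be sketched as below. The message dictionary layout, the `conn` key holding an `(initiator, receiver)` pair, and the small subset of features computed are assumptions for illustration.

```python
from collections import defaultdict


def extract_offline_samples(messages):
    """Build offline detection samples from raw messages.

    Each message is a dict with 'type' and 'timestamp'; TCP messages also
    carry 'conn', an (initiator, receiver) pair identifying the connection.
    """
    offline = []
    tcp_groups = defaultdict(list)
    for msg in messages:
        if msg["type"] == "TCP":
            tcp_groups[msg["conn"]].append(msg)
        elif msg["type"] in ("CAN", "CAN FD", "UDP"):
            # One offline detection sample per CAN (FD) or UDP message.
            offline.append({"protocol": msg["type"], "timestamp": msg["timestamp"]})
        else:
            raise ValueError(f"unsupported message type: {msg['type']}")
    for (initiator, receiver), group in tcp_groups.items():
        times = [m["timestamp"] for m in group]
        # One offline detection sample per TCP connection.
        offline.append({
            "protocol": "TCP",
            "timestamp": min(times),
            "connection_duration": max(times) - min(times),
            "initiator": initiator,
            "receiver": receiver,
            "packet_count": len(group),
        })
    return offline


messages = [
    {"type": "TCP", "timestamp": 10, "conn": ("10.0.0.1", "10.0.0.2")},
    {"type": "UDP", "timestamp": 11},
    {"type": "TCP", "timestamp": 12, "conn": ("10.0.0.1", "10.0.0.2")},
    {"type": "CAN", "timestamp": 13},
    {"type": "TCP", "timestamp": 15, "conn": ("10.0.0.1", "10.0.0.2")},
]
samples = extract_offline_samples(messages)
print(len(samples))  # 3: one UDP, one CAN, one TCP connection
```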
  • after the processor 801 constructs an intrusion detection sample set based on the first type of attack samples and the second type of attack samples, it can also convert the format of the intrusion detection sample set to obtain an online detection sample set that matches the format of the test tool, and then input the online detection sample set into the device to be detected through the test tool.
  • the online detection sample is used to evaluate the detection performance of the device to be detected that is deployed with an intrusion detection model.
  • the processor 801 can also determine an evaluation value corresponding to the intrusion detection sample set based on the values of the intrusion detection sample set under various preset indicators, and adjust the intrusion detection sample set when the evaluation value is lower than a preset threshold.
  • the preset indicators may include one or more of the following indicators: data redundancy indicator, attack coverage indicator, protocol coverage indicator, service coverage indicator, data labeling indicator, balance indicator, feature independence indicator, and ease of use indicator.
  • the first type of attack sample may be obtained by attacking any of the following areas: the entire device to be detected; one or more physical areas of the device to be detected; or one or more functional areas of the device to be detected.
  • FIG. 9 is a schematic diagram of another data processing device provided in an embodiment of the present application.
  • the data processing device 900 may, for example, be a data processing device as described in any of the above embodiments, or may be a chip or circuit, such as a chip or circuit that can be arranged in a data processing device.
  • the data processing device 900 may implement the steps performed by the data processing device in any one or more of the corresponding methods shown in FIG. 3, FIG. 4 or FIG. 6 above.
  • the data processing device 900 may include an attack unit 901, a perturbation unit 902, and a construction unit 903, and may also include one or more of a feature extraction unit 904, a format conversion unit 905, and an adjustment unit 906.
  • the attack unit 901 is used to obtain a first type of attack sample by attacking the device to be detected;
  • the perturbation unit 902 is used to obtain a second type of attack sample by applying noise to the first type of attack sample;
  • the construction unit 903 is used to construct an intrusion detection sample set according to the first type of attack sample and the second type of attack sample.
  • the feature extraction unit 904 is used to: extract features from the intrusion detection sample set to obtain an offline detection sample set, and the offline detection sample set is used to evaluate the detection performance of the intrusion detection model.
  • the format conversion unit 905 is used to: convert the format of the intrusion detection sample set to obtain an online detection sample set that matches the format of the test tool, and input the online detection sample set into the device to be detected through the test tool, and the online detection sample set is used to evaluate the detection performance of the device to be detected with the intrusion detection model deployed.
  • the adjustment unit 906 is used to determine an evaluation value corresponding to the intrusion detection sample set according to the value of the intrusion detection sample set under various preset indicators, and adjust the intrusion detection sample set when the evaluation value is lower than a preset threshold.
  • the attack unit 901 injects traffic into the device to be detected to attack the device to be detected.
  • when injecting traffic, the attack unit 901 may be a sending unit, a transmitter, an output interface, a pin or a circuit.
  • the storage unit is used to store computer instructions
  • the attack unit 901, the perturbation unit 902, the construction unit 903, the feature extraction unit 904, the format conversion unit 905 and the adjustment unit 906 are respectively connected to the storage unit for communication, and respectively execute the computer instructions stored in the storage unit, so that the data processing device 900 can be used to execute the method executed by the data processing device in any of the above embodiments.
  • the attack unit 901, the perturbation unit 902, the construction unit 903, the feature extraction unit 904, the format conversion unit 905 and the adjustment unit 906 can each be a general-purpose central processing unit (CPU), a microprocessor, or an application specific integrated circuit (ASIC).
  • the storage unit is a storage unit within the chip, such as a register, a cache, etc.
  • the storage unit can also be a storage unit within the data processing device 900 that is located outside the chip, such as a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM), etc.
  • the division of the units of the above data processing device 900 is only a division of logical functions, and in actual implementation, all or part of them can be integrated into one physical entity, or they can be physically separated.
  • the attack unit 901, the perturbation unit 902, the construction unit 903, the feature extraction unit 904, the format conversion unit 905 and the adjustment unit 906 can be implemented by the processor 801 of FIG. 8 above.
  • the present application also provides a data processing device, which includes a processor, the processor is connected to a memory, the memory is used to store computer programs, and the processor is used to execute the computer programs stored in the memory, so that the data processing device implements the method described in any one of the embodiments in Figures 3, 4 or 6.
  • the present application also provides a data processing device, which includes a processor and a memory, the memory is used to store computer program instructions, and the processor is used to run the computer program instructions to implement the method described in any one of the embodiments in Figures 3, 4 or 6.
  • the present application also provides a chip, which may include a processor and an interface, and the processor is used to read instructions through the interface to execute the method described in any of the embodiments in Figures 3, 4 or 6.
  • the present application also provides a data processing system, which may include the aforementioned device to be detected and a data processing apparatus.
  • the present application also provides a computer-readable storage medium, which stores a computer program.
  • when the computer program is executed, the method described in any one of the embodiments in Figures 3, 4 or 6 is implemented.
  • the present application also provides a computer program product, which, when executed on a processor, implements the method described in any one of the embodiments shown in FIG. 3, FIG. 4 or FIG. 6.
  • a component can be, but is not limited to, a process running on a processor, a processor, an object, an executable file, an execution thread, a program and/or a computer.
  • both an application running on a computing device and the computing device itself can be components.
  • One or more components may reside in a process and/or an execution thread, and a component may be located on one computer and/or distributed between two or more computers.
  • these components may be executed from various computer-readable media having various data structures stored thereon.
  • a component may, for example, communicate through local and/or remote processes based on a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, in a distributed system and/or across a network such as the Internet, which interacts with other systems by way of the signal).
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • in addition, the mutual coupling, direct coupling or communication connection shown or discussed can be an indirect coupling or communication connection between devices or units through some interfaces, and can be electrical, mechanical or in other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which can be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a data processing method and apparatus, a storage medium, and a program product, which are applicable to the technical field of network security and are used to construct a more comprehensive intrusion detection sample set. The method comprises the following steps: obtaining first-type attack samples by attacking a device under test; obtaining second-type attack samples by applying noise to the first-type attack samples; and constructing an intrusion detection sample set on the basis of the first-type attack samples and the second-type attack samples. With this method, the intrusion detection sample set can cover both attack samples in a known intrusion scenario and attack samples in an unknown intrusion scenario, so the intrusion detection samples in the set are more comprehensive; in addition, using the more comprehensive intrusion detection sample set to evaluate a device under test can also improve the accuracy of the security evaluation of the device under test.
PCT/CN2023/078600 2023-02-28 2023-02-28 Data processing method and apparatus, and storage medium and program product WO2024178581A1 (fr)
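
The three steps summarized in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the implementation claimed in the application: the representation of a sample as a list of payload bytes, the byte-replacement noise model, and the names `add_noise`, `build_sample_set`, `flip_prob` and `variants_per_sample` are all assumptions introduced here. The first-type samples are taken as given, since collecting them requires attacking a real device under test.

```python
import random
from typing import List, Sequence

# Assumption: one attack sample is modeled as the payload bytes of a message.
Sample = List[int]


def add_noise(sample: Sequence[int], flip_prob: float, rng: random.Random) -> Sample:
    """Return a noisy copy of `sample`.

    Each byte is replaced by a random byte with probability `flip_prob`
    (a hypothetical noise model chosen for illustration).
    """
    return [rng.randrange(256) if rng.random() < flip_prob else b for b in sample]


def build_sample_set(
    first_type: Sequence[Sequence[int]],
    variants_per_sample: int = 2,
    flip_prob: float = 0.1,
    seed: int = 0,
) -> List[Sample]:
    """Combine first-type samples with noise-derived second-type samples."""
    rng = random.Random(seed)  # seeded so the sample set is reproducible
    # Second-type samples: noisy variants derived from every first-type sample.
    second_type = [
        add_noise(s, flip_prob, rng)
        for s in first_type
        for _ in range(variants_per_sample)
    ]
    # The sample set contains both types, first-type samples first.
    return [list(s) for s in first_type] + second_type


if __name__ == "__main__":
    recorded = [[0x01, 0x02, 0x03, 0x04], [0xAA, 0xBB]]  # made-up example payloads
    for sample in build_sample_set(recorded):
        print(sample)
```

Keeping the first-type samples unchanged at the front of the set preserves the known intrusion scenarios, while the noisy variants stand in for scenarios that were not observed directly.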

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/078600 WO2024178581A1 (fr) 2023-02-28 2023-02-28 Data processing method and apparatus, and storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/078600 WO2024178581A1 (fr) 2023-02-28 2023-02-28 Data processing method and apparatus, and storage medium and program product

Publications (1)

Publication Number Publication Date
WO2024178581A1 (fr) 2024-09-06

Family

ID=92589081

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/078600 WO2024178581A1 (fr) 2023-02-28 2023-02-28 Data processing method and apparatus, and storage medium and program product

Country Status (1)

Country Link
WO (1) WO2024178581A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
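CN106656981A (zh) * 2016-10-21 2017-05-10 东软集团股份有限公司 Network intrusion detection method and device
CN114707572A (zh) * 2022-02-24 2022-07-05 浙江工业大学 Deep learning sample testing method and device based on loss function sensitivity
CN115129607A (zh) * 2022-07-19 2022-09-30 中国电力科学研究院有限公司 Method, apparatus, device and medium for testing machine learning models for power grid security analysis
JP2022163431A (ja) * 2021-04-14 2022-10-26 株式会社日立製作所 Computer system and evaluation method for prediction program
CN115712893A (zh) * 2021-08-20 2023-02-24 华为技术有限公司 Attack detection method and device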

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
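CN106656981A (zh) * 2016-10-21 2017-05-10 东软集团股份有限公司 Network intrusion detection method and device
JP2022163431A (ja) * 2021-04-14 2022-10-26 株式会社日立製作所 Computer system and evaluation method for prediction program
CN115712893A (zh) * 2021-08-20 2023-02-24 华为技术有限公司 Attack detection method and device
CN114707572A (zh) * 2022-02-24 2022-07-05 浙江工业大学 Deep learning sample testing method and device based on loss function sensitivity
CN115129607A (zh) * 2022-07-19 2022-09-30 中国电力科学研究院有限公司 Method, apparatus, device and medium for testing machine learning models for power grid security analysis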

Similar Documents

Publication Publication Date Title
US11277427B2 (en) System and method for time based anomaly detection in an in-vehicle communication
US11115433B2 (en) System and method for content based anomaly detection in an in-vehicle communication network
WO2019200944A1 (fr) Physical intrusion attack detection method for industrial control system, based on analysis of serial communication bus signals
CN111385309B (zh) Security detection method, system and terminal for online office equipment
Greensmith et al. The DCA: SOMe comparison: A comparative study between two biologically inspired algorithms
CN112822223B (zh) Automated detection method and apparatus for DNS covert tunnel events, and electronic device
CN108462675A (zh) Network access identification method and system
WO2024007615A1 (fr) Model training method and apparatus, and related device
Jo et al. Automatic whitelist generation system for ethernet based in-vehicle network
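CN113098852B (zh) Log processing method and device
CN117729540A (zh) Cloud-edge security management and control method for sensing devices based on a unified edge computing framework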
Zhao et al. GVIDS: A reliable vehicle intrusion detection system based on generative adversarial network
WO2024178581A1 (fr) Data processing method and apparatus, and storage medium and program product
CN115840965B (zh) Information security assurance model training method and system
CN113420791B (zh) Edge network device access control method and apparatus, and terminal device
Kneib A survey on sender identification methodologies for the controller area network
Lee et al. A Comprehensive Analysis of Datasets for Automotive Intrusion Detection Systems.
Oujezsky et al. Modeling botnet C&C traffic lifespans from NetFlow using survival analysis
CN114006714A (zh) Method, apparatus, system, device and storage medium for implementing terminal verification
Yli-Olli Machine Learning for Secure Vehicular Communication: an Empirical Study
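CN111565187B (zh) DNS anomaly detection method, apparatus, device and storage medium
CN116909161B (zh) Smart home control method and system based on wearable devices
CN117633665B (zh) Network data monitoring method and system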
Varghese et al. Novel CAN Bus Fuzzing Framework for Finding Vulnerabilities in Automotive Systems
Yu et al. ARINC-825TBv2: A hardware-in-the-loop simulation platform for aerospace security research

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23924559

Country of ref document: EP

Kind code of ref document: A1