Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an industrial control protocol vulnerability mining method based on vulnerability semantic intelligent analysis.
The technical scheme of the invention is as follows:
An industrial control protocol vulnerability mining method based on vulnerability semantic intelligent analysis comprises the following steps:
S1: acquiring historical test case data to form a historical data set, and dividing the historical data set into a training data set and a test data set; capturing the training data set and the test data set respectively through a network sniffing module; performing data preprocessing and then feature extraction on each captured data set to obtain feature vectors of the training data set and feature vectors of the test data set; and proceeding to step S2;
S2: grouping the feature vectors of the training data set, and computing a centroid vector of the training data set from the grouped feature vectors; taking the centroid vector of the training data set and the feature vectors of the test data set as inputs, performing similarity matching, and constructing a semantic analysis model; and proceeding to step S3;
S3: inputting a vulnerability mining command for the industrial control protocol, and performing semantic analysis on the industrial control protocol through the semantic analysis model to obtain a vulnerability mining result for the industrial control protocol.
Preferably, the data preprocessing in step S1 includes protocol parsing and data truncation;
the protocol parsing splits the training data set and the test data set into a plurality of independent data packets according to the grammar of the underlying protocol and removes the header of the underlying protocol;
the data truncation truncates independent data packets that exceed a set byte length and discards independent data packets that are shorter than the set byte length.
Preferably, the mathematical expressions of the training data set and the test data set after data preprocessing in step S1 are as follows:
d[x][y]
wherein x is the length of the independent data packet, and y is the number of the independent data packets participating in the statistical feature extraction.
Preferably, the feature extraction in step S1 is specifically performed from the value ranges, randomness, and statistical parameters of the training data set and the test data set.
Preferably, the network sniffing module in step S1 uses Wireshark to capture the communication traffic of all protocols.
Preferably, in step S2, similarity matching is performed between the centroid vector of the training data set and the feature vectors of the test data set, with the Jeffries-Matusita distance used as the similarity function, whose mathematical expression is as follows:
$$ d(m, n) = \sqrt{\sum_{i} \left( \sqrt{m_i} - \sqrt{n_i} \right)^{2}} $$
wherein m is a centroid vector, n is a feature vector, d(m, n) is the distance between the centroid vector m and the feature vector n, N is the number of vectors, i is an index taking values between 0 and N, m_i is the i-th centroid vector, and n_i is the i-th feature vector.
An industrial control protocol vulnerability mining system based on vulnerability semantic intelligent analysis comprises: a data acquisition module, a first network sniffing module, a second network sniffing module, a first data preprocessing module, a second data preprocessing module, a first feature extraction module, a second feature extraction module, a vector grouping module, a calculation module, a similarity matching module, an analysis module, a command acquisition module and a result output module; the first network sniffing module and the second network sniffing module are each connected to the data acquisition module; the first network sniffing module, the first data preprocessing module, the first feature extraction module, the vector grouping module and the calculation module are connected in sequence; the second network sniffing module, the second data preprocessing module and the second feature extraction module are connected in sequence; the second feature extraction module and the calculation module are each connected to the similarity matching module; the similarity matching module and the command acquisition module are each connected to the analysis module; and the analysis module is connected to the result output module.
The invention has the beneficial effects that:
The invention provides an industrial control protocol vulnerability mining method based on vulnerability semantic intelligent analysis. The method avoids manual intervention and improves the efficiency of semantic analysis, relaxes the constraints on semantic analysis and widens its scope, can effectively select and analyze the grammatical features of the industrial control protocol, and has good practicability.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby.
As shown in Fig. 1, an industrial control protocol vulnerability mining method based on vulnerability semantic intelligent analysis includes the following steps:
S1: acquiring historical test case data to form a historical data set, and dividing the historical data set into a training data set and a test data set; capturing the training data set and the test data set respectively through a network sniffing module; performing data preprocessing and then feature extraction on each captured data set to obtain feature vectors of the training data set and feature vectors of the test data set; and proceeding to step S2;
S2: grouping the feature vectors of the training data set, and computing a centroid vector of the training data set from the grouped feature vectors; taking the centroid vector of the training data set and the feature vectors of the test data set as inputs, performing similarity matching, and constructing a semantic analysis model; and proceeding to step S3;
S3: inputting a vulnerability mining command for the industrial control protocol, and performing semantic analysis on the industrial control protocol through the semantic analysis model to obtain a vulnerability mining result for the industrial control protocol.
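The grouping and centroid computation of step S2 can be sketched as follows. This is a minimal illustration, not the invention's implementation: the text does not state the grouping criterion, so the sketch assumes each feature vector arrives with a group label, and it assumes the centroid is the component-wise mean. The names `group_vectors`, `centroid` and `centroids_by_group` are hypothetical.

```python
from collections import defaultdict


def group_vectors(labelled_vectors):
    """Group feature vectors by their label (the label is an assumed input)."""
    groups = defaultdict(list)
    for label, vector in labelled_vectors:
        groups[label].append(vector)
    return dict(groups)


def centroid(vectors):
    """Return the component-wise mean of equal-length vectors (assumed centroid)."""
    if not vectors:
        raise ValueError("cannot compute the centroid of an empty group")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors in a group must have the same length")
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def centroids_by_group(labelled_vectors):
    """Compute one centroid vector per group of training feature vectors."""
    groups = group_vectors(labelled_vectors)
    return {label: centroid(vectors) for label, vectors in groups.items()}
```

For example, `centroids_by_group([("a", [1.0, 3.0]), ("a", [3.0, 5.0])])` yields one centroid for group "a", the mean of the two vectors.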
Preferably, the data preprocessing in step S1 includes protocol parsing and data truncation;
the protocol parsing splits the training data set and the test data set into a plurality of independent data packets according to the grammar of the underlying protocol and removes the header of the underlying protocol;
the data truncation truncates independent data packets that exceed a set byte length and discards independent data packets that are shorter than the set byte length.
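The header removal and truncation rule can be sketched as below. This is an illustrative sketch only: it assumes the packets have already been split apart, that the underlying protocol header has a fixed size `header_len`, and that the set length is given in bytes as `set_length`. All function and parameter names are hypothetical.

```python
def strip_header(packet, header_len):
    """Drop a fixed-size underlying-protocol header (fixed size is an assumption)."""
    return packet[header_len:]


def truncate_packets(packets, set_length):
    """Cut packets longer than set_length and discard packets shorter than it."""
    if set_length <= 0:
        raise ValueError("set_length must be positive")
    kept = []
    for packet in packets:
        if len(packet) < set_length:
            continue  # shorter than the set length: discarded
        kept.append(bytes(packet[:set_length]))  # longer packets are truncated
    return kept


def preprocess(packets, header_len, set_length):
    """Apply header removal, then truncation, to a list of raw packets."""
    payloads = [strip_header(p, header_len) for p in packets]
    return truncate_packets(payloads, set_length)
```

After this step every retained packet has exactly `set_length` bytes, which is what allows the packets to be arranged in a fixed-size array for feature extraction.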
Preferably, the mathematical expressions of the training data set and the test data set after data preprocessing in step S1 are as follows:
d[x][y]
wherein x is the length of the independent data packet, and y is the number of the independent data packets participating in the statistical feature extraction.
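One possible reading of d[x][y] is a two-dimensional array whose first index is the byte position within a packet and whose second index selects the packet. The sketch below builds such an array under that assumption; the index interpretation and the name `build_matrix` are not taken from the text.

```python
def build_matrix(packets):
    """Arrange equal-length packets so that d[i][j] is byte i of packet j."""
    if not packets:
        return []
    x = len(packets[0])  # packet length after truncation
    if any(len(p) != x for p in packets):
        raise ValueError("all packets must have the same length")
    return [[p[i] for p in packets] for i in range(x)]
```

With this layout, each row of the array collects the values observed at one byte position across all y packets, which is a convenient shape for per-position statistics.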
Preferably, the feature extraction in step S1 is specifically performed from the value ranges, randomness, and statistical parameters of the training data set and the test data set.
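A minimal sketch of such feature extraction is given below. The text does not name concrete measures, so the choices here are assumptions: the value range is taken as the minimum and maximum byte value at each position, randomness as the Shannon entropy of the values at that position, and the statistical parameters as the mean and population variance. The function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def position_features(packets):
    """Compute per-byte-position features over equal-length packets."""
    if not packets:
        raise ValueError("at least one packet is required")
    length = len(packets[0])
    if any(len(p) != length for p in packets):
        raise ValueError("all packets must have the same length")
    features = []
    for pos in range(length):
        column = [p[pos] for p in packets]
        features.append({
            "min": min(column),              # value range, lower bound
            "max": max(column),              # value range, upper bound
            "entropy": shannon_entropy(column),  # randomness
            "mean": mean(column),            # statistical parameter
            "variance": pvariance(column),   # statistical parameter
        })
    return features
```

A constant field such as a protocol identifier gives zero entropy and zero variance, while a counter or checksum field gives high entropy, so these features help separate fields with different roles.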
Preferably, the network sniffing module in step S1 uses Wireshark to capture the communication traffic of all protocols.
Preferably, in step S2, similarity matching is performed between the centroid vector of the training data set and the feature vectors of the test data set, with the Jeffries-Matusita distance used as the similarity function, whose mathematical expression is as follows:
$$ d(m, n) = \sqrt{\sum_{i} \left( \sqrt{m_i} - \sqrt{n_i} \right)^{2}} $$
wherein m is a centroid vector, n is a feature vector, d(m, n) is the distance between the centroid vector m and the feature vector n, N is the number of vectors, i is an index taking values between 0 and N, m_i is the i-th centroid vector, and n_i is the i-th feature vector.
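The similarity matching can be sketched as follows. The sketch assumes the commonly used Matusita form of the distance, the square root of the summed squared differences of the square roots of corresponding entries, and assumes non-negative entries so that the square roots are defined. The names `jm_distance` and `best_match` are hypothetical, as is the dictionary of labelled centroids.

```python
import math


def jm_distance(m, n):
    """Jeffries-Matusita style distance between two non-negative vectors."""
    if len(m) != len(n):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in (*m, *n)):
        raise ValueError("entries must be non-negative")
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(m, n)))


def best_match(centroids, feature):
    """Return the label of the centroid closest to the given feature vector."""
    if not centroids:
        raise ValueError("at least one centroid is required")
    return min(centroids, key=lambda label: jm_distance(centroids[label], feature))
```

A smaller distance indicates a closer match, so a test feature vector is assigned to the group whose centroid minimizes `jm_distance`.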
An industrial control protocol vulnerability mining system based on vulnerability semantic intelligent analysis comprises: a data acquisition module, a first network sniffing module, a second network sniffing module, a first data preprocessing module, a second data preprocessing module, a first feature extraction module, a second feature extraction module, a vector grouping module, a calculation module, a similarity matching module, an analysis module, a command acquisition module and a result output module; the first network sniffing module and the second network sniffing module are each connected to the data acquisition module; the first network sniffing module, the first data preprocessing module, the first feature extraction module, the vector grouping module and the calculation module are connected in sequence; the second network sniffing module, the second data preprocessing module and the second feature extraction module are connected in sequence; the second feature extraction module and the calculation module are each connected to the similarity matching module; the similarity matching module and the command acquisition module are each connected to the analysis module; and the analysis module is connected to the result output module.
In addition, in this embodiment, the vulnerability mining results of the industrial control protocol may be aggregated and combined with the related device data of the industrial control protocol to build a vulnerability management database.
The vulnerability management database provides the following functions:
Project management function:
A project and task management mechanism is introduced; each project may consist of multiple tasks of the same or different types. Projects are created by staff with administrator privileges, and logging in as an administrator first opens the project management interface. In this interface, the progress of each project and the real-time completion status of its tasks can be viewed directly, and new projects can be created as required.
Task management function:
Under the project and task management mechanism, each project may consist of multiple tasks of the same or different types, and the completion of a project is driven by its tasks.
Device library management function:
Device libraries of industrial automation control device suppliers (such as SIMATIC, Schneider, ABB, Supcon and the like) are included, and administrators can add new manufacturers and device models according to actual use.
User management function:
Administrators can assign different permissions to local and remote users and can add, activate, delete, and edit users.
Log management function:
Administrators can view, clear, and export user and task operation logs.
System management function:
The current version of the vulnerability management database can be checked and upgraded, the system time of the vulnerability management database can be modified, its default management address can be changed, and the vulnerability management database can be restarted or shut down.
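The relationship between projects and tasks described above can be sketched with a small data model. This is a hypothetical illustration: the class names, fields, and the rule that project progress equals the fraction of completed tasks are assumptions, not details given in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    kind: str           # task type, e.g. a kind of protocol test (assumed field)
    done: bool = False  # real-time completion status


@dataclass
class Project:
    name: str
    created_by: str     # an account with administrator privileges
    tasks: list = field(default_factory=list)

    def progress(self):
        """Fraction of completed tasks; a project without tasks reports 0.0."""
        if not self.tasks:
            return 0.0
        return sum(1 for t in self.tasks if t.done) / len(self.tasks)
```

Because progress is derived from the tasks rather than stored separately, completing a task immediately updates the project view, which matches the idea that project completion is driven by its tasks.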
In addition to the semantic analysis of the industrial control protocol, the method further includes a testing stage for the industrial control protocol, which covers the following aspects:
Conformance testing: a given implementation of the protocol is tested against the protocol specification to determine whether the implementation conforms to the corresponding protocol standard.
Interoperability testing: the ability of different implementations of the same protocol to interwork and interoperate is examined. Whether a protocol implementation passes conformance testing and interoperability testing largely determines whether it can successfully interwork with other protocols in the same system.
Performance testing: selected performance characteristics of the protocol implementation, such as data transmission rate, connection time, execution speed, throughput and concurrency, are tested to determine whether they conform to the protocol specification.
Robustness testing: the device or system implementing the protocol is tested to determine whether it handles and parses input correctly under invalid or abnormal input or under stressful environmental conditions. Protocol robustness testing is mainly based on an intelligent fuzz testing engine, and the test methods include the following:
Buffer-overflow input: for variable-length fields, an excessive number of characters or digits is entered so that the program cannot bound the buffer, which eventually overflows, and the system stops responding or crashes.
Integer input: for length-like fields, boundary or limit values are entered so that conditional statements fail and the service terminates.
Underflow input: for fixed-length fields, such as MAC addresses, part of the information is omitted or truncated so that variables do not receive complete values, causing logic failures.
Format input: continuous fields generally follow character delimiting rules, for example a boundary marked by a given number of consecutive zero bytes; violating these rules prevents the program from completing the delimitation and crashes the system. For fields with a specific format, such as characters or integers, entering an illegal format causes the program logic to run abnormally long or to exit outright.
Message-order input: the order in which messages occur is modified so that the system cannot determine its state and the state machine cannot complete normal transitions, causing service delay or degradation.
Repeated input: a specific field of a normal message is generated repeatedly so that program checks fail and the system stops responding.
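The input types listed above can be illustrated with simple test-case generators. This is a sketch only and does not represent the intelligent fuzz testing engine itself; every function name, default size, and filler byte is a hypothetical choice made for illustration.

```python
def overflow_input(field, extra=1024, filler=b"A"):
    """Buffer-overflow input: append far more bytes than the field expects."""
    return field + filler * extra


def integer_boundary_inputs(width):
    """Integer input: boundary values for an unsigned big-endian field of width bytes."""
    top = (1 << (8 * width)) - 1
    return [v.to_bytes(width, "big") for v in (0, 1, top - 1, top)]


def underflow_input(field, keep):
    """Underflow input: keep only the first `keep` bytes of a fixed-length field."""
    return field[:keep]


def illegal_format_input(length, filler=b"\xff"):
    """Format input: fill a formatted field with bytes outside its expected format."""
    return filler * length


def out_of_order_input(messages):
    """Message-order input: reverse the normal message sequence."""
    return list(reversed(messages))


def repeated_input(field, times):
    """Repeated input: repeat one field of a normal message many times."""
    return field * times
```

For instance, `underflow_input(bytes(6), 3)` produces a three-byte value for a six-byte MAC address field. In practice each generated value would be placed into an otherwise valid message before being sent to the device under test.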
In application scenarios, the vulnerability management database can serve multiple roles: on the one hand, it can be used to assess the security of an industrial control protocol and to mine unknown vulnerabilities; on the other hand, it can act as a simulated malicious attacker to verify whether the protection measures for industrial control equipment are effective.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.