CN113132392A

CN113132392A - Industrial control network flow abnormity detection method, device and system

Info

Publication number: CN113132392A
Application number: CN202110434596.6A
Authority: CN
Inventors: 唐玉维
Original assignee: Suzhou Liandian Energy Development Co ltd
Current assignee: Suzhou Liandian Energy Development Co ltd
Priority date: 2021-04-22
Filing date: 2021-04-22
Publication date: 2021-07-16
Anticipated expiration: 2041-04-22
Also published as: CN113132392B

Abstract

The application relates to a method, a device and a system for detecting traffic anomalies in an industrial control network, in the technical field of network security. The method comprises: matching message pairs, each consisting of a request message and a response message, in industrial control network traffic data; converting the successfully matched message pairs into a timestamped symbol sequence, wherein each symbol in the symbol sequence indicates a unique state event; acquiring the symbols of the timestamped symbol sequence one by one, in order, and inputting them into a pre-constructed anomaly detection model for anomaly detection; if the currently acquired symbol is detected to be a known symbol, inputting the symbol into the corresponding sub-period DFA model; and determining the anomaly detection result according to the state transition that results after the sub-period DFA model receives the symbol. The application addresses the problem that semantic attacks against SCADA systems can damage industrial equipment or disrupt industrial production.

Description

Industrial control network flow abnormity detection method, device and system

Technical Field

The application relates to a method, a device and a system for detecting traffic anomalies in an industrial control network, and belongs to the technical field of network security.

Background

SCADA (supervisory control and data acquisition) systems are used to monitor and control critical infrastructure such as wastewater distribution facilities, natural gas production systems and power plants. A SCADA system operates mainly through communication between an HMI and a PLC: according to service requirements, the HMI periodically sends instructions to the PLC following a fixed logic; the PLC accesses the field devices as instructed and returns the information to the HMI; and the HMI displays the returned information, thereby achieving monitoring and control. Because actual industrial production follows well-defined periodic behavior and operation sequences, SCADA traffic is also highly periodic at the business-logic level.

SCADA traffic has two cycle types: polling periods and timing periods. In a polling period, the SCADA system executes a series of instructions in sequence according to the service logic of industrial production, mainly to retrieve data from field devices. In a timing period, the SCADA system performs a certain type of operation at regular intervals, commonly to adjust the state of field devices. Multiple polling periods and multiple timing periods may be mixed in the communication channel between the HMI and the PLC. In more complex cases, for example, the communication between the HMI and the PLC may use a multi-threaded architecture in which each thread is responsible for an independent task and the threads execute concurrently. The traffic in the industrial control system is then multiplexed, i.e., a given message may occur in several periodic patterns.

In real industrial production, a SCADA system faces not only traditional network attacks, such as function code anomalies, DoS (denial of service) and buffer overflows, but also semantic attacks aimed specifically at SCADA systems. An attacker with detailed knowledge of the industrial process and the physical equipment can deliberately damage industrial equipment or disrupt production by constructing a sequence of messages that each appear "legal".

Disclosure of Invention

The application provides a method, a device and a system for detecting traffic anomalies in an industrial control network, which address the problem that semantic attacks against SCADA systems can damage industrial equipment or disrupt industrial production.

The application provides the following technical scheme:

in a first aspect, a method for detecting an industrial control network traffic anomaly is provided, where the method includes:

matching message pairs consisting of request messages and response messages in industrial control network flow data;

converting the successfully matched message pair into a symbol sequence carrying a timestamp, wherein each symbol in the symbol sequence indicates a unique state event;

acquiring the symbols in the timestamped symbol sequence one by one, in order, and inputting them into a pre-constructed anomaly detection model for anomaly detection;

if the symbol in the symbol sequence obtained currently is detected to belong to a known symbol, inputting the symbol into a corresponding sub-period DFA model; the sub-period DFA model is obtained by establishing a DFA model for the symbol set in each sub-period according to a state transition relation after classifying the symbol sequence corresponding to the industrial control network traffic data to obtain the symbol sets corresponding to the plurality of sub-periods;

and determining an abnormal detection result according to the state transition result after the sub-period DFA model receives the symbol.

In a second aspect, an industrial control network traffic anomaly detection device is provided, which comprises

The message matching module is used for matching a message pair consisting of a request message and a response message in the industrial control network flow data;

the mapping module is used for converting the successfully matched message pair into a symbol sequence carrying a timestamp, and each symbol in the symbol sequence indicates a unique state event;

the symbol acquisition module is used for acquiring the symbols in the timestamped symbol sequence one by one, in order, and inputting them into a pre-constructed anomaly detection model for anomaly detection;

the judging module is used for inputting the symbols in the currently acquired symbol sequence into a corresponding sub-period DFA model if the symbols are detected to belong to known symbols; the sub-period DFA model is obtained by establishing a DFA model for the symbol set in each sub-period according to a state transition relation after classifying the symbol sequence corresponding to the industrial control network traffic data to obtain the symbol sets corresponding to the plurality of sub-periods;

and the result output module is used for determining an abnormal detection result according to the state transition result after the sub-period DFA model receives the symbol.

In a third aspect, an industrial control network traffic anomaly detection system is provided, where the system includes a processor and a memory; the memory stores a program, and the program is loaded and executed by the processor to implement the steps of the industrial control network traffic anomaly detection method according to the first aspect of the present application.

The beneficial effects of this application are as follows: the method builds an anomaly detection model, divides the traffic into sub-periods based on the Markov principle, and establishes a separate DFA model for each sub-period, so that abnormal traffic can be detected accurately and more complex semantic attacks can be detected. Compared with existing anomaly detection methods, the method detects more types of semantic attacks and has a lower false alarm rate and a lower missed-detection rate.

The foregoing is only an overview of the technical solutions of the present application. To make these technical solutions clearer and to enable them to be implemented according to the content of the description, a detailed description follows with reference to the preferred embodiments of the present application and the accompanying drawings.

Drawings

Fig. 1 is a state transition diagram provided by one embodiment of the present application;

Fig. 2 is a flowchart of an industrial control network traffic anomaly detection method according to an embodiment of the present application;

Fig. 3 is a block diagram of an anomaly detection model provided in one embodiment of the present application;

Fig. 4 is a state transition diagram provided by another embodiment of the present application;

Fig. 5 is a state transition diagram for a sub-cycle provided by one embodiment of the present application;

Fig. 6 is a block diagram of an industrial control network traffic anomaly detection device according to an embodiment of the present application;

Fig. 7 is a block diagram of an industrial control network traffic anomaly detection system according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described in detail below with reference to the accompanying drawings and examples. The following examples are intended to illustrate the present application, not to limit its scope.

Fig. 1 shows a state transition diagram for multi-cycle mixed industrial control network traffic. Referring to Fig. 1, the diagram contains a polling cycle state sequence "abcdbadddbedf … …" and a timing cycle state sequence "AAA … …", and is therefore a multi-cycle mixed state transition diagram.

For the state transition diagram shown in Fig. 1, simple semantic attacks can be divided into two categories: sequence attacks and timing attacks.

A sequence attack means that an attacker sends message instructions in an illegal, malicious order. For example, reversing the subsequence "ab" in the sequence "abcdbedfabcdbedf … …" produces the abnormal sequence "bacdbedfbacdbedf … …". As an illustration, consider a sequence attack on a high-pressure gas pipeline whose pressure is controlled by two valves: by sending instructions to the pipeline's PLC, the attacker forces one valve fully open and the other fully closed, so that the pipeline pressure becomes too high and the pipeline stops working. Each instruction is legitimate when inspected individually, but sent in an illegitimate order the instructions shut the system down.

A timing attack means that an attacker sends message instructions at an illegal time; for example, changing the cycle time of the sequence "AAA … …" from 5 seconds to 2 seconds constitutes a timing attack. As an illustration, in a water delivery system an attacker sends normally ordered commands to the PLC at an abnormal frequency, causing the valves of the water pipes to open and close rapidly; this creates a water hammer effect that ruptures a large number of pipes.

More complex semantic attacks can be constructed if the attacker has deeper knowledge of the industrial production process: branch node attacks and sub-cycle replay attacks.

A branch node attack means that an attacker reverses the sending order of subsequences. For branch node b, the state transitions b → c and b → e are both legal, so the attacker can construct the branch node attack sequence "abedbcdfabedbcdf … …" by swapping the order of the subsequences "bcd" and "bed".

A sub-period replay attack means that an attacker repeatedly sends a subsequence. For the sub-period "AAA … …" in the state diagram shown in Fig. 1, an attacker can send state A multiple times so that the entire production flow changes, interfering with industrial production.

Existing anomaly detection methods for industrial control systems are deficient in detecting semantic attacks on multi-period mixed industrial control network traffic. To address this deficiency, the embodiment of the application provides an industrial control network traffic anomaly detection method.

Fig. 2 is a flowchart of an industrial control network traffic anomaly detection method according to an embodiment of the present application, where the method includes the following steps:

S201: Acquire industrial control network traffic data.

S202: Match the message pairs, each consisting of a request message and a response message, in the industrial control network traffic data.

Specifically, matching in this embodiment means checking whether the response message is a response to the request message. If so, the match succeeds; if not, the match fails.
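The pairing step can be illustrated with a minimal Python sketch. It assumes that each parsed message exposes a direction flag and a correlation identifier, here called `ref`; both names are hypothetical and are not taken from the application. For S7 traffic a header field such as the PDU reference could plausibly play this role, but that is an assumption. Messages left unpaired correspond to the "match fails" branch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Message:
    ref: int            # hypothetical correlation id shared by a request and its response
    is_request: bool    # True for HMI -> PLC, False for PLC -> HMI
    timestamp: float    # capture time in seconds
    payload: bytes = b""


def match_pairs(messages: List[Message]) -> Tuple[List[Tuple[Message, Message]], List[Message]]:
    """Pair each response with the pending request that carries the same ref.

    Returns (matched pairs, unmatched messages).
    """
    pending: Dict[int, Message] = {}
    pairs: List[Tuple[Message, Message]] = []
    unmatched: List[Message] = []
    for msg in messages:
        if msg.is_request:
            # A second request with the same ref means the earlier one got no response.
            if msg.ref in pending:
                unmatched.append(pending[msg.ref])
            pending[msg.ref] = msg
        else:
            request = pending.pop(msg.ref, None)
            if request is None:
                unmatched.append(msg)          # response without a request
            else:
                pairs.append((request, msg))
    unmatched.extend(pending.values())         # requests still waiting at end of capture
    return pairs, unmatched
```

In this sketch a non-empty `unmatched` list is what would be reported as a loss anomaly in S204.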

S203: Judge whether the message pair is successfully matched; if not, go to S204; if so, go to S205.

S204: Detect a loss anomaly and return to S201.

S205: Convert the successfully matched message pair into a timestamped symbol sequence.

In an industrial control system, the concrete operations of industrial production are carried out through communication traffic. This embodiment converts the acquired traffic data into state events and constructs a state transition diagram, in which the different states represented by the nodes are denoted by different symbols; the traffic data is thereby converted into a symbol sequence. The conversion is as follows:

Different industrial protocols contain different fields and define state events differently. To ensure that traffic data is converted accurately into state events, the semantic features of the protocol must be analyzed and appropriate feature fields selected. Based on the selected feature fields, the original traffic sequence can be converted into a timestamped symbol sequence in which each symbol represents a unique state event.

This embodiment takes the S7 protocol as an example. Feature extraction and state conversion rules are formulated for the S7 protocol, and the Protocol Id, ROSCTR, Parameter Length, Data Length, Function Code, Item Count and Item fields of the protocol are extracted to define state events. The state event conversion rules defined by these features are as follows:

(1) When the S7 message function code is a read operation (0x04), request messages that read the same object address (i.e., read the same field device information) map to the same state event, and response messages whose return-value parameters are the same (i.e., the returned device information is the same) map to the same state event.

(2) When the S7 message function code is a write operation (0x05), request messages with the same object address and the same written value (i.e., the same value is written to the same field device) map to the same state event, and response messages whose return-value parameters are the same (i.e., whether the write succeeded) map to the same state event.

According to the above conversion rules, any S7 message in the traffic data can be mapped to a unique state event.

To facilitate construction of the anomaly detection model, this embodiment first extracts the values of the required parameter items from the S7 message and concatenates them, so that each S7 message becomes a symbol string. The symbol strings are then converted with the SHA-1 function into hash strings of equal length, and the timestamp of each hash string is retained. This completes the conversion from message pairs to the symbol sequence, in which each symbol represents a unique state event.

For convenience of presentation, the hash strings may be denoted "a, b, c … …"; for example, symbol a may represent reading a certain sensor, and symbol b modifying a parameter of a certain controller.
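The concatenate-then-hash step can be sketched in Python as follows. This is an illustration only: the `|` separator, the record layout and the example field values are assumptions, not details given in the application.

```python
import hashlib
from typing import Iterable, List, Tuple


def to_symbol(field_values: Iterable[object]) -> str:
    """Concatenate the selected field values and hash the result with SHA-1."""
    joined = "|".join(str(v) for v in field_values)   # "|" separator is an assumption
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def to_symbol_sequence(
    records: List[Tuple[float, Tuple[object, ...]]]
) -> List[Tuple[float, str]]:
    """Map (timestamp, field_values) records to (timestamp, symbol) pairs."""
    return [(ts, to_symbol(fields)) for ts, fields in records]


# Hypothetical records: two identical reads and one write to the same address.
records = [
    (0.0, (0x32, 1, 0x04, "DB1.DBX0.0")),
    (5.0, (0x32, 1, 0x04, "DB1.DBX0.0")),
    (6.0, (0x32, 1, 0x05, "DB1.DBX0.0", 1)),
]
sequence = to_symbol_sequence(records)
```

Identical field values yield identical 40-character hexadecimal symbols, so the two reads above map to the same state event while the write maps to a different one.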

S206: Acquire the symbols in the timestamped symbol sequence one by one, in order, and input them into the pre-constructed anomaly detection model for anomaly detection.

Specifically, the anomaly detection model constructed in the embodiment includes at least two sub-period DFA models, a DFA selector, and an anomaly determination module.

In this embodiment, a state transition diagram is constructed from a symbol sequence carrying a timestamp, and symbols in different states represented by each node in the state transition diagram are classified to obtain a plurality of separated sub-periods. After separating a plurality of sub-periods, respectively and independently constructing a DFA model for each sub-period, wherein each sub-period DFA model is used for outputting a state transition result according to an input symbol.

Referring to Fig. 3, this embodiment configures the DFA selector according to the periodic patterns of the sub-periods. When at least two sub-period patterns exist in the channel, a DFA selector is added to send each symbol of the symbol sequence into the corresponding sub-period DFA model.

The DFA selector of this embodiment performs the selection by analyzing and comparing the symbol content and the timestamps in the channel [30]. It is designed mainly for the following two cases:

(1) The sub-period patterns contain different symbols. The selection can be made directly from the symbol content, and the traffic symbol is sent to the corresponding DFA.

(2) The sub-period patterns contain repeated symbols. In addition to the symbol content, the timestamp is compared with the period value to send the traffic symbol into the corresponding DFA model.
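The two selector cases can be sketched as below. The application does not specify how the timestamp is compared with the period value, so the rule used here (pick the sub-period whose expected arrival time, last routed timestamp plus period, is closest to the observed timestamp) is an assumption, as are the `SubPeriod` type and its fields.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubPeriod:
    name: str
    symbols: frozenset                  # symbols that occur in this sub-period
    period: float                       # nominal period in seconds (assumed known from training)
    last_seen: Optional[float] = None   # timestamp of the last symbol routed here


def select_dfa(sub_periods: List[SubPeriod], symbol: str, timestamp: float) -> Optional[SubPeriod]:
    """Return the sub-period whose DFA should receive the symbol, or None if unknown."""
    candidates = [sp for sp in sub_periods if symbol in sp.symbols]
    if not candidates:
        return None                     # symbol belongs to no sub-period
    if len(candidates) == 1:
        chosen = candidates[0]          # case (1): symbol content alone decides
    else:
        # Case (2): shared symbol. Compare the timestamp with each period value.
        def timing_error(sp: SubPeriod) -> float:
            if sp.last_seen is None:
                return 0.0              # never used yet: treat as a perfect fit
            return abs(timestamp - (sp.last_seen + sp.period))

        chosen = min(candidates, key=timing_error)
    chosen.last_seen = timestamp
    return chosen
```

For instance, with a polling sub-period of period 1 s and a timing sub-period of period 5 s that share a symbol, a symbol arriving 5 s after the timer last fired would be routed to the timing sub-period.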

The anomaly determination module of this embodiment is configured to determine whether the input industrial control network traffic is abnormal according to the state transition result output by the sub-period DFA model. For a specific determination method, please refer to the following description.

S207: Judge whether the currently acquired symbol is a known symbol; if not, go to S208; if so, go to S209.

Specifically, the input symbols are divided into two categories: known symbols and unknown symbols.

The known symbol set consists of all input symbols observed in the training phase of the anomaly detection model, each of which has a corresponding DFA state. All other symbols are unknown symbols, which can be detected directly as unknown attacks.

The present embodiment sets a state alphabet States, in which the elements are state types in the symbol sequence SymSeq.

In this embodiment, the acquired symbol $s_i$ is sent to the DFA selector, which judges whether $s_i$ is in the state alphabet States. If $s_i$ is in States, it is a known symbol; otherwise, it is an unknown symbol.

S208: Detect the currently acquired symbol as an unknown anomaly and return to S206.

S209: Input the symbol into the corresponding sub-period DFA model.

S210: after inputting the currently obtained symbol, whether the state is transferred to the expected position of the periodic sequence or not is judged, if yes, the operation goes to S213; otherwise, go to S212;

S211: Judge whether the state remains the current state; if not, go to S212; if so, go to S213.

S212: Detect a "loss" anomaly and return to S206.

S213: Detect a "retransmission" anomaly and return to S206.

S214: Judge the state transition to be normal.

S215: Judge whether the difference between the timestamp carried by the current symbol and the average time interval of the current state is greater than the time interval deviation threshold; if so, go to S216; otherwise, go to S217.

S216: Detect a timing anomaly for the current symbol and return to S206.

S217: Detect that both the timing and the order are normal.

In the DFA model, traffic behavior is described as three types of state transitions (output symbols):

(1) Normal: a "normal" state transition occurs when a known symbol is received that moves the state to the next state of the periodic sequence, i.e., $s_j = s_{i+1}$. As a result of a "normal" event, the DFA transitions to its next state $State_{i+1}$.

(2) Retransmission: a "retransmission" is the occurrence of the same known symbol as the previous symbol, i.e., $s_j = s_i$. As a result of a "retransmission" event, the DFA remains in its current state $State_i$. If the pattern contains two consecutive identical symbols, the DFA would have two different transitions for the same symbol from that state: a forward "normal" transition and a self-looping "retransmission" transition. This ambiguity is resolved at runtime by selecting the "normal" transition rather than the self-looping "retransmission" transition.

(3) Loss: a "loss" means that a known symbol $s_j$ is received out of its expected position, i.e., it is not the symbol expected by $State_{i+1}$: $s_j \neq s_{i+1}$. As a result of a "loss" event, the DFA transitions to $State_j$, the state that receives symbol $s_j$.

For steps S211-S217: after the acquired symbol $s_i$ is fed into the sub-period DFA model $DFA_i$, if the current state of $DFA_i$ is $State_j$ and the state transitions to $State_{j+1}$ after $s_i$ is input, the transition is "normal".

If, after $s_i$ is input, the state is still $State_j$, a "retransmission" anomaly is detected.

If, after $s_i$ is input, the state transitions to $State_{j+k}$ with $k \geq 2$, a "loss" anomaly is detected.

A symbol detected as a normal transition is further checked for timing. Let $T$ be the timestamp of symbol $s_i$; each state $State_j$ of $DFA_i$ has a corresponding average time interval $T_{ijAvg}$, and durationThreshold is the time interval deviation threshold. If $|T - T_{ijAvg}| <$ durationThreshold, then $s_i$ is "normal" in both order and timing; otherwise, $s_i$ is detected as a "timing" anomaly.
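The classification into "normal", "retransmission", "loss" and "timing" can be illustrated with a small cyclic automaton. This is a sketch under stated assumptions rather than the application's implementation: the class and parameter names are hypothetical, the symbols within one sub-period are assumed to be distinct, and $T$ is interpreted as the time elapsed since the previous symbol received by the same DFA, which is then compared with the stored average interval.

```python
from typing import List, Optional


class SubPeriodDFA:
    """Cyclic DFA over an ordered sub-period such as ["e", "f", "g", "h"]."""

    def __init__(self, cycle: List[str], avg_intervals: List[float], duration_threshold: float):
        self.cycle = cycle
        self.avg_intervals = avg_intervals        # avg_intervals[k]: mean interval before cycle[k]
        self.duration_threshold = duration_threshold
        self.pos = len(cycle) - 1                 # assumption: first expected symbol is cycle[0]
        self.last_time: Optional[float] = None

    def step(self, symbol: str, timestamp: float) -> str:
        n = len(self.cycle)
        if symbol not in self.cycle:
            return "unknown"
        interval = None if self.last_time is None else timestamp - self.last_time
        self.last_time = timestamp
        nxt = (self.pos + 1) % n
        if self.cycle[nxt] == symbol:             # "normal" takes priority over the self-loop
            self.pos = nxt
            if interval is not None and abs(interval - self.avg_intervals[nxt]) >= self.duration_threshold:
                return "timing"
            return "normal"
        if self.cycle[self.pos] == symbol:
            return "retransmission"               # state unchanged
        for k in range(2, n):                     # symbol skipped ahead by k >= 2 positions
            cand = (self.pos + k) % n
            if self.cycle[cand] == symbol:
                self.pos = cand                   # move to the state that receives the symbol
                break
        return "loss"


dfa = SubPeriodDFA(["e", "f", "g", "h"], [1.0, 1.0, 1.0, 1.0], duration_threshold=0.2)
stream = [("e", 0.0), ("f", 1.0), ("f", 1.1), ("h", 3.0), ("e", 4.0), ("f", 4.3), ("x", 4.4)]
results = [dfa.step(sym, ts) for sym, ts in stream]
```

In this run the repeated "f" is a retransmission, the jump from "f" to "h" is a loss, and the final "f" arrives in the right order but too early, so it is flagged as a timing anomaly.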

Optionally, the present embodiment further includes a step of constructing an anomaly detection model, which is specifically as follows:

S301: Construct a state transition diagram.

According to S205, the present embodiment converts the original traffic data into a symbol sequence carrying a timestamp, that is, includes a symbol sequence SymSeq and a time sequence TimeSeq corresponding to the symbol sequence.

In this embodiment, a state transition diagram is constructed from the timestamped symbol sequence to obtain the state transition relation between events. The relation is represented by a matrix adjStates with elements adjStates[i][j], where index i corresponds to the i-th state i' in States, index j corresponds to the j-th state j' in States, and adjStates[i][j] is the number of transitions from state i' to state j'.

After the state transition matrix adjStates is constructed, its elements are divided by the total time span of the time sequence TimeSeq to obtain a frequency matrix adjF. The frequency matrix adjF is the final state transition diagram matrix, in which element adjF[i][j] is the transition frequency from state i' to state j'. The construction proceeds as follows:

S1: Take the symbols from SymSeq one at a time and judge whether the current symbol is a new state event; if so, add it to the state alphabet States.

S2: Update the transition relation between the current state and the previous state, i.e., add 1 to the corresponding element of the matrix adjStates.

S3: Judge whether all symbols in SymSeq have been taken; if not, return to S1; otherwise, go to S4.

S4: Divide each value in the matrix adjStates by the total time span of the time sequence, computing the in/out frequency of each state node, to obtain the frequency matrix adjF.

S5: According to whether each value of the frequency matrix adjF is 0, compute the in-degree and out-degree of each state node and the associated sets of predecessor and successor states.
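Steps S1 to S5 can be sketched in Python as follows. The function name and return layout are illustrative assumptions; for readability the alphabet is collected in a first pass and the transitions counted in a second, whereas the steps above interleave the two.

```python
from typing import Dict, List, Set, Tuple


def build_transition_graph(
    sym_seq: List[str], time_seq: List[float]
) -> Tuple[List[str], List[List[int]], List[List[float]], Dict[str, Set[str]], Dict[str, Set[str]]]:
    # S1: collect the state alphabet in order of first appearance.
    states: List[str] = []
    index: Dict[str, int] = {}
    for sym in sym_seq:
        if sym not in index:
            index[sym] = len(states)
            states.append(sym)

    # S2/S3: count transitions between consecutive symbols.
    n = len(states)
    adj_states = [[0] * n for _ in range(n)]
    for prev, cur in zip(sym_seq, sym_seq[1:]):
        adj_states[index[prev]][index[cur]] += 1

    # S4: divide by the total time span to obtain transition frequencies.
    total = time_seq[-1] - time_seq[0]
    if total <= 0:
        raise ValueError("time sequence must span a positive interval")
    adj_f = [[count / total for count in row] for row in adj_states]

    # S5: predecessor and successor sets from the non-zero entries of adjF.
    out_sets = {s: {states[j] for j in range(n) if adj_f[index[s]][j] > 0} for s in states}
    in_sets = {s: {states[i] for i in range(n) if adj_f[i][index[s]] > 0} for s in states}
    return states, adj_states, adj_f, in_sets, out_sets


states, adj_states, adj_f, in_sets, out_sets = build_transition_graph(
    list("ababab"), [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
)
```

The in-degree and out-degree of a node are then simply the sizes of its entries in `in_sets` and `out_sets`.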

This embodiment classifies the symbols represented by the different states in the alphabet States into different symbol sets S according to their in/out relations and frequencies, thereby separating the sub-periods. The classification finally yields the sub-period set $C = \{S_1, S_2, \ldots, S_i, \ldots, S_n\}$, where $S_i$, $i \in [1, n]$, is the symbol set of the i-th sub-period.

Fig. 4 is a state transition diagram composed of a timing cycle sequence "ij … …" of cycle length 2, which consists of one request-response pair (the symbols "ij"), and a polling sequence "ababcefghefghefghefghefghefghgh … …" of cycle length 50. The polling cycle sequence is composed of four request-response pairs, namely the symbols "ab", "cd", "ef" and "gh". The number in each node of the state diagram is the frequency of that state. For example, the in-state set of state "a" is {b, h, j} and its out-state set is {b}.

According to the in/out relations and frequencies of the state transition diagram, this embodiment divides the symbols in Fig. 4 into four sub-periods, {a, b}, {c, d}, {e, f, g, h} and {i, j}, successfully separating the polling period and the timing period.

S302: the order of symbols contained in each sub-period is determined.

The order of the symbols in each set is determined from the in/out relations of the symbols in the state transition diagram and from the order of the original symbol sequence. The following operations are carried out for each sub-period symbol set $S_i$ in turn:

(1) First, determine the first symbol $s_{first}$ of the subset $S_i$ from the original symbol sequence SymSeq.

(2) Then, determine the symbol that follows $s_{first}$ according to the in/out relations of the state transition diagram, and repeat this operation until the order of all symbols in the set $S_i$ is determined.

(3) Determine that the successor of the last symbol $s_{last}$ is the first symbol $s_{first}$ of the sub-period, forming a complete cycle.

The above construction process is described in detail below by taking the corresponding separated subcycle { e, f, g, h } in the symbol sequence "ababcdefghefghefghefghefghefghefghefghefghefgh … …" in fig. 4 as an example:

First, the first symbol e of the subset is selected according to the original symbol sequence. The next symbol f after e is then obtained from the state transition diagram, which determines the order e → f; repeating this step determines the order e → f → g → h. Since h is the last symbol in the symbol set and sub-periods are cyclic, the order h → e is determined, and the order relation shown in Fig. 5 is finally constructed.
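The ordering procedure can be sketched as follows. The `out_sets` mapping (symbol to its set of successor symbols in the state transition diagram) is a hypothetical input, and the tie-break used when a symbol has several unvisited successors inside the subset (alphabetical order) is an assumption that the application does not specify.

```python
from typing import Dict, List, Set


def order_sub_period(sym_seq: List[str], symbol_set: Set[str], out_sets: Dict[str, Set[str]]) -> List[str]:
    """Return the symbols of one sub-period in cyclic order.

    The cycle is closed implicitly: the successor of the last symbol is the first.
    """
    # (1) The first symbol is the earliest member of the subset in the original sequence.
    first = next(s for s in sym_seq if s in symbol_set)
    order = [first]
    current = first
    # (2) Follow successor edges that stay inside the subset until every symbol is placed.
    while len(order) < len(symbol_set):
        successors = [s for s in sorted(out_sets[current]) if s in symbol_set and s not in order]
        if not successors:
            raise ValueError(f"no unvisited successor of {current!r} inside the subset")
        current = successors[0]
        order.append(current)
    return order


out_sets = {"e": {"f"}, "f": {"g"}, "g": {"h"}, "h": {"e"}}
order = order_sub_period(list("ababcdefghefgh"), {"e", "f", "g", "h"}, out_sets)
```

For the subset {e, f, g, h} this reproduces the order e → f → g → h, with h → e closing the cycle.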

S303: a DFA model is constructed for each sub-period.

Optionally, the process of constructing the DFA model for each sub-period in this embodiment is as follows:

Specifically, a DFA is a five-tuple $(Q, \Sigma, \delta, q_0, F)$, where $Q$ is a non-empty finite set of states, $\Sigma$ is a non-empty finite input alphabet, $\delta$ is the transition function, $q_0 \in Q$ is the initial state, and $F \subseteq Q$ is the set of accepting states. A DFA model is constructed for each sub-period. To meet the practical modeling requirements, the DFA is modified in the following two ways:

(1) The accepting state set $F$ is removed, because in the detection phase the input of the anomaly detection model is an endlessly repeating data stream, and the DFA model does not terminate unless the data stream ends.

(2) The start state is defined as the state corresponding to the first symbol in the periodic traffic pattern.

The DFA model of this embodiment is constructed from the ordered symbol sequence obtained for each sub-period. Let $State_i$ denote the current state, $s_i$ the input symbol that causes a transition to $State_i$, and $s_j$ the currently received input symbol, which causes a transition to $State_j$.

For each sub-period, the DFA model construction process is as follows:

(1) First, construct the normal transition relation between sequences: if $s_j = s_{i+1}$, the current state $State_i$, upon receiving symbol $s_j$, transitions normally to $State_{i+1}$.

(2) Then, construct the retransmission transition relation between sequences: if $s_j = s_i$, the current state $State_i$, upon receiving symbol $s_j$, judges a retransmission, and the current state is unchanged.

(3) Finally, construct the loss transition relation between sequences: if $s_j \neq s_{i+1}$, the current state $State_i$, upon receiving symbol $s_j$, judges a loss, and the current state is unchanged.
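The three construction rules can be written out as an explicit transition table. The sketch below is illustrative: state i is taken to be the state reached after receiving the i-th symbol of the ordered sub-period, symbols are assumed distinct within a sub-period, and the hypothetical `jump_on_loss` flag chooses between moving to the state that receives the symbol (as in the description of "loss" events) and leaving the state unchanged (as in rule (3) above).

```python
from typing import Dict, List, Tuple

Transition = Tuple[int, str]   # (next state index, event label)


def build_dfa(order: List[str], jump_on_loss: bool = True) -> Dict[Tuple[int, str], Transition]:
    """Build delta[(state, symbol)] -> (next_state, label) for one cyclic sub-period."""
    n = len(order)
    delta: Dict[Tuple[int, str], Transition] = {}
    for i in range(n):
        for j, symbol in enumerate(order):
            if j == (i + 1) % n:
                # Rule (1): the expected next symbol; checked first so that
                # "normal" wins over the self-loop when both apply.
                delta[(i, symbol)] = (j, "normal")
            elif j == i:
                # Rule (2): same symbol again, state unchanged.
                delta[(i, symbol)] = (i, "retransmission")
            else:
                # Rule (3): a known symbol out of position.
                delta[(i, symbol)] = (j if jump_on_loss else i, "loss")
    return delta


delta = build_dfa(["e", "f", "g", "h"])
```

No accepting states are stored, in line with modification (1); a single-symbol timing sub-period such as "AAA … …" yields one state whose only transition is a "normal" self-loop.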

Optionally, in this embodiment, the classification of the symbols and the division into sub-periods are based on the principle of a Markov chain. The specific sub-period division process of this embodiment is as follows:

(1) since nodes with both 1 degree of entry and exit are not branched in the state transition diagram and belong to only one subcycle set, the node V with both 1 degree of entry and exit in the state transition diagram is selected first, and according to the node frequency F_VDividing the frequency of the frequency into a set S of similar frequencies, wherein the set frequency is F_S. The symbol ≈ herein indicates frequency similarity, i.e., F_S×(1-F_T)≤F_V≤F_S×(1+F_T) In which F is_TIs the frequency threshold, in this experiment F_TSet to 0.05. If the set of frequencies does not exist, a new frequency F is created_VNode V is added to the new set.

(2) remainV represents a collection of nodes in the state diagram that are not all allocated, i.e.

Selecting a node V with out-degree or in-degree of 1 from the remainV according to the node frequency F_VIt is assigned to the set of already existing similar frequencies and node V is removed from remainV.

(3) Nodes may be members of multiple sets, representing the occurrence of symbols in multiple recurring patterns. V_inIs an entry node of node V and V_inBelonging to only one known set, V_outIs an output node of node V and V_outOnly belonging to a known set if all V_inOr all of V_outSum of the set frequencies and frequency F of node V_VSimilarly, node V is added to all relevant sets separately, while node V is removed from remainV. If the relevant set frequency is not similar to the node frequency, judging all V_inAnd V_outSum of the set frequenciesFrequency F with node V_VAnd if so, adding the node V into all the related sets respectively, and removing the node V from the remainV.

(4) If at least one access node adjacent to the node V is in the known set and the access degree of the adjacent node is 1, adding the node V into the set, and simultaneously modifying the frequency of the node to be F_V＝F_V-F_SIf F is_VIs 0, the remaining node set remainV is removed.

(5) Judging whether nodes exist in the residual node set remainV or not, if so, finding out the node with the minimum frequency from the residual nodes, creating a new set according to the frequency of the node and adding the node into the set, then adding the residual nodes into the set, and modifying the frequency of each node, if the frequency of the node is less than a threshold F_TThen node V is removed from remainV.

(6) Finally, the nodes still remaining in remainV are added to a known set of similar frequency.
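Step (1) above can be illustrated with a minimal sketch. It covers only the frequency-similarity test and the grouping of nodes whose in-degree and out-degree are both 1; steps (2) to (6) are not implemented. The names `similar`, `FreqSet` and `assign_simple_nodes`, and the dictionary-based graph representation, are assumptions made for illustration and do not appear in the original text.

```python
from dataclasses import dataclass, field

F_T = 0.05  # frequency threshold used in the experiment described above


def similar(f_v: float, f_s: float, f_t: float = F_T) -> bool:
    """Frequency similarity: F_S*(1-F_T) <= F_V <= F_S*(1+F_T)."""
    return f_s * (1 - f_t) <= f_v <= f_s * (1 + f_t)


@dataclass
class FreqSet:
    """A sub-cycle set S with its set frequency F_S (hypothetical container)."""
    freq: float
    nodes: set = field(default_factory=set)


def assign_simple_nodes(node_freq, in_deg, out_deg, f_t=F_T):
    """Step (1): group nodes whose in-degree and out-degree are both 1.

    Returns the list of frequency sets and remainV, the nodes left unallocated.
    """
    sets = []
    remain_v = set(node_freq)
    for v in sorted(node_freq):
        if in_deg[v] == 1 and out_deg[v] == 1:
            # Look for an existing set of similar frequency.
            target = next(
                (s for s in sets if similar(node_freq[v], s.freq, f_t)), None
            )
            if target is None:
                # No similar set exists: create one with frequency F_V.
                target = FreqSet(freq=node_freq[v])
                sets.append(target)
            target.nodes.add(v)
            remain_v.discard(v)
    return sets, remain_v


# Illustrative input: node "e" branches, so it stays in remainV.
node_freq = {"a": 100, "b": 102, "c": 50, "d": 51, "e": 150}
in_deg = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 2}
out_deg = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 2}
sets, remain_v = assign_simple_nodes(node_freq, in_deg, out_deg)
```

In this sketch the set frequency F_S is fixed to the frequency of the first node placed in the set; the original text does not state whether F_S is updated as further nodes join.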

The Markov chain is common general knowledge in the art and is not described in further detail herein.

Further optionally, the present embodiment fuses the separated sub-periods.

When traffic from multiple cycles is mixed, the classification algorithm may also decompose a single long-cycle traffic pattern into several sub-cycles. For example, the polling cycle of length 50 in fig. 4 is decomposed by the classification algorithm into 3 sub-cycles, "abab…", "cdcd…" and "efghefgh…". If an attacker who fully understands the production process continuously replays the "ab" sub-cycle, the DFA model cannot detect the attack, which may result in a serious production accident.
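The weakness described above can be shown with a simplified sketch. It assumes each sub-cycle DFA is a plain cyclic pattern matcher and uses a hypothetical long cycle "abcdefgh" rather than the length-50 cycle of fig. 4; the function names are illustrative and timestamps are ignored.

```python
def cyclic_dfa_accepts(seq, pattern):
    """True if seq follows the cyclic pattern, starting at its first symbol."""
    return all(sym == pattern[i % len(pattern)] for i, sym in enumerate(seq))


def check_split(seq, patterns):
    """Route each symbol to the sub-cycle that owns it and check each sub-sequence."""
    results = {}
    for pattern in patterns:
        owned = set(pattern)
        sub_seq = [s for s in seq if s in owned]
        results[pattern] = cyclic_dfa_accepts(sub_seq, pattern)
    return results


patterns = ["ab", "cd", "efgh"]            # separated sub-cycles
normal = list("abcdefgh" * 2)               # hypothetical normal traffic
replay = list("ab" + "ab" * 3 + "cdefgh")   # "ab" replayed three extra times

split_verdict = check_split(replay, patterns)            # every sub-DFA accepts
fused_verdict = cyclic_dfa_accepts(replay, "abcdefgh")   # fused model rejects
```

Each separated sub-cycle model sees only its own symbols, so the replayed "ab" pairs still look like a valid "abab…" sequence, whereas a model over the fused symbol set observes that "a" arrives where "c" was expected.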

In order to improve the detection capability of the model for complex semantic attacks, the separated sub-periods are further fused on this basis. The specific steps are as follows:

(1) Obtain the set of all sub-period symbol sets C = {S₁, S₂, …, S_i, …, S_n}, where S_i, i ∈ [1, n], denotes the symbol set of the i-th sub-period; the false alarm rate FPR_orig of the model is obtained at the same time.

(2) For all symbol sets, create a matrix adjSet that records whether fusion of each pair of sets has been attempted. Since a set does not need to be fused with itself, initialize adjSet[i][j] = 1 for i = j and adjSet[i][j] = 0 for i ≠ j, where i, j ∈ [1, n].

(3) Search the matrix adjSet and select from C two sets S_i and S_j for which fusion has not yet been attempted, i.e., adjSet[i][j] = 0. If no qualifying pair exists in C, execute (6); otherwise, execute (4).

(4) For all symbols V_i ∈ S_i and V_j ∈ S_j in the two sets, if

or

that is, if symbols in S_i and S_j have an in-out relationship, S_i and S_j are temporarily fused to generate a new symbol set S_temp; S_i and S_j are removed from C, and S_temp is added to C.

Then, for each sub-period symbol set S_t in C in turn, the original symbol sequence SymSeq is classified to generate a new subsequence subSeq_t, i.e.,

if V ∈ S_t, then subSeq_t ← V.

Finally, an unsupervised learning method is applied to each subSeq_t to construct a temporary DFA model.

If the symbols in S_i and S_j have no in-out relationship, set adjSet[i][j] = 1 and repeat (3).

(5) Let the false alarm rate of the temporary DFA model be FPR_temp. If FPR_temp > FPR_orig, S_temp is restored to S_i and S_j, adjSet[i][j] is set to 1, and (3) is repeated. Otherwise, the fusion of S_i and S_j is confirmed, C is updated, and (2) and (3) are repeated.

(6) After the above steps, the fused set C is obtained. Normal delays between request packets and response packets can cause false alarms: a sequence q₁, r₁, q₂, r₂, … may become q₁, q₂, r₁, r₂, … because of normal network delay, and the model may then report it as abnormal. Therefore, based on protocol parsing, this embodiment represents each matched request-response pair in the sets of C by its request, so that q₁, r₁, q₂, r₂, … is represented as q₁, q₂, …, and a new symbol set C′ is generated. From each S_i in C′, the corresponding subSeq_i is generated, and an unsupervised learning method is applied to each subSeq_i to construct a DFA model, thereby obtaining the fused DFA model.
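Steps (1) to (5) above can be sketched as the following loop. This is an illustration under stated assumptions, not the applicant's implementation: the false alarm rate is supplied by a caller-provided function `estimate_fpr`, which is hypothetical because the text does not specify how the temporary DFA models are trained or evaluated; the in-out relationship is approximated as "a symbol of one set directly follows a symbol of the other in SymSeq"; and a set of tried pairs stands in for the matrix adjSet. Step (6), replacing request-response pairs by their requests, is not covered.

```python
from itertools import combinations


def split_sequence(sym_seq, symbol_sets):
    """Classify SymSeq: subSeq_t receives every symbol V with V in S_t."""
    return [[v for v in sym_seq if v in s] for s in symbol_sets]


def has_in_out_relation(s_i, s_j, sym_seq):
    """Assumed test: some symbol of one set directly follows one of the other."""
    return any(
        (a in s_i and b in s_j) or (a in s_j and b in s_i)
        for a, b in zip(sym_seq, sym_seq[1:])
    )


def fuse_subperiods(symbol_sets, sym_seq, estimate_fpr):
    """Greedy fusion of sub-period symbol sets guided by the false alarm rate."""
    c = [frozenset(s) for s in symbol_sets]
    fpr_orig = estimate_fpr(c, sym_seq)        # step (1)
    tried = set()                              # step (2): pairs marked as tried
    while True:
        # Step (3): pick a pair whose fusion has not been attempted yet.
        pair = next(
            ((a, b) for a, b in combinations(c, 2)
             if frozenset((a, b)) not in tried),
            None,
        )
        if pair is None:
            return c                           # nothing left to try
        s_i, s_j = pair
        tried.add(frozenset((s_i, s_j)))
        if not has_in_out_relation(s_i, s_j, sym_seq):
            continue                           # step (4): no relation, next pair
        # Step (4): temporary fusion S_temp = S_i ∪ S_j.
        candidate = [s for s in c if s not in (s_i, s_j)] + [s_i | s_j]
        # Step (5): keep the fusion unless it raises the false alarm rate.
        if estimate_fpr(candidate, sym_seq) <= fpr_orig:
            c = candidate
            tried = set()                      # rebuild the record for the new C


# Usage with a placeholder estimator that always reports zero false alarms.
sym_seq = list("abcdabcd")
fused = fuse_subperiods([{"a", "b"}, {"c", "d"}], sym_seq, lambda sets, seq: 0.0)
```

The sketch keeps FPR_orig fixed across fusions because the text does not say whether it is refreshed after a confirmed fusion.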

For the IP channels with the symbol sequence modeling completed, after the DFA sequence model corresponding to each channel is obtained, in order to enable the model to detect the time sequence attack, the time mark is added to each node in the DFA model.

The original timestamped symbol sequence is input into the fused DFA model, and for each sub-model DFA_i, the timestamps of the symbols received in each state State_ij are recorded.

When all timestamped symbols have been input, the average time interval of each state is obtained from the timestamps recorded for that state: T_ij.avg = (T_ij.last - T_ij.first)/Length(T_ij), where T_ij denotes all timestamps of the j-th state of the i-th DFA. It should be noted that each node of each sub-DFA has its own T_ij.avg.
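The average-interval formula can be written out as a short sketch. The helper names and the `state_of` mapping from symbol to state are assumptions for illustration; the division by Length(T_ij) follows the formula as stated above.

```python
def average_interval(timestamps):
    """T_avg = (T_last - T_first) / Length(T), as in the formula above."""
    if not timestamps:
        raise ValueError("state has no recorded timestamps")
    return (timestamps[-1] - timestamps[0]) / len(timestamps)


def record_state_times(timed_symbols, state_of):
    """Collect timestamps per state, then compute each state's average interval.

    timed_symbols: iterable of (timestamp, symbol) pairs in arrival order.
    state_of: hypothetical mapping from a symbol to the DFA state it leads to.
    """
    times = {}
    for ts, sym in timed_symbols:
        times.setdefault(state_of[sym], []).append(ts)
    return {state: average_interval(t) for state, t in times.items()}


# Illustrative trace: symbols "a" and "b" alternate once per second.
trace = [(0.0, "a"), (1.0, "b"), (2.0, "a"), (3.0, "b"), (4.0, "a")]
averages = record_state_times(trace, {"a": "s0", "b": "s1"})
```

During detection, the timestamp of an incoming symbol would be compared against the stored average for its state, using the time interval deviation threshold mentioned in the claims.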

Fig. 6 is a block diagram of an industrial control network traffic anomaly detection device according to an embodiment of the present application, where the device is applied to the anomaly detection model shown in fig. 3 in the present embodiment. The device at least comprises the following modules:

the judging module is used for inputting the symbols in the currently acquired symbol sequence into a corresponding sub-period DFA model if the symbols are detected to belong to known symbols; the sub-period DFA model is obtained by classifying symbol sequences corresponding to industrial control network traffic data to obtain symbol sets corresponding to a plurality of sub-periods and establishing a DFA model for the symbol sets in each sub-period according to a state transition relation;

For relevant details reference is made to the above-described method embodiments.

It should be noted that, in the above embodiment, when the industrial control network traffic anomaly detection device performs industrial control network traffic anomaly detection, the division into the above functional modules is only an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the industrial control network traffic anomaly detection device may be divided into different functional modules to complete all or part of the functions described above. In addition, the industrial control network traffic anomaly detection device and the industrial control network traffic anomaly detection method provided by the above embodiments belong to the same concept; the specific implementation process is detailed in the method embodiments and is not described herein again.

Fig. 7 is a block diagram of an industrial control network traffic anomaly detection system according to an embodiment of the present application, where the system may be: a smartphone, a tablet, a laptop, a desktop, or a server. The industrial control network traffic abnormality detection apparatus may also be referred to as a user equipment, a portable terminal, a laptop terminal, a desktop terminal, a control terminal, etc., which is not limited in this embodiment. The system includes at least a processor and a memory.

The processor may include one or more processing cores, such as: 4 core processors, 6 core processors, etc. The processor may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable gate array), PLA (Programmable logic array). The processor may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor may be integrated with a GPU (Graphics processing unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

The memory may include one or more computer-readable storage media, which may be non-transitory. The memory may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in a memory is configured to store at least one instruction for execution by a processor to implement a method for industrial control network traffic anomaly detection provided by method embodiments herein.

In some embodiments, the industrial control network traffic anomaly detection system may further include: a peripheral interface and at least one peripheral. The processor, memory and peripheral interface may be connected by bus or signal lines. Each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. Illustratively, peripheral devices include, but are not limited to: radio frequency circuit, touch display screen, audio circuit, power supply, etc.

Of course, the industrial control network traffic anomaly detection system may further include fewer or more components, which is not limited in this embodiment.

Optionally, the present application further provides a computer-readable storage medium, where a program is stored in the computer-readable storage medium, and the program is loaded and executed by a processor to implement the industrial control network traffic anomaly detection method according to the foregoing method embodiment.

Optionally, the present application further provides a computer product, where the computer product includes a computer-readable storage medium, where a program is stored in the computer-readable storage medium, and the program is loaded and executed by a processor to implement the industrial control network traffic anomaly detection method according to the foregoing method embodiment.

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method for detecting abnormal flow of an industrial control network is characterized by comprising the following steps:

2. The method of claim 1, wherein the determining the abnormal detection result according to the state transition result after receiving the symbol by the sub-period DFA model comprises:

after the sub-period DFA model receives the symbol, if the corresponding state is transferred to the next state of the periodic sequence from the current state, the state is judged to be normally transferred;

and under the condition of normal state transition, if the difference value between the timestamp carried by the current symbol and the average time interval in the current state is greater than a time interval deviation threshold value, detecting that the time sequence of the current symbol is abnormal.

3. The method of claim 1, wherein the determining the abnormal detection result according to the state transition result after receiving the symbol by the sub-period DFA model comprises:

after the sub-period DFA model receives the symbol, if the state transition that occurs does not appear at the expected position in the state transition process, a "loss" anomaly is detected.

4. The method of claim 1, wherein the determining the abnormal detection result according to the state transition result after receiving the symbol by the sub-period DFA model comprises:

after the sub-period DFA model receives the symbol, if the corresponding state is still the current state and no state transition occurs, a "retransmission" anomaly is detected.

5. The method of claim 1, wherein an "unknown" anomaly is detected if a symbol in the currently acquired sequence of symbols belongs to an unknown symbol.

6. The method according to claim 1, wherein the matching of the message pair consisting of the request message and the response message in the industrial control network traffic data further comprises:

if the match fails, it is directly detected as a "missing exception".

7. The method of claim 1, wherein the anomaly detection model comprises:

at least two sub-period DFA models for outputting state transition results according to input symbols;

the DFA selector is used for sending the symbols corresponding to the input industrial control network flow into the corresponding sub-period DFA models according to the input symbol content and the corresponding time stamps;

and the abnormity judgment module is used for judging whether the input industrial control network flow is abnormal or not according to the state transition result output by the sub-period DFA model.

8. The method according to claim 6, wherein after the DFA model is established for the symbol set in each of the sub-periods according to the state transition relationship, the method further comprises performing fusion again for each sub-period according to the false alarm rate of the corresponding DFA model, so as to obtain a plurality of fused sub-period DFA models.

9. An industrial control network traffic anomaly detection device, characterized by comprising:

10. An industrial control network traffic anomaly detection system, characterized in that the system comprises a processor and a memory; the memory stores a program, and the program is loaded and executed by the processor to implement the steps of the industrial control network traffic anomaly detection method according to any one of claims 1-8.