CN112448947B

CN112448947B - Network anomaly determination method, equipment and storage medium

Info

Publication number: CN112448947B
Application number: CN202011247194.7A
Authority: CN
Inventors: 白岩; 李拓
Original assignee: Qianxin Technology Group Co Ltd; Secworld Information Technology Beijing Co Ltd
Current assignee: Qianxin Technology Group Co Ltd; Secworld Information Technology Beijing Co Ltd
Priority date: 2020-11-10
Filing date: 2020-11-10
Publication date: 2022-10-28
Anticipated expiration: 2040-11-10
Also published as: CN112448947A

Abstract

The present disclosure provides a network anomaly determination method, a device, and a storage medium. The method comprises: acquiring network traffic data; dividing the network traffic data into multiple groups of sub-data according to the time periods corresponding to the network traffic data; extracting features of multiple dimensions from each group of sub-data; for each group of sub-data, calculating a group of information entropies for the feature of each dimension; for each group of sub-data, comparing each information entropy in the group of information entropies of each feature with the data range of a pre-established information entropy baseline corresponding to that feature; calculating the difference between each target information entropy that is not within the data range and the information entropy in the group that is closest to it in value; calculating a risk index for each feature from the difference corresponding to that feature; and determining, from the risk indexes of the features, whether the network is abnormal. The method can improve the efficiency of identifying network anomalies.

Description

Network anomaly determination method, equipment and storage medium

Technical Field

The present invention relates to the field of network security technologies, and in particular, to a method, a device, and a storage medium for determining a network anomaly.

Background

At present, some network anomalies caused by communication protocols are difficult to detect, for example, anomalies in networks that use the Modbus protocol. Modbus is a widely used industrial control protocol. However, when a specific industrial control system is implemented, developers may lack security knowledge or be unaware of security problems, so a system using the Modbus protocol may contain various security vulnerabilities. For example, during communication, a maliciously controlled node may send illegal data. The function code is an important part of the Modbus protocol, and abuse of function codes is a major cause of anomalies in Modbus networks; illegal message lengths, short-cycle useless commands, and incorrect message lengths may also cause system anomalies. Currently, whitelist rules are generally used to detect Modbus protocol anomalies: for example, a whitelist rule is established over attributes such as the source IP, destination IP, function code, and operation address, and an alarm is generated for any message that does not match the rule. This approach only inspects individual Modbus messages at the micro level and ignores macro-level characteristics of the industrial control system, such as its time and frequency characteristics. For example, some packets are normal when they appear once in isolation, but many occurrences within a short time likely indicate an attack, and whitelist rules cannot detect such attacks. A method for detecting this kind of network anomaly is therefore still needed.

Disclosure of Invention

In view of this, one or more embodiments of the present disclosure provide a network anomaly determination method, a device, and a storage medium, to solve the problem in the related art that whether a network is abnormal cannot be detected from the macro-level characteristics of network traffic data.

One or more embodiments of the present disclosure provide a network anomaly determination method, including: acquiring network traffic data; dividing the network traffic data into multiple groups of sub-data according to the time periods corresponding to the network traffic data; extracting features of multiple dimensions from each group of sub-data; for each group of sub-data, calculating a group of information entropies for the feature of each dimension among the multiple dimensions; for each group of sub-data, comparing each information entropy in the group of information entropies of each feature with the data range of a pre-established information entropy baseline corresponding to that feature; calculating the difference between each target information entropy that is not within the data range and the information entropy in the group of information entropies that is closest to it in value; calculating a risk index for each feature according to the difference corresponding to that feature; and determining whether the network is abnormal according to the risk indexes of the features.

Optionally, the features include at least three of the following: an operating system fingerprint; the device identification field, the function code field, and the packet length of the query message; and the device identification field, the function code field, and the packet length of the response message.

Optionally, the risk index corresponding to each feature is calculated by the following formula:

where score(x) represents the risk index of the feature, x represents the difference, δ is a preset threshold, n is a preset real number greater than 1, and k is a weight coefficient that is an integer greater than 1.

Optionally, determining whether the network is abnormal according to the risk index corresponding to each feature includes: calculating a comprehensive risk index according to the risk index corresponding to each characteristic; determining whether the network is abnormal or not according to the comprehensive risk index; wherein the comprehensive risk index is calculated by the following formula:

where score_i represents the risk index corresponding to feature i, and m represents the number of features participating in the calculation.

Optionally, determining whether the network is abnormal according to the risk index of each feature includes: if the calculated risk index of any one feature is greater than a first preset value, determining that the network is abnormal; or, if the calculated comprehensive risk index is greater than a second preset value, determining that the network is abnormal, where the first preset value is greater than the second preset value.

Optionally, extracting features of multiple dimensions from each group of sub-data includes: for each group of sub-data, identifying Modbus traffic according to the characteristics of the Modbus protocol; pairing Modbus query messages with their response messages; and parsing and extracting the key fields and features of the Modbus query and response messages.

Optionally, extracting features of multiple dimensions from each group of sub-data further includes: identifying the operating system type from TCP header information during the Transmission Control Protocol (TCP) handshake; identifying the network card vendor from the MAC address; identifying the device vendor from data characteristics of the Modbus application layer; and hashing the operating system type, the network card vendor information, and the device vendor information to obtain a hash value.

Optionally, the method further includes: acquiring network traffic sample data; dividing the network traffic sample data into multiple groups of sub-sample data according to the corresponding time periods; and, for each group of sub-sample data, establishing an information entropy baseline for the feature of each dimension among the multiple dimensions.

One or more embodiments of the present disclosure also provide an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the program, the processor implements any one of the above network anomaly determination methods.

One or more embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any one of the above network anomaly determination methods.

The network anomaly determination method provided by one or more embodiments of the present disclosure groups network traffic data by the time at which it was generated to obtain multiple groups of sub-data; calculates, for each group, the information entropy of each of the multiple feature dimensions; compares each calculated information entropy with the value range of the pre-established information entropy baseline; for each target information entropy outside that range, calculates the difference between it and the information entropy in the group closest to it in value; determines the risk index of each feature from that difference; and determines from the risk indexes whether the network is abnormal. Network anomalies can thus be identified from the characteristics of the network traffic data, which improves identification efficiency.

Drawings

Fig. 1 is a flow diagram illustrating a network anomaly determination method in accordance with one or more embodiments of the present disclosure;

Fig. 2 is a block diagram of an electronic device according to one or more embodiments of the present disclosure.

Detailed Description

For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.

It should be noted that all expressions using "first" and "second" in the embodiments of the present disclosure serve to distinguish two non-identical entities with the same name, or two non-identical parameters. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present disclosure; this is not repeated in the following embodiments. The words "comprising", "comprises", and the like mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected", "coupled", and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", and the like indicate only relative positional relationships, which may change accordingly when the absolute position of the described object changes.

Fig. 1 is a flowchart of a network anomaly determination method according to one or more embodiments of the present disclosure. As shown in Fig. 1, the method includes:

Step 101: acquiring network traffic data;

For example, network traffic may be captured by a network packet sniffer connected in bypass mode to a mirror port of a switch. After sniffing, the packets can be parsed at the higher protocol layers, and information such as the connection state and the application-layer protocol is recorded. A session table may be maintained in which each table entry records the state information of one connection. The session table is a hash table; each entry represents one data stream and contains information that uniquely identifies it, such as MAC (Media Access Control) addresses, IP (Internet Protocol) addresses, TTL (Time To Live), source and destination operating system types, and the application-layer protocol type.
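The session table described above can be sketched as a Python dictionary (a hash table) that maps a flow key to an entry. This is a minimal sketch under stated assumptions: the five-tuple key, the field names, and the `SessionTable` and `SessionEntry` helpers are hypothetical illustrations rather than the disclosed implementation, and packet capture itself is out of scope.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical flow key: (src_ip, src_port, dst_ip, dst_port, transport protocol).
FlowKey = Tuple[str, int, str, int, str]


@dataclass
class SessionEntry:
    """One session-table entry; fields follow the examples listed in the text."""
    src_mac: str
    dst_mac: str
    ttl: int
    src_os: Optional[str] = None
    dst_os: Optional[str] = None
    app_protocol: Optional[str] = None
    packets: int = 0
    # Parsed application-layer fields (e.g. a Modbus device_id) can be attached here.
    extra: Dict[str, object] = field(default_factory=dict)


class SessionTable:
    """Hash table (a Python dict) mapping each data stream to its entry."""

    def __init__(self) -> None:
        self._entries: Dict[FlowKey, SessionEntry] = {}

    @staticmethod
    def normalize(key: FlowKey) -> FlowKey:
        # Order the endpoints so both directions of a connection share one key.
        src_ip, src_port, dst_ip, dst_port, proto = key
        if (src_ip, src_port) > (dst_ip, dst_port):
            return (dst_ip, dst_port, src_ip, src_port, proto)
        return key

    def update(self, key: FlowKey, src_mac: str, dst_mac: str, ttl: int) -> SessionEntry:
        """Look up (or create) the entry for a packet's flow and count the packet."""
        k = self.normalize(key)
        entry = self._entries.get(k)
        if entry is None:
            # MACs and TTL are recorded from the first packet seen for the flow.
            entry = SessionEntry(src_mac=src_mac, dst_mac=dst_mac, ttl=ttl)
            self._entries[k] = entry
        entry.packets += 1
        return entry

    def __len__(self) -> int:
        return len(self._entries)
```

Because the key is normalized, a Modbus query and its response land in the same entry, which matches the idea that each entry represents one data stream.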

Step 102: dividing the network traffic data into multiple groups of sub-data according to the time periods corresponding to the network traffic data;

Taking Modbus network traffic as an example: Modbus is an industrial control protocol, and its data characteristics are strongly influenced by the factory's production cycle. The timeline can therefore be divided into time slices according to the production cycle, and each slice is called a time period. For example, a factory's morning working time may be 8:00-12:00; because industrial control protocol traffic during start-up time differs significantly from that during the midday break, the working day beginning at 8:00 can be divided into separate time periods accordingly. Within a time period, a time window frames the time series in segments of a specified unit length, so that statistical indicators can be calculated within each frame. For example, a slider of a designated length may be slid along the time axis, and the data inside the slider is reported each time it slides by one unit.
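The grouping by time period and the sliding window inside a period can be illustrated as follows. This sketch assumes a hypothetical production schedule (`PERIODS`) and records given as `(timestamp, record)` pairs; the period boundaries, labels, and helper names are illustrative assumptions, not values taken from the disclosure.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterator, List, Sequence, Tuple

Record = Tuple[datetime, object]

# Hypothetical production schedule (label, start hour, end hour); a real
# deployment would derive these periods from the factory's own production cycle.
PERIODS: Sequence[Tuple[str, int, int]] = (
    ("morning_shift", 8, 12),
    ("midday_break", 12, 13),
    ("afternoon_shift", 13, 17),
)


def period_of(ts: datetime) -> str:
    """Map a timestamp to the time period that contains it."""
    for label, start, end in PERIODS:
        if start <= ts.hour < end:
            return label
    return "off_shift"


def group_by_period(records: List[Record]) -> Dict[str, List[object]]:
    """Split traffic records into groups of sub-data, one group per time period."""
    groups: Dict[str, List[object]] = defaultdict(list)
    for ts, rec in records:
        groups[period_of(ts)].append(rec)
    return dict(groups)


def sliding_windows(records: List[Record], window: timedelta,
                    step: timedelta) -> Iterator[Tuple[datetime, List[object]]]:
    """Slide a fixed-length window over time-sorted records, one step at a time.

    A simple O(windows * records) scan, which is adequate for a sketch.
    """
    if not records:
        return
    start, last = records[0][0], records[-1][0]
    while start <= last:
        end = start + window
        yield start, [rec for ts, rec in records if start <= ts < end]
        start += step
```

With a one-hour window and a 30-minute step, records at 8:00, 8:40 and 9:10 produce the windows `["a", "b"]`, `["b", "c"]` and `["c"]`.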

For example, the time period corresponding to the network traffic data is 8.

Step 103: extracting features of multiple dimensions from each group of sub-data;

Step 104: for each group of sub-data, calculating a group of information entropies for the feature of each dimension among the multiple dimensions;

The following uses the device_id field of the Modbus protocol as an example to explain how the information entropy is calculated:

Establish a hash table for fast lookup of device_id values.

Monitor Modbus traffic over a period of time and extract the device_id value of each Modbus message. Look up each device_id value in the hash table: if it has appeared before, increment the count of the corresponding node by 1; otherwise, create a new hash node with its count set to 1. This step counts the occurrences of each distinct device_id value within the period.

Traverse the hash table and calculate the occurrence probability of each device_id value, i.e., the count of that value divided by the total count of all device_id values.

Calculate the final information entropy value using the following formula:

$$H = -\sum_{j} P_j \log P_j$$

where $P_j$ represents the probability of the $j$-th distinct device_id value; the terms $P_j \log P_j$ are summed over all values and the sum is negated to obtain the overall information entropy.
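The counting and entropy steps for device_id can be sketched with `collections.Counter`, which is itself a hash table from value to count. The logarithm base is not specified in the text; base 2 is assumed here, and `shannon_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(values: Iterable[Hashable], base: float = 2.0) -> float:
    """Information entropy H = -sum(P * log P) of the observed values."""
    counts = Counter(values)          # hash table: value -> occurrence count
    total = sum(counts.values())
    if total == 0:
        return 0.0                    # no messages observed in this window
    h = 0.0
    for c in counts.values():
        p = c / total                 # probability of this device_id value
        h -= p * math.log(p, base)
    return h
```

For example, device_id values `[17, 17, 3, 5]` have probabilities 0.5, 0.25 and 0.25, giving an entropy of 1.5 bits; a window in which only one device_id appears has entropy 0.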

It should be noted that each group of sub-data contains multiple values of a given feature dimension collected within one time period; therefore, in step 104, a group of multiple information entropies is calculated for the feature of each dimension in each group of sub-data.

Step 105: for each group of sub-data, comparing each information entropy in the group of information entropies of each feature with the data range of the pre-established information entropy baseline corresponding to that feature;

For example, the information entropy baseline of each feature may be constructed in advance from sample network traffic data.

In step 105, each information entropy in the set of information entropies of each feature may be sequentially compared to a pre-established value range of the information entropy baseline corresponding to each feature to determine whether each information entropy is within the data range.
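A baseline lookup and range check might look like the following sketch. It assumes, hypothetically, that the baseline database is a dictionary keyed by `(time period, feature)` whose values are lists of `(low, high)` ranges, such as the merged ranges produced during baseline learning; the names are illustrative.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]
# Hypothetical baseline database: (time period, feature) -> reasonable ranges.
BaselineDB = Dict[Tuple[str, str], List[Range]]


def in_baseline(h: float, ranges: List[Range]) -> bool:
    """True if entropy h falls inside any of the learned ranges."""
    return any(lo <= h <= hi for lo, hi in ranges)


def out_of_range(db: BaselineDB, period: str, feature: str,
                 entropies: List[float]) -> List[float]:
    """Return the target entropies of one feature that lie outside its baseline."""
    ranges = db.get((period, feature))
    if ranges is None:
        raise KeyError(f"no baseline learned for {feature!r} in period {period!r}")
    return [h for h in entropies if not in_baseline(h, ranges)]
```

With the baseline `[(0.71, 0.71), (0.92, 0.94)]`, the values 0.71 and 0.93 pass, while 0.5 and 1.2 are reported as target information entropies.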

Step 106: calculating the difference between each target information entropy that is not within the data range and the information entropy in the group of information entropies that is closest to it in value;

For example, for a feature A in a group of sub-data whose time period is 9:00-10:00 a.m., suppose the calculated group of information entropies is $h_1, h_2, \ldots, h_i$, and $h_3$ is not within the data range of the information entropy baseline of feature A for 9:00-10:00. If, among the group of information entropies, the one closest in value to $h_3$ is $h_1$, then the difference between $h_3$ and $h_1$ is calculated as the difference described above.
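The difference in step 106 can be computed as below. This sketch follows the step 106 wording literally, comparing the target entropy with the closest other entropy in the same group, and it takes the absolute value, which is an assumption because the sign is not specified. `nearest_difference` is a hypothetical helper.

```python
from typing import List, Optional


def nearest_difference(target_index: int, group: List[float]) -> Optional[float]:
    """Absolute difference between group[target_index] and the closest other value."""
    target = group[target_index]
    others = [h for i, h in enumerate(group) if i != target_index]
    if not others:
        return None  # nothing to compare against in a one-element group
    return min(abs(target - h) for h in others)
```

Mirroring the example above, for the group `[0.90, 0.50, 1.30]` with $h_3 = 1.30$ as the target, the closest value is $h_1 = 0.90$ and the difference x is 0.40.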

Step 107: calculating a risk index corresponding to each feature according to the difference value corresponding to each feature;

For example, the difference of each feature measures how much that feature fluctuates over a period of time; the larger the fluctuation, the greater the risk of a network anomaly. The risk index of each feature is therefore positively correlated with its difference, and it may be calculated from the difference using preset coefficients.

Step 108: determining whether the network is abnormal according to the risk index of each feature.

For example, whether the network is abnormal may be determined by whether the risk index of any feature exceeds a preset value; alternatively, a comprehensive risk index may be calculated from the risk indexes of the features using a preset algorithm, and whether the network is abnormal may be determined by whether the comprehensive risk index exceeds another preset value.

In one or more embodiments of the disclosure, the features may include at least three of the following: an operating system fingerprint; the device identification field, the function code field, and the packet length of the query message; and the device identification field, the function code field, and the packet length of the response message.

Still taking the Modbus protocol as an example, the features of multiple dimensions may include: the device_id field, function code field, and packet length in the Modbus Query message; the device_id field, function code field, and packet length in the Response message; and the operating system fingerprint.

In one or more embodiments of the present disclosure, the risk index corresponding to each feature may be calculated by the following formula:

where score(x) represents the risk index of the feature, x represents the difference, δ is a preset threshold, n is a preset real number greater than 1, and k is a weight coefficient that is an integer greater than 1. If the deviation of the information entropy is smaller than the threshold δ, it is not considered an attack. This parameter can be configured by the system user: the smaller δ is, the lower the false-negative rate and the higher the false-positive rate. n is typically built in at the factory according to the software's operating context; to speed up computation, n can be chosen as the natural logarithm base e or as 2. k can be preset by the equipment manufacturer.
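Because the exact form of score(x) is not given in this text, the sketch below uses a hypothetical stand-in that respects only the stated properties: the score is zero when x is below δ, grows with x above it, uses n (greater than 1) as the logarithm base, and is scaled by the integer weight k. The form `k * log_n(x / δ)` is an assumption for illustration only and should not be read as the disclosed formula.

```python
import math


def risk_score(x: float, delta: float, n: float = math.e, k: int = 2) -> float:
    """HYPOTHETICAL stand-in for score(x); not the patented formula.

    Zero below the threshold delta, increasing in x above it.
    """
    if n <= 1 or k <= 1 or delta <= 0:
        raise ValueError("expected n > 1, integer k > 1 and delta > 0")
    if x < delta:
        return 0.0  # deviation too small to be treated as an attack
    return k * math.log(x / delta, n)
```

Under these assumptions, with δ = 0.1, n = 2 and k = 2, a difference of 0.4 scores 4.0, and any difference below 0.1 scores 0.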

For example, in the network anomaly detection stage, the network traffic data is divided into multiple groups by equal time periods T, and the information entropy is calculated for each dimension (denoted i) of each group t. Each calculated result is called an information entropy group, because multiple values are calculated for each dimension, and the groups are denoted $h_1, h_2, \ldots, h_i$. For an information entropy $h_{i1}$ in group $h_i$, the information entropy baseline database is searched according to the time period and feature corresponding to $h_{i1}$, and the numerical range of the baseline is used to determine whether $h_{i1}$ lies within a reasonable range. If $h_{i1}$ is not within the reasonable range, the difference between $h_{i1}$ and the information entropy in $h_i$ that is numerically closest to it is calculated and denoted x.

In one or more embodiments of the present disclosure, determining whether an abnormality exists in the network according to the risk index corresponding to each feature may include:

calculating a comprehensive risk index according to the risk index corresponding to each characteristic;

determining whether the network is abnormal or not according to the comprehensive risk index;

wherein the comprehensive risk index is calculated by the following formula:

where score_i represents the risk index of feature i, and m represents the number of features participating in the calculation. The network anomaly determination method provided in one or more embodiments of the present disclosure therefore has low computational complexity. Because it computes over the above multidimensional features, it achieves a high detection rate for function code injection attacks, denial-of-service attacks, and function code abuse attacks against the Modbus protocol, and has some ability to detect buffer overflow attacks.

In one or more embodiments of the present disclosure, determining whether an abnormality exists in the network according to the risk index corresponding to each of the features may include:

if the calculated risk index of any one feature is greater than a first preset value, determining that the network is abnormal;

or, if the calculated comprehensive risk index is greater than a second preset value, determining that the network is abnormal, where the first preset value is greater than the second preset value. For example, the risk index of each of the features of the multiple dimensions is calculated, and the resulting risk indexes form a set S.

Define thresholds $\eta_1$ (an example of the first preset value above) and $\eta_2$ (an example of the second preset value above), where $\eta_1 > \eta_2$;

If at least one $\mathrm{score}_i \in S$ satisfies $\mathrm{score}_i > \eta_1$, the network is considered abnormal and an alarm is generated. Alternatively, if the comprehensive risk index is greater than $\eta_2$, the network is considered abnormal and an alarm is generated.
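The two-threshold decision rule can be sketched as follows. The comprehensive risk index formula is not reproduced in this text either, so `composite_risk` uses a hypothetical arithmetic mean over the m participating features. Only the rule itself comes from the description: any single score above η₁, or the comprehensive index above η₂, with η₁ > η₂.

```python
from typing import Dict


def composite_risk(scores: Dict[str, float]) -> float:
    """HYPOTHETICAL comprehensive index: arithmetic mean over the m features."""
    if not scores:
        return 0.0
    return sum(scores.values()) / len(scores)


def is_abnormal(scores: Dict[str, float], eta1: float, eta2: float) -> bool:
    """Alarm if any single feature score > eta1, or the composite > eta2."""
    if eta1 <= eta2:
        raise ValueError("the rule requires eta1 > eta2")
    if any(s > eta1 for s in scores.values()):
        return True                          # one feature deviates strongly
    return composite_risk(scores) > eta2     # many features deviate moderately
```

The larger η₁ catches a single strongly deviating feature. The smaller η₂ catches several features that each deviate moderately but together look suspicious.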

In one or more embodiments of the present disclosure, extracting the features of the multiple dimensions of the groups of sub-data may include:

identifying Modbus traffic in each group of sub-data according to the characteristics of the Modbus protocol;

pairing Modbus query messages with their response messages;

parsing and extracting the key fields and features of the Modbus query and response messages. For example, the device_id field, function code field, and packet length of the Modbus query message, and the device_id field, function code field, and packet length of the Modbus response message, may be parsed and extracted. The parsed data can be recorded in the session table entry corresponding to the connection.
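Parsing and pairing can be sketched for Modbus/TCP, where each message starts with a 7-byte MBAP header (transaction identifier, protocol identifier, length, unit identifier) followed by the function code. Two choices here are assumptions about how the described extraction might be done: the MBAP unit identifier stands in for device_id, and queries are paired with responses by transaction identifier. The helper names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ModbusMessage:
    transaction_id: int
    device_id: int       # MBAP unit identifier (assumed to serve as device_id)
    function_code: int
    length: int          # total message length in bytes


def parse_modbus_tcp(payload: bytes) -> Optional[ModbusMessage]:
    """Parse MBAP header + function code; None if it does not look like Modbus/TCP.

    Assumes exactly one Modbus message per TCP payload.
    """
    if len(payload) < 8:
        return None
    tid, proto_id, length, unit_id = struct.unpack(">HHHB", payload[:7])
    # Protocol identifier is 0 for Modbus; the length field counts the
    # unit identifier plus the PDU, i.e. everything after the first 6 bytes.
    if proto_id != 0 or length != len(payload) - 6:
        return None
    return ModbusMessage(tid, unit_id, payload[7], len(payload))


def pair_messages(queries: List[ModbusMessage],
                  responses: List[ModbusMessage]) -> List[Tuple[ModbusMessage, ModbusMessage]]:
    """Pair each response with the pending query that has the same transaction id."""
    pending: Dict[int, ModbusMessage] = {q.transaction_id: q for q in queries}
    pairs = []
    for r in responses:
        q = pending.pop(r.transaction_id, None)
        if q is not None:
            pairs.append((q, r))
    return pairs
```

For example, the 12-byte query `00 01 00 00 00 06 11 03 00 6B 00 03` (read holding registers from unit 17) parses to device_id 17 and function code 3.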

In one or more embodiments of the present disclosure, extracting the features of the multiple dimensions of the each group of sub-data may further include:

identifying the operating system type from TCP header information during the TCP handshake; for example, the operating system type may be identified from the TCP header information in the TCP three-way handshake;

identifying the network card vendor from the MAC address;

identifying the device vendor from data characteristics of the Modbus application layer;

hashing the operating system type, the network card vendor information, and the device vendor information to obtain a hash value. For example, the resulting operating system fingerprint may also be recorded in the session table.
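The fingerprinting steps can be sketched as below, with several stand-ins. The TTL-based operating system guess is a common coarse heuristic (typical initial TTLs are 64, 128 and 255) and replaces the TCP header analysis described. The OUI table is a hypothetical placeholder for a vendor registry lookup, and SHA-256 is an assumed choice of hash function.

```python
import hashlib
from typing import Dict

# Hypothetical OUI table (first three MAC bytes -> vendor); a real system
# would load a full vendor registry instead.
OUI_VENDORS: Dict[str, str] = {"aa:bb:cc": "ExampleNIC Corp"}


def guess_os_from_ttl(ttl: int) -> str:
    """Coarse OS guess: round the observed TTL up to a common initial value."""
    if ttl <= 64:
        return "linux/unix"
    if ttl <= 128:
        return "windows"
    return "network-device"


def nic_vendor(mac: str) -> str:
    """Look up the network card vendor from the MAC address prefix."""
    return OUI_VENDORS.get(mac.lower()[:8], "unknown")


def os_fingerprint(os_type: str, nic_vendor_name: str, device_vendor: str) -> str:
    """Hash the three identifying attributes into a single fingerprint."""
    material = "|".join((os_type, nic_vendor_name, device_vendor)).encode("utf-8")
    return hashlib.sha256(material).hexdigest()
```

The fingerprint is deterministic, so the same host always yields the same value. Its entropy over a time period stays low on a stable network and rises when unfamiliar hosts appear.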

In one or more embodiments of the present disclosure, the network anomaly determination method may further include:

acquiring network traffic sample data; for example, historical network traffic data may be used as the sample data, which may differ from the network traffic data acquired in step 101 above.

Dividing the network traffic sample data into multiple groups of sub-sample data according to the corresponding time periods; the sub-sample data is divided in the same way as the sub-data described above, and details are not repeated here.

Establishing, for each group of sub-sample data, an information entropy baseline for the feature of each dimension among the multiple dimensions.

The baseline establishment process is described below, taking the information entropy learning of the device_id field of the Modbus protocol as an example.

Divide the network traffic data in the learning period into multiple groups by equal time period T, denoted $t_1, t_2, \ldots, t_n$; calculate the information entropy of the device_id field in each time group, and denote the results $h_1, h_2, \ldots, h_n$.

Traverse all $h$, calculate $\Delta_i = h_i - h_{i-1}$, and record the maximum $\Delta_i$ as $\Delta_{\max}$.

Traverse all $h$ and merge values whose differences are smaller than the threshold $\delta_1$ into the same group. After merging, the information entropies form multiple groups, such as $h_a$, $(h_b, h_c)$, $(h_d, h_e, \ldots, h_x)$.

Traverse each group and take the minimum and maximum values in the group as the bounds of the reasonable range of the information entropy.

For example, the following information entropy values:

0.71, 0.92, 0.93, 0.94

Merging these values with a threshold of 0.1 yields the reasonable information entropy ranges 0.71 and 0.92-0.94.
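The merging procedure and the $\Delta_{\max}$ computation can be sketched as follows. The sketch assumes that entropy values are sorted before merging and that each value is compared with its predecessor in sorted order. `learn_ranges` and `max_step` are hypothetical names.

```python
from typing import List, Tuple


def learn_ranges(entropies: List[float], delta1: float) -> List[Tuple[float, float]]:
    """Merge entropy values whose neighbouring gap is below delta1 into ranges."""
    if not entropies:
        return []
    values = sorted(entropies)        # assumption: sort before merging
    ranges: List[Tuple[float, float]] = []
    lo = hi = values[0]
    for h in values[1:]:
        if h - hi < delta1:
            hi = h                    # close to the current group: extend it
        else:
            ranges.append((lo, hi))   # close the group as (min, max)
            lo = hi = h               # start a new group
    ranges.append((lo, hi))
    return ranges


def max_step(entropies_in_time_order: List[float]) -> float:
    """Delta_max: the largest h_i - h_{i-1} between consecutive time groups."""
    pairs = zip(entropies_in_time_order, entropies_in_time_order[1:])
    return max((b - a for a, b in pairs), default=0.0)
```

With the example values and a threshold of 0.1, `learn_ranges` returns `[(0.71, 0.71), (0.92, 0.94)]`, matching the ranges given above.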

After learning finishes, the learning result, i.e., the correspondence between each time period and its reasonable information entropy ranges (an example of the numerical range above), is recorded in the information entropy baseline database.

In the baseline learning process, seven dimensions are selected, namely the operating system fingerprint; the device_id field, function code field, and packet length in the Query message; and the device_id field, function code field, and packet length in the Response message. An information entropy baseline is established for each of them.

Because industrial production processes are uniform and repetitive, the information entropy of industrial control protocol traffic is relatively stable, so identifying network anomalies based on information entropy is well suited to industrial control systems. The seven selected dimensions cover the main fields of the Modbus protocol, so the entropy values reflect the true state of the traffic over a wider range and the likelihood of false alarms is reduced.

To facilitate understanding of the network anomaly determination method of one or more embodiments of the present disclosure, the entire flow of the method is briefly described below by way of an example. In this example, the method includes the following processes:

installing a network packet sniffer on the mirror port of a switch in the industrial control network, so that the sniffer can capture all traffic in the network;

starting a learning mode to learn, for each time period, the fluctuation ranges of the information entropy in seven dimensions (the operating system fingerprint; the device_id field, function code field, and packet length in the Query message; and the device_id field, function code field, and packet length in the Response message), and establishing the baselines;

starting a detection mode to check whether the information entropies of the same seven dimensions in the current time period deviate from the baseline. If a deviation exists, the risk index of each dimension and the comprehensive risk index are calculated, and whether the network is abnormal is determined from the per-dimension risk indexes and/or the comprehensive risk index.

It should be noted that the method of the embodiments of the present disclosure may be executed by a single device, such as a computer or a server. The method can also be applied in a distributed scenario and completed by multiple cooperating devices. In such a scenario, each device may perform only one or more steps of the method, and the devices interact with each other to complete it.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Fig. 2 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.

The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.

The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static Memory device, a dynamic Memory device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solutions provided by the embodiments of the present specification are implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called by the processor 1010 for execution.

The input/output interface 1030 is used to connect an input/output module for inputting and outputting information. The input/output module may be built into the device as a component (not shown in the figure) or may be external to the device to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, and the like; output devices may include a display, a speaker, a vibrator, an indicator light, and the like.

The communication interface 1040 is used to connect a communication module (not shown in the figure) to enable communication between this device and other devices. The communication module may communicate over a wired connection (for example, USB or a network cable) or wirelessly (for example, a mobile network, Wi-Fi, or Bluetooth).

Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.

It should be noted that although the above device shows only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above device may include only the components necessary to implement the embodiments of the present disclosure, rather than all of the components shown in the figure.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the concept of the present disclosure, features in the above embodiments or in different embodiments may be combined, steps may be performed in any order, and there are many other variations of the different aspects of the disclosure as described above, which are not detailed for the sake of brevity.

In addition, to simplify illustration and discussion and so as not to obscure the disclosure, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the figures provided. Furthermore, devices may be shown in block diagram form to avoid obscuring the disclosure, also because the implementation details of such block diagram devices depend heavily on the platform on which the disclosure is to be implemented (i.e., such details should be well within the purview of one skilled in the art). Where specific details (e.g., circuits) are set forth to describe example embodiments of the disclosure, it will be apparent to one skilled in the art that the disclosure can be practiced without these specific details or with variations of them. Accordingly, the description is to be regarded as illustrative rather than restrictive.

While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, the embodiments discussed may be used with other memory architectures, such as dynamic RAM (DRAM).

The embodiments of the present disclosure are intended to embrace all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the disclosure are intended to be included within the scope of the disclosure.

Claims

1. A method for determining network anomalies, comprising:

acquiring network traffic data;

dividing the network traffic data into a plurality of groups of sub-data according to the time periods corresponding to the network traffic data;

extracting features of multiple dimensions from each group of sub-data;

for each group of sub-data, calculating a group of information entropies corresponding to the feature of each dimension among the features of the multiple dimensions;

for each group of sub-data, comparing each information entropy in the group of information entropies of each feature with a pre-established data range of the information entropy baseline corresponding to that feature;

calculating a difference value between a target information entropy that is not within the data range and the information entropy, in the group of information entropies, whose difference from the target information entropy is smallest;

calculating a risk index corresponding to each feature according to the difference values corresponding to that feature;

and determining whether the network is abnormal or not according to the risk indexes corresponding to the characteristics.

2. The method of claim 1, wherein the features include at least three of the following:

operating system fingerprint, device identification field in the query message, function code field in the query message, data packet length of the query message, device identification field in the response message, function code field in the response message, and data packet length of the response message.

3. The method of claim 1, wherein the risk index for each feature is calculated by the formula:

4. The method of claim 3, wherein determining whether the network is abnormal according to the risk index corresponding to each feature comprises:

wherein the comprehensive risk index is calculated by the following formula:

wherein score_i denotes the risk index corresponding to the i-th feature, and m represents the number of features participating in the calculation.

5. The method of claim 4, wherein determining whether the network is abnormal according to the risk index corresponding to each feature comprises:

if the calculated risk index corresponding to any one feature is greater than a first preset value, determining that the network is abnormal;

or, if the calculated comprehensive risk index is greater than a second preset value, determining that the network is abnormal, wherein the first preset value is greater than the second preset value.

6. The method of claim 1, wherein extracting features of multiple dimensions from each group of sub-data comprises:

matching query messages of the Modbus protocol with their corresponding response messages;

and parsing the Modbus query and response messages to extract key fields and features from them.

7. The method of claim 6, wherein extracting features of multiple dimensions from each group of sub-data further comprises:

identifying the operating system type according to the TCP packet header information exchanged during the Transmission Control Protocol (TCP) handshake;

identifying network interface card manufacturer information according to the MAC address;

and carrying out hash operation on the operating system type, the network card manufacturer information and the equipment manufacturer information to obtain a hash value.

8. The method of claim 1, further comprising:

acquiring network traffic sample data;

dividing the network traffic sample data into a plurality of groups of sub-sample data according to the time periods corresponding to the network traffic sample data;

and, for each group of sub-sample data, establishing an information entropy baseline corresponding to the feature of each dimension among the features of the multiple dimensions.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the network anomaly determination method according to any one of claims 1 to 8 when executing the program.

10. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the network anomaly determination method according to any one of claims 1 to 8.
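The following is a minimal Python sketch of the steps of claims 1 and 8, offered for illustration only and not as the claimed implementation. Several points are assumptions not stated in the claims: the sub-data groups are fixed-length time windows; each information entropy is the Shannon entropy (in bits) of one feature's values within one window; the baseline data range of a feature is the interval between the smallest and largest entropies observed over the baseline windows; and the "closest information entropy" is read as the closest baseline entropy. Because the formula of claim 3 is not reproduced in this text, `risk_index` is a hypothetical placeholder that simply sums a feature's deviations. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def split_into_windows(records, window_seconds):
    """Group (timestamp, feature_dict) records into fixed time windows (sub-data groups)."""
    windows = defaultdict(list)
    for ts, features in records:
        windows[int(ts // window_seconds)].append(features)
    # Return the non-empty windows in chronological order.
    return [windows[k] for k in sorted(windows)]


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def window_entropies(window, feature_names):
    """One entropy per feature for a single window."""
    return {name: shannon_entropy([rec[name] for rec in window]) for name in feature_names}


def build_baseline(sample_records, window_seconds, feature_names):
    """Baseline (assumed form): per feature, the entropies observed in each sample window."""
    baseline = {name: [] for name in feature_names}
    for window in split_into_windows(sample_records, window_seconds):
        for name, h in window_entropies(window, feature_names).items():
            baseline[name].append(h)
    return baseline


def feature_deviations(records, window_seconds, baseline):
    """Per feature, the distance of each out-of-range entropy to the nearest baseline entropy."""
    deviations = {name: [] for name in baseline}
    for window in split_into_windows(records, window_seconds):
        for name, h in window_entropies(window, list(baseline)).items():
            low, high = min(baseline[name]), max(baseline[name])
            if low <= h <= high:
                continue  # within the baseline data range: no deviation recorded
            nearest = min(baseline[name], key=lambda b: abs(b - h))
            deviations[name].append(abs(h - nearest))
    return deviations


def risk_index(feature_devs):
    """Hypothetical placeholder for the claim 3 formula: the sum of a feature's deviations."""
    return sum(feature_devs)
```

For example, with 10-second windows, baseline windows whose function-code entropies are 1.0 and about 0.811 bits give a data range of roughly [0.811, 1.0]; a live window in which every query uses the same function code has entropy 0 and deviates by about 0.811 from the nearest baseline value.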
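Claims 4 and 5 combine the per-feature risk indexes into a decision. Because the formula for the comprehensive risk index is not reproduced in this text, the sketch below assumes, purely for illustration, that it is the arithmetic mean of the m per-feature risk indexes score_i; the parameter names are likewise hypothetical. The decision rule itself follows claim 5: any single feature above the first preset value, or the comprehensive index above the smaller second preset value, marks the network as abnormal.

```python
from statistics import mean


def comprehensive_risk(scores):
    """Assumed combination: mean of the per-feature risk indexes over the m features."""
    return mean(scores.values())


def is_network_abnormal(scores, first_preset, second_preset):
    """Decision rule of claim 5; `scores` maps feature name -> risk index score_i."""
    if first_preset <= second_preset:
        raise ValueError("the first preset value must exceed the second preset value")
    # A single strongly deviating feature is enough to flag an anomaly.
    if any(score > first_preset for score in scores.values()):
        return True
    # Otherwise, moderate deviations across features may still add up to an anomaly.
    return comprehensive_risk(scores) > second_preset
```

One reading of the two thresholds is that the larger first preset value catches one strongly anomalous feature, while the smaller second preset value catches broad, moderate drift across many features that no single per-feature threshold would flag.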
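Claims 2 and 6 rely on matching Modbus query and response messages and extracting fields from them. The sketch below assumes Modbus TCP, in which each message begins with a 7-byte MBAP header (transaction identifier, protocol identifier, length, unit identifier) followed by the function code, and it pairs a response with its query by connection and transaction identifier. Mapping the claims' "device identification field" to the Modbus unit identifier, and "data packet length" to the length of the Modbus TCP payload, is an assumption; the `flow_id` argument and all function names are hypothetical.

```python
import struct
from dataclasses import dataclass


@dataclass
class ModbusFields:
    transaction_id: int
    unit_id: int        # assumed to correspond to the "device identification field"
    function_code: int  # exception responses set the high bit (code | 0x80)
    length: int         # length of the Modbus TCP payload in bytes


def parse_modbus_tcp(payload: bytes) -> ModbusFields:
    """Parse the MBAP header and function code of one Modbus TCP message."""
    if len(payload) < 8:
        raise ValueError("payload shorter than MBAP header plus function code")
    transaction_id, protocol_id, _length_field, unit_id = struct.unpack(">HHHB", payload[:7])
    if protocol_id != 0:
        raise ValueError("protocol identifier is not 0, so this is not Modbus")
    return ModbusFields(transaction_id, unit_id, payload[7], len(payload))


def match_query_response(messages):
    """Pair queries with responses; `messages` yields (flow_id, is_query, payload)."""
    pending = {}
    pairs = []
    for flow_id, is_query, payload in messages:
        fields = parse_modbus_tcp(payload)
        key = (flow_id, fields.transaction_id)
        if is_query:
            pending[key] = fields
        elif key in pending:
            pairs.append((pending.pop(key), fields))
        # Responses with no outstanding query are ignored in this sketch.
    return pairs
```

For a Read Holding Registers exchange (function code 3) addressed to unit 17, the 12-byte query `0001000000061103006B0003` and the 15-byte response `000100000009110306022B00000064` share transaction identifier 1, so they form one pair whose unit identifier, function code, and length values can feed the per-window entropy calculation.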
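Claim 7 derives a hash from the operating system type, the network interface card manufacturer, and the equipment manufacturer. The sketch below is a deliberately crude stand-in: it guesses the operating system only from the TTL of a handshake packet, using the common initial TTL values 64, 128, and 255 (passive fingerprinting tools such as p0f also inspect window size, TCP options, and more); it looks the manufacturer up by the first three octets of the MAC address (the IEEE OUI) in a caller-supplied table; and it uses SHA-256 as the hash operation. The table contents, the chosen hash, and all function names are assumptions.

```python
import hashlib


def guess_os_from_ttl(observed_ttl: int) -> str:
    """Crude heuristic: round the observed TTL up to a common initial TTL (64, 128 or 255)."""
    if observed_ttl <= 64:
        return "Linux/Unix-like"
    if observed_ttl <= 128:
        return "Windows"
    return "network device/other"


def mac_oui(mac: str) -> str:
    """Return the first three octets of a MAC address (the OUI) as 'AA:BB:CC'."""
    octets = mac.replace("-", ":").upper().split(":")
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(octets[:3])


def nic_vendor(mac: str, oui_table: dict) -> str:
    """Look the OUI up in a caller-supplied table, e.g. one built from the IEEE registry."""
    return oui_table.get(mac_oui(mac), "unknown")


def device_fingerprint(os_type: str, nic_manufacturer: str, device_manufacturer: str) -> str:
    """SHA-256 over the three attributes, joined with '|' so ('ab', 'c') and ('a', 'bc') differ."""
    material = "|".join([os_type, nic_manufacturer, device_manufacturer]).encode("utf-8")
    return hashlib.sha256(material).hexdigest()
```

One plausible use, consistent with claim 2, is to treat the resulting hash as the "operating system fingerprint" feature, so that a change in any of the three attributes shows up as a new categorical value in the per-window entropy calculation.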