CN117955729A

CN117955729A - Flow-based malicious software detection method and device and electronic equipment

Info

Publication number: CN117955729A
Application number: CN202410267255.8A
Authority: CN
Inventors: 张长河
Original assignee: Beijing Weida Information Technology Co ltd
Current assignee: Beijing Weida Information Technology Co ltd
Priority date: 2024-03-08
Filing date: 2024-03-08
Publication date: 2024-04-30

Abstract

The application provides a traffic-based malware detection method and device and electronic equipment, and relates to the field of detection technology. In the method, a network traffic data set for a target device is obtained, wherein the network traffic data set comprises traffic basic data and data set attribute data; traffic basic features are extracted from the traffic basic data using a preset flow detection model, wherein the traffic basic features comprise a traffic size feature, a duration feature and a transmission rate feature; data set attribute features are extracted from the data set attribute data using the preset flow detection model, wherein the data set attribute features comprise a source IP address feature, a target IP address feature, a port feature and a protocol type feature; the traffic basic features and the data set attribute features are each compared by the preset flow detection model to obtain a comparison result; and if the comparison result indicates that the traffic basic features and/or the data set attribute features contain abnormal features, it is determined that potential malware exists on the target device. Implementing the technical solution provided by the application helps improve the accuracy of malware detection.

Description

Flow-based malicious software detection method and device and electronic equipment

Technical Field

The application relates to the field of detection technology, and in particular to a traffic-based malware detection method and device and electronic equipment.

Background

With the development of technology, software aimed at users has proliferated, and software installed on user devices usually needs to be authorized before it can use the users' personal information. Malware, however, often steals users' personal information, such as user names, passwords, bank accounts and other sensitive information, exposing users to financial loss and privacy disclosure. Malware detection allows these threats to be discovered and removed in time, protecting users' privacy and information security.

Currently, conventional malware detection methods typically only focus on a single feature or indicator, such as file hash value, process behavior, etc., while ignoring other possible features. This limitation makes the detection method vulnerable to attack, resulting in lower detection accuracy.

Therefore, a traffic-based malware detection method, a traffic-based malware detection device and an electronic device are urgently needed.

Disclosure of Invention

The application provides a traffic-based malware detection method and device and electronic equipment, which help improve the accuracy of malware detection.

In a first aspect of the present application, there is provided a traffic-based malware detection method, the method comprising: acquiring a network traffic data set aiming at target equipment, wherein the network traffic data set comprises traffic basic data and data set attribute data; extracting flow basic characteristics from the flow basic data by adopting a preset flow detection model, wherein the flow basic characteristics comprise flow size characteristics, duration characteristics and transmission rate characteristics; extracting data set attribute characteristics from the data set attribute data by adopting the preset flow detection model, wherein the data set attribute characteristics comprise source IP address characteristics, target IP address characteristics, port characteristics and protocol type characteristics; respectively comparing the flow basic characteristics with the data set attribute characteristics through the preset flow detection model to obtain comparison results; and if the comparison result is determined to indicate that the flow basic feature and/or the data set attribute feature have abnormal features, determining that potential malicious software exists in the target equipment.

With this technical solution, network traffic data is acquired and analyzed in real time, so potential malware can be discovered promptly and responded to quickly. This real-time nature is critical for coping with cyber threats. Because feature extraction and feature comparison are performed by a preset flow detection model, the need for manual participation is reduced and detection becomes more automated and intelligent, which lowers the consumption of human resources, reduces human error and improves detection accuracy. Because the method considers multiple traffic features and data set attributes, it can adapt to complex and changing network environments and cope with evolving malware threats, keeping detection effective. By identifying anomalies in the traffic basic features and the data set attributes, the method can raise a warning before malware actually causes damage, which helps enterprises and individuals take measures in advance and prevents damage to systems and data. The method takes into account not only the traffic basic data (traffic size, duration and transmission rate) but also the data set attribute data (source IP address, target IP address, port and protocol type). This comprehensive analysis allows detection to cover more potentially malicious behaviors and improves detection accuracy.

Optionally, the acquiring the network traffic data set for the target device specifically includes: receiving original data sent by the target equipment; and carrying out data processing on the original data to obtain the network flow data set, wherein the data processing comprises data cleaning, data deduplication and data integration.

By adopting the technical scheme, invalid, wrong or repeated information in the original data can be removed by data cleaning, and the quality and accuracy of the data in subsequent analysis are ensured. The repeated record in the original data can be removed by data de-duplication, and repeated processing of the same data in subsequent analysis is avoided, so that the analysis efficiency is improved. This is particularly important when processing large amounts of network traffic data, as repeated data can add significant complexity and time costs to the analysis. Data integration is the merging of raw data of different sources or formats into one unified data set, which facilitates subsequent feature extraction and model training. By integrating the data, the relevance and consistency between different data can be ensured. In addition, the data integration can realize the standardization of the data, namely, the data in different formats or units are converted into a unified format or unit, so that the subsequent analysis is convenient. The network flow data set obtained after data cleaning, deduplication and integration has higher quality and consistency, which provides a solid foundation for subsequent feature extraction, model training and malware detection. Such a data base can ensure the accuracy and reliability of the analysis results. By preprocessing the original data, the data volume of subsequent analysis can be greatly reduced, so that the efficiency of malware detection is improved. This is particularly important for real-time or large-scale network traffic analysis, as it can reduce the consumption of computing resources and time while ensuring detection accuracy.

Optionally, the extracting the flow basic feature from the flow basic data by using a preset flow detection model specifically includes: according to the flow basic data, an average group flow value, a minimum group flow value and a maximum group flow value are counted to obtain the flow size characteristics; calculating the starting time and the ending time of the flow according to the flow basic data to obtain the duration characteristic; calculating the number of bytes transmitted by the flow in unit time according to the flow basic data to obtain the transmission rate characteristic; and fusing the flow size characteristic, the duration characteristic and the transmission rate characteristic to obtain the flow basic characteristic.

By adopting the technical scheme, the scale distribution characteristics of the flow can be comprehensively captured by counting the average group flow value, the minimum group flow value and the maximum group flow value, and the abnormal large-flow or small-flow behavior can be found. Calculating the start time and the end time of the traffic can reflect the duration characteristics of the traffic, and is important for identifying persistent malicious traffic activities. And the number of bytes transmitted by the flow in unit time is calculated, so that the transmission rate characteristic of the flow can be embodied, and the sudden and high-speed flow activity can be recognized. And fusing the flow size characteristic, the duration characteristic and the transmission rate characteristic to obtain a comprehensive flow basic characteristic. The flow characteristics of multiple aspects can be comprehensively considered through the fusion, so that the malicious software detection model can more comprehensively know the behavior mode of the flow, and the detection accuracy is improved. Meanwhile, the fusion characteristic can simplify the input of the model, reduce the complexity of the model and improve the detection efficiency. By extracting the flow basic features, powerful support can be provided for subsequent malware detection. These features may be provided as input data to a machine learning algorithm or statistical model for training and constructing a malware detection model. By analyzing these features, the model can identify abnormal network traffic behavior, thereby discovering potential malware activity.
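The feature extraction described above can be sketched as follows. This is a minimal illustration, not the patented model: it assumes the traffic basic data is available as a list of (timestamp in seconds, size in bytes) pairs, and the names `FlowBasicFeatures` and `extract_basic_features` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class FlowBasicFeatures:
    """Fused traffic basic features (hypothetical container)."""
    mean_size: float   # average traffic value, in bytes
    min_size: int      # minimum traffic value, in bytes
    max_size: int      # maximum traffic value, in bytes
    duration: float    # end time minus start time, in seconds
    rate: float        # bytes transmitted per second


def extract_basic_features(packets: Sequence[Tuple[float, int]]) -> FlowBasicFeatures:
    """Compute size, duration and rate features from (timestamp, size) pairs.

    The input layout is an assumption made for this sketch.
    """
    if not packets:
        raise ValueError("at least one packet is required")
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    duration = max(times) - min(times)
    total_bytes = sum(sizes)
    # A flow observed at a single instant has no measurable rate.
    rate = total_bytes / duration if duration > 0 else 0.0
    return FlowBasicFeatures(
        mean_size=total_bytes / len(sizes),
        min_size=min(sizes),
        max_size=max(sizes),
        duration=duration,
        rate=rate,
    )
```

Here "fusing" the features is reduced to grouping them in one record; a real model might instead concatenate them into a normalized feature vector.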

Optionally, the comparing, by the preset flow detection model, the flow basic feature and the data set attribute feature respectively to obtain a comparison result specifically includes: calculating the duration corresponding to the duration characteristic; calculating a flow fluctuation difference value corresponding to the flow size characteristic; calculating the flow transmission rate corresponding to the transmission rate characteristic; judging the magnitude relation between the frequency of the flow fluctuation difference and a preset frequency threshold value and the magnitude relation between the flow transmission rate and a preset rate threshold value within the duration time, and obtaining the comparison result; if it is determined that the comparison result indicates that the traffic base feature and/or the data set attribute feature have abnormal features, determining that potential malicious software exists in the target device specifically includes: and if the frequency is determined to be greater than or equal to the preset frequency threshold value and the traffic transmission rate is determined to be greater than or equal to the preset rate threshold value, determining that the traffic basic feature has abnormal features so as to determine that the malicious software exists in the target equipment.

With this technical solution, anomalies in the traffic basic features and the data set attribute features can be accurately identified by calculating specific indicators, such as the duration, the traffic fluctuation difference and the traffic transmission rate, and comparing them with preset thresholds. Comparing against specific indicators reduces the likelihood of false positives and false negatives and improves the accuracy of malware detection. The preset frequency threshold and the preset rate threshold can be adjusted according to actual needs, which makes the detection model more flexible and able to adapt to different network environments and traffic patterns. Adjusting the thresholds also allows the sensitivity and specificity of detection to be balanced to meet different security requirements. When the frequency is greater than or equal to the preset frequency threshold and the traffic transmission rate is greater than or equal to the preset rate threshold, it may be determined that the traffic basic features contain abnormal features, and therefore that potential malware is present on the target device. Basing the judgment on multiple indicators makes malware identification more effective and reduces the misjudgments that a single indicator might cause. By monitoring network traffic in real time and comparing its features, potential malware activity can be found in time so that preventive and remedial measures can be taken. This helps reduce the damage malware causes to systems and data, protecting network security and user privacy. Comparing and analyzing the features of network traffic also reveals the activity patterns and trends of malware, providing a basis for formulating effective security policies. This helps businesses and individuals better address network threats and raises the overall level of network security.
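The threshold comparison above can be sketched as follows. The source does not define how the "frequency of the flow fluctuation difference" is measured, so this sketch assumes it is the number of consecutive size changes whose absolute value reaches a hypothetical `fluctuation_delta`; all function and parameter names are illustrative.

```python
from typing import Sequence


def fluctuation_frequency(sizes: Sequence[int], fluctuation_delta: int) -> int:
    """Count consecutive size changes of at least fluctuation_delta bytes.

    Assumption: this count stands in for the "frequency of the flow
    fluctuation difference" within the observed duration.
    """
    return sum(1 for a, b in zip(sizes, sizes[1:]) if abs(b - a) >= fluctuation_delta)


def basic_features_abnormal(
    sizes: Sequence[int],
    duration_s: float,
    fluctuation_delta: int,
    freq_threshold: int,
    rate_threshold: float,
) -> bool:
    """Return True only when BOTH the fluctuation frequency and the
    transmission rate reach their preset thresholds."""
    if duration_s <= 0:
        return False  # no measurable rate without a positive duration
    frequency = fluctuation_frequency(sizes, fluctuation_delta)
    rate = sum(sizes) / duration_s  # bytes per second
    return frequency >= freq_threshold and rate >= rate_threshold
```

Requiring both conditions mirrors the "and" in the described judgment; lowering either threshold makes detection more sensitive at the cost of more false positives.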

Optionally, the comparing, by the preset flow detection model, the flow basic feature and the data set attribute feature respectively to obtain a comparison result specifically includes: obtaining a source IP address, a target IP address, a port and a protocol type from the data set attribute features; comparing the source IP address, the target IP address, the port and the protocol type with a preset feature set to obtain the comparison result, wherein the preset feature set comprises a preset source IP address, a preset target IP address, a preset port and a preset protocol type, and consists of normal features pre-stored in the preset flow detection model; if it is determined that the comparison result indicates that the traffic base feature and/or the data set attribute feature have abnormal features, determining that potential malicious software exists in the target device specifically includes: if any one of the source IP address, the target IP address, the port and the protocol type is inconsistent with the corresponding feature in the preset feature set, determining that the data set attribute feature has an abnormal feature, so as to determine that the malicious software exists in the target equipment.

With this technical solution, abnormal features that are inconsistent with normal behavior can be accurately identified by comparing the data set attribute features (source IP address, target IP address, port and protocol type) with the normal features (the preset feature set) pre-stored in the preset flow detection model. This comparison helps reduce false positives and improves the accuracy of malware detection. The preset feature set can be updated and adjusted according to the actual network environment and security requirements, so when new malware behavior patterns appear, the detection model can be kept timely and effective by updating the preset feature set. This flexibility and scalability enables the detection model to keep up with evolving network threats. Comparing several data set attribute features, such as source IP address, target IP address, port and protocol type, covers network traffic comprehensively and helps discover multiple types of malware activity, including but not limited to attacks based on a particular IP address, port or protocol type. Once an abnormal feature is found in the data set attribute features, the presence of potential malware on the target device can be determined quickly, which allows timely action to prevent further damage to systems and data. Predefining and storing the normal features (the preset feature set) also simplifies the detection flow and reduces the computation and complexity of real-time analysis, making malware detection faster and more efficient.
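The attribute comparison above can be sketched as an allowlist check. This sketch assumes the preset feature set stores sets of permitted values for each attribute; the class and function names are hypothetical, and the addresses come from the IP ranges reserved for documentation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class FlowAttributes:
    src_ip: str
    dst_ip: str
    port: int
    protocol: str


@dataclass(frozen=True)
class PresetFeatureSet:
    """Normal values pre-stored in the model (assumed to be sets of allowed values)."""
    src_ips: FrozenSet[str]
    dst_ips: FrozenSet[str]
    ports: FrozenSet[int]
    protocols: FrozenSet[str]


def mismatched_attributes(attrs: FlowAttributes, preset: PresetFeatureSet) -> List[str]:
    """Return the names of attributes that are not in the preset feature set."""
    mismatches = []
    if attrs.src_ip not in preset.src_ips:
        mismatches.append("src_ip")
    if attrs.dst_ip not in preset.dst_ips:
        mismatches.append("dst_ip")
    if attrs.port not in preset.ports:
        mismatches.append("port")
    if attrs.protocol not in preset.protocols:
        mismatches.append("protocol")
    return mismatches


def attributes_abnormal(attrs: FlowAttributes, preset: PresetFeatureSet) -> bool:
    """Any single mismatch marks the data set attribute features as abnormal."""
    return bool(mismatched_attributes(attrs, preset))
```

Returning the list of mismatched attribute names, rather than only a boolean, lets an alert say which attribute deviated from the preset feature set.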

Optionally, before the flow basic features are extracted from the flow basic data using the preset flow detection model, the preset flow detection model is trained. Training the preset flow detection model specifically comprises: processing historical flow basic data and historical data set attribute data to obtain a plurality of feature values; taking the feature values that are greater than a preset threshold as representative feature values; taking the historical flow basic data and the historical data set attribute data corresponding to the representative feature values as training samples; feeding the training samples to the input layer of a neural network and using the corresponding comparison results as the targets at the output layer for supervised training, to obtain an initial flow detection model; and evaluating the precision of the initial flow detection model and selecting a model whose precision meets a preset condition as the preset flow detection model.

By adopting the technical scheme, the model can learn the behavior mode and the characteristics of the normal network flow by training through adopting the historical flow basic data and the historical data set attribute data. Meanwhile, the representative feature value is screened out by setting a preset threshold value, so that the most important feature is ensured to be focused on by the model, and the detection accuracy is improved. In addition, the neural network is used for supervision training, and the comparison result is used as feedback of an output layer, so that the prediction capability and reliability of the model can be further improved. The initial flow detection model obtained through training can carry out precision detection, and only the model meeting the preset condition can be selected as the preset flow detection model. This process ensures that the selected model not only performs well on the current data set, but also has some generalization capability, i.e., is able to cope with new, unseen network traffic data. The historical traffic base data and the historical data set attribute data are fully utilized to train the model, which means that whether the data is normal traffic or abnormal traffic can be used to train the model, thereby improving the utilization of the data. Because the model is trained through neural networks, it can accommodate a variety of complex flow patterns and features. In addition, parameters of the neural network can be adjusted through training, so that the model can flexibly cope with different network environments and safety requirements. Through a preset flow detection model, flow basic characteristics and data set attribute characteristics can be rapidly and accurately extracted from a large amount of network flow data, so that the efficiency of malware detection is improved.
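The sample screening and model selection steps above can be sketched as follows. The neural network training itself is omitted. The source does not say how a feature value is computed or how precision is measured, so `feature_value` is a hypothetical caller-supplied scoring function, and precision is approximated here as plain accuracy on labelled validation samples.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (feature vector, label) where label 1 means abnormal and 0 means normal.
Sample = Tuple[Sequence[float], int]
# A trained model is treated as a callable from feature vector to label.
Model = Callable[[Sequence[float]], int]


def select_training_samples(
    history: Sequence[Sample],
    feature_value: Callable[[Sample], float],
    threshold: float,
) -> List[Sample]:
    """Keep historical samples whose feature value exceeds the preset threshold."""
    return [s for s in history if feature_value(s) > threshold]


def accuracy(model: Model, validation: Sequence[Sample]) -> float:
    """Fraction of validation samples the model labels correctly."""
    if not validation:
        raise ValueError("validation set must not be empty")
    correct = sum(1 for x, y in validation if model(x) == y)
    return correct / len(validation)


def select_model(
    candidates: Sequence[Model],
    validation: Sequence[Sample],
    min_accuracy: float,
) -> Optional[Model]:
    """Return the most accurate candidate that meets min_accuracy, or None."""
    best, best_acc = None, min_accuracy
    for model in candidates:
        acc = accuracy(model, validation)
        if acc >= best_acc:
            best, best_acc = model, acc
    return best
```

Returning `None` when no candidate meets the preset condition makes it explicit that training must be repeated before a model is deployed.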

Optionally, the method further comprises: receiving malicious software marking data sent by the target equipment; storing the malicious software marking data into the preset flow detection model.

By adopting the technical scheme, the flow detection model can acquire the latest malicious software activity information in real time by receiving the malicious software marking data sent by the target equipment. This means that the model can be continually adjusted and optimized to the malware behavior in the actual network environment, thereby preserving its timeliness and accuracy. The malicious software marking data are stored in the preset flow detection model, so that the model can be more suitable for the current network security threat. With the evolution of network attack means, the behavior patterns of malware change. By constantly learning and updating, the model can better address these new threats, improving the ability to detect malware. Malware marking data typically contains exact information and features about the malware, and its inclusion in the model may enhance the recognition capabilities of the model. The model can utilize the marking data to optimize the detection algorithm, thereby more accurately identifying the activity of the malicious software and reducing the situations of false alarm and missing report. If a plurality of target devices send the malicious software marking data to the same preset flow detection model, the devices can realize data sharing and cooperation. The method not only helps to promote the security of the whole network environment, but also can promote the cooperation between different organizations and individuals to jointly cope with network security challenges. By automatically receiving and processing malware marking data sent by a target device, the need for manual operation and intervention can be simplified. This not only improves work efficiency, but also reduces the possibility of human error, making the malware detection process more automated and efficient.

In a second aspect of the present application, a traffic-based malware detection apparatus is provided, where the malware detection apparatus includes an acquisition module and a processing module, where the acquisition module is configured to acquire a network traffic data set for a target device, where the network traffic data set includes traffic base data and data set attribute data; the processing module is further used for extracting flow basic characteristics from the flow basic data by adopting a preset flow detection model, wherein the flow basic characteristics comprise flow size characteristics, duration characteristics and transmission rate characteristics; the processing module is further configured to extract a data set attribute feature from the data set attribute data by using the preset flow detection model, where the data set attribute feature includes a source IP address feature, a target IP address feature, a port feature, and a protocol type feature; the processing module is further used for respectively comparing the flow basic characteristics with the data set attribute characteristics through the preset flow detection model to obtain comparison results; the processing module is further configured to determine that potential malware exists in the target device if it is determined that the comparison result indicates that the traffic base feature and/or the data set attribute feature has an abnormal feature.

In a third aspect of the application, there is provided an electronic device comprising a processor, a memory for storing instructions, a user interface and a network interface, the user interface and the network interface both being used for communicating with other devices, and the processor being used for executing the instructions stored in the memory to cause the electronic device to perform the method described above.

In a fourth aspect of the application there is provided a computer readable storage medium storing instructions which, when executed, perform a method as described above.

In summary, one or more technical solutions provided in the embodiments of the present application at least have the following technical effects or advantages:

Network traffic data is acquired and analyzed in real time, so potential malware can be discovered promptly and responded to quickly. This real-time nature is critical for coping with cyber threats. Because feature extraction and feature comparison are performed by a preset flow detection model, the need for manual participation is reduced and detection becomes more automated and intelligent, which lowers the consumption of human resources, reduces human error and improves detection accuracy. Because the method considers multiple traffic features and data set attributes, it can adapt to complex and changing network environments and cope with evolving malware threats, keeping detection effective. By identifying anomalies in the traffic basic features and the data set attributes, the method can raise a warning before malware actually causes damage, which helps enterprises and individuals take measures in advance and prevents damage to systems and data. The method takes into account not only the traffic basic data (traffic size, duration and transmission rate) but also the data set attribute data (source IP address, target IP address, port and protocol type). This comprehensive analysis allows detection to cover more potentially malicious behaviors and improves detection accuracy.

Drawings

Fig. 1 is a flow chart of a flow-based malware detection method according to an embodiment of the present application.

Fig. 2 is another flow chart of a flow-based malware detection method according to an embodiment of the present application.

Fig. 3 is a schematic block diagram of a flow-based malware detection device according to an embodiment of the present application.

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Reference numerals illustrate: 31. an acquisition module; 32. a processing module; 41. a processor; 42. a communication bus; 43. a user interface; 44. a network interface; 45. a memory.

Detailed Description

In order that those skilled in the art will better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments.

In describing embodiments of the present application, words such as "such as" or "for example" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "such as" or "for example" in embodiments of the application should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "such as" or "for example" is intended to present related concepts in a concrete fashion.

In the description of embodiments of the application, the term "plurality" means two or more. For example, a plurality of systems means two or more systems, and a plurality of screen terminals means two or more screen terminals. Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.

With the rapid pace of technological change, software applications have sprung up in great numbers, giving users a rich experience. However, applications installed on a user's device usually require the user's authorization before they can use the user's personal information. Malware, by contrast, poses a serious threat to users: it can steal personal information such as user names, passwords, bank accounts and other sensitive data, which may cause users financial loss and expose them to the risk of privacy disclosure.

Malware detection is a vital link in addressing this challenge. It can discover and remove threats hidden in user equipment in time, thereby protecting the privacy and information security of the user. However, conventional malware detection methods currently focus on only a single feature or indicator, such as file hash value or process behavior, and ignore other potentially key features. This single-indicator strategy leaves existing detection methods stretched thin when facing continuously evolving attack techniques, which greatly reduces detection accuracy.

To solve the above technical problems, the present application provides a traffic-based malware detection method. Referring to Fig. 1, which is a flow chart of a traffic-based malware detection method according to an embodiment of the present application, the method is applied to a controller and comprises the following steps S110 to S150:

S110, acquiring a network traffic data set aiming at the target equipment, wherein the network traffic data set comprises traffic basic data and data set attribute data.

In particular, the controller is a central component of a system or network that is responsible for managing and coordinating the behavior of other components in the target device and is capable of monitoring, analyzing and responding to network traffic. The target device refers to a monitored network device, which may be any type of network device, such as a router, switch, computer, cell phone, etc., that generates and transmits traffic in the network. The network traffic data set refers to a collection of network traffic associated with the target device, which includes traffic base data and data set attribute data. Traffic base data is the basic information of network traffic, such as the number, size, transmission rate, etc. of data packets. These data provide information about the basic characteristics of the network traffic. The data set attribute data refers to other attribute information related to the network traffic, such as a source IP address, a destination IP address, a port number, a protocol type, and the like. These data provide detailed information about the source, destination and manner of delivery of the traffic.

For example, consider a corporate network containing multiple servers, computers and other network devices. To monitor the security and performance of the network, a controller is deployed to collect and analyze network traffic data. Suppose the controller detects an abnormal network traffic data set, which includes traffic basic data and data set attribute data. The traffic basic data shows that the traffic suddenly increases and stays high for a period of time. The data set attribute data shows that this traffic comes mainly from an unknown IP address, targets an important server in the corporate network, and uses an unusual protocol. By analyzing and comparing these data, the controller can determine that this traffic may be evidence of malware or a network attack. The controller may then trigger an alarm to notify the network administrator to take further action against this potential security threat.

In one possible implementation manner, acquiring the network traffic data set for the target device specifically includes: receiving original data sent by target equipment; and carrying out data processing on the original data to obtain a network flow data set, wherein the data processing comprises data cleaning, data deduplication and data integration.

In particular, the controller needs to establish a connection with the target device in order to receive the raw data it sends. The raw data contains various network traffic information, but is typically unprocessed and may contain errors, outliers, or invalid data. The purpose of data cleansing is to identify and correct these problems, ensuring the accuracy and consistency of the data, for example by deleting duplicate data lines, filling in missing values, and correcting erroneous fields. In network traffic data, there may also be duplicate packets or records. The purpose of data deduplication is to remove these duplicate items to reduce redundant information in the data set and improve the quality of the data. In addition, the raw data may come from different sources or arrive in different formats and needs to be integrated to form a unified data set. Data integration may involve converting data in different formats into a unified format, merging data from different sources, and so forth.

For example, after the controller receives the original data, the controller first performs data cleansing. For example, it may delete lines of data that contain erroneous or invalid fields, such as data with an incorrect IP address format or a missing timestamp. Next, the controller performs a deduplication process on the data. Because there may be duplicate packets or records in the network traffic data, the controller will identify and remove these duplicate items to ensure that the data set does not contain duplicate information. Finally, the controller will integrate data from different sources and formats. For example, it may merge data from different servers into a unified data set and convert data in different formats into a unified format for subsequent analysis and processing. After the data processing process, the controller can obtain a high-quality network flow data set, and powerful data support is provided for subsequent network security and flow monitoring tasks.
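The three data processing steps can be sketched as follows. This is a minimal illustration only: the record field names (`timestamp`, `src_ip`, `dst_ip`, `port`, `protocol`, `size`) and the function names are assumptions made for the example, since the application does not specify a record layout.

```python
import ipaddress
from typing import Iterable, Optional

# Hypothetical record layout; the application does not define field names.
FIELDS = ("timestamp", "src_ip", "dst_ip", "port", "protocol", "size")


def clean(record: dict) -> Optional[dict]:
    """Data cleaning plus format unification for one raw record.

    Returns None when the record has a missing field, a malformed IP
    address, or a non-numeric timestamp, port, or size.
    """
    if any(record.get(name) in (None, "") for name in FIELDS):
        return None
    try:
        return {
            "timestamp": float(record["timestamp"]),
            "src_ip": str(ipaddress.ip_address(record["src_ip"])),
            "dst_ip": str(ipaddress.ip_address(record["dst_ip"])),
            "port": int(record["port"]),
            "protocol": str(record["protocol"]).upper(),
            "size": int(record["size"]),
        }
    except (ValueError, TypeError):
        return None


def build_dataset(*sources: Iterable[dict]) -> list:
    """Merge several raw sources into one cleaned, deduplicated data set."""
    seen = set()
    dataset = []
    for source in sources:              # data integration: merge all sources
        for raw in source:
            record = clean(raw)         # data cleaning
            if record is None:
                continue
            key = tuple(record[name] for name in FIELDS)
            if key in seen:             # data deduplication
                continue
            seen.add(key)
            dataset.append(record)
    dataset.sort(key=lambda r: r["timestamp"])
    return dataset
```

Calling `build_dataset(server_a_records, server_b_records)` with two hypothetical lists of raw records returns one time-ordered list in which invalid and duplicate records have been dropped.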

S120, extracting flow basic characteristics from flow basic data by adopting a preset flow detection model, wherein the flow basic characteristics comprise flow size characteristics, duration characteristics and transmission rate characteristics.

Specifically, the preset traffic detection model is a trained machine learning model that can analyze and identify network traffic data. This model is built based on neural network algorithms and has been trained and optimized using a large amount of historical traffic data. In this process, the preset flow detection model receives the flow base data as input, and then applies its learned algorithms and patterns to extract key information from the data. This key information constitutes the traffic base features, which describe the basic properties and behavior of network traffic. The traffic size feature refers to the total amount or size of network traffic, typically measured in bytes or packets. This feature helps to understand the scale and magnitude of network traffic. The duration feature refers to the length of time that network traffic is sustained. This feature reflects the persistence and stability of network activity. The transmission rate feature refers to the transmission rate of network traffic, typically expressed in bytes or packets transmitted per second. This feature reveals the rate and bandwidth usage of network traffic.

In one possible implementation manner, the method for extracting the flow basic features from the flow basic data by adopting a preset flow detection model specifically includes: according to the flow basic data, counting an average group flow value, a minimum group flow value and a maximum group flow value to obtain flow size characteristics; calculating the starting time and the ending time of the flow according to the flow basic data to obtain the duration characteristic; calculating the number of bytes transmitted by the flow in unit time according to the flow basic data to obtain a transmission rate characteristic; and fusing the flow size characteristics, the duration characteristics and the transmission rate characteristics to obtain flow basic characteristics.

Specifically, first, the controller extracts a flow value of each packet from the flow base data, the flow value being in bytes. Statistical information of these flow values is then calculated, including the average flow value, i.e., the average of all packet flow values, the minimum flow value, i.e., the smallest flow value among all packets, and the maximum flow value, i.e., the largest flow value among all packets. Together, these statistics form a flow size feature that describes the scale and fluctuation range of the flow. By analyzing the traffic base data, the transmission time of each data packet is also determined. From these time stamps, the earliest start time and the latest end time of the traffic are found. The duration feature is the difference between the end time and the start time of the flow, which reflects the duration of the flow activity. Next, the total number of transmitted bytes of the traffic, i.e., the sum of all packet flow values, is determined. The total number of bytes transmitted is then divided by the duration (in seconds) to give the number of bytes transmitted per unit time, typically expressed in bytes per second (B/s) or megabytes per second (MB/s). Finally, the three features are combined to form a comprehensive flow basic feature. This typically involves combining the feature values into a feature vector or matrix for subsequent analysis and processing.

For example, assume that a network service provider needs to monitor and analyze the network traffic of a user. The controller collects network traffic data of the user over a period of time and extracts traffic base features using a preset traffic detection model. First, the controller extracts statistical information from the user traffic data. These data show that the average flow value for the user is 500KB/s, the minimum flow value is 100KB/s, and the maximum flow value is 10MB/s. This indicates that the user's traffic fluctuates over the period, but stays around the average most of the time. Next, the controller analyzes the start time and end time of the traffic data, and finds that the traffic activity lasted for an entire hour, starting at 10 am and ending at 11 am, indicating that the user's network activity continued for a considerable period of time. The controller then calculates the transmission rate of the traffic. In this one hour, the user transmits 3GB of data in total, and thus the transmission rate is about 0.83MB/s, i.e., roughly 6.7Mbps. This indicates that the network connection speed of the user is relatively fast. Finally, the controller fuses the three features to form a flow basic feature vector: average flow value=500 KB/s, minimum flow value=100 KB/s, maximum flow value=10 MB/s, duration=1 hour, transmission rate=6.7 Mbps.
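The feature extraction described above can be sketched as follows. It is an illustrative sketch, not the model used in the application: it assumes the flow base data is available as a list of `(timestamp_seconds, size_bytes)` pairs, one per packet, and the names `FlowBaseFeatures` and `extract_flow_base_features` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowBaseFeatures:
    avg_size: float          # average packet flow value, in bytes
    min_size: int            # minimum packet flow value, in bytes
    max_size: int            # maximum packet flow value, in bytes
    duration_s: float        # end time minus start time, in seconds
    rate_bytes_per_s: float  # total bytes divided by duration

    def as_vector(self) -> list:
        """Fuse the three feature groups into one feature vector."""
        return [self.avg_size, float(self.min_size), float(self.max_size),
                self.duration_s, self.rate_bytes_per_s]


def extract_flow_base_features(packets: list) -> FlowBaseFeatures:
    """packets: list of (timestamp_seconds, size_bytes), assumed layout."""
    if not packets:
        raise ValueError("at least one packet is required")
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    duration = max(times) - min(times)
    total_bytes = sum(sizes)
    # A single-instant flow has no meaningful rate; report 0.0 in that case.
    rate = total_bytes / duration if duration > 0 else 0.0
    return FlowBaseFeatures(
        avg_size=total_bytes / len(sizes),
        min_size=min(sizes),
        max_size=max(sizes),
        duration_s=duration,
        rate_bytes_per_s=rate,
    )
```

For three packets of 100, 300 and 200 bytes sent at 0, 1 and 2 seconds, the sketch yields an average of 200 bytes, a duration of 2 seconds and a rate of 300 bytes per second.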

S130, extracting data set attribute characteristics from the data set attribute data by adopting a preset flow detection model, wherein the data set attribute characteristics comprise source IP address characteristics, target IP address characteristics, port characteristics and protocol type characteristics.

Specifically, the data set attribute data includes various attribute information related to the network traffic, such as a source IP address, a destination IP address, a port number, a protocol type, and the like. The preset flow detection model analyzes the data and extracts key data set attribute characteristics. The dataset attribute features include a source IP address feature, a destination IP address feature, a port feature, and a protocol type feature. The source IP address feature refers to the IP address of the device that originated the network request or data transmission. By analyzing the source IP address, it is possible to know which devices or network segments are generating traffic. The target IP address feature refers to the IP address of the device receiving the network request or data transmission. This helps to know the destination of the traffic and the possible traffic patterns. Port characteristics refer to network port numbers used for data transmission. Different port numbers are typically associated with different applications or services, so the port features can provide information about the source and destination applications of traffic. The protocol type feature refers to a network protocol type for data transmission, such as TCP, UDP, etc. The protocol type feature may provide information about the manner of traffic transport and the communication protocol.

And S140, respectively carrying out feature comparison on the flow basic features and the data set attribute features through a preset flow detection model to obtain comparison results.

Specifically, the preset traffic detection model performs feature comparison separately on the traffic base features (such as traffic size, duration, and transmission rate) and on the data set attribute features (such as source IP address, destination IP address, port, and protocol type). Feature comparison involves comparing the extracted features to a set of known, expected or baseline features to identify any behavior that deviates from the normal pattern. After comparison, the model generates a comparison result. The result may be a simple "normal" or "abnormal" flag or a more detailed report containing detailed information about the flow characteristics and comparison results.

For example, assume a network security team of a large enterprise is monitoring and analyzing network traffic using a preset traffic detection model. This model has been trained to recognize common patterns of network attacks and abnormal traffic behavior. First, the model extracts the flow base features of a set of flows, including the average flow value, the minimum flow value, the maximum flow value, the duration, and the transmission rate. The model then compares these features to the reference data for the normal flow pattern stored in the model. The comparison shows that the duration of this set of flows is exceptionally long and the maximum flow value far exceeds the maximum value under normal conditions. These unusual features suggest that the traffic may be part of a malicious activity, such as a denial of service (DoS) attack or data leakage. Next, under the control of the controller, the model extracts data set attribute features including source IP address, destination IP address, port, and protocol type. These features are then compared with a library of known malicious IP addresses, ports with disclosed vulnerabilities, and common attack protocols. The comparison result shows that the source IP address matches an address in the known malicious IP address library, the target IP address is an important server in the company, the port number is a port exploited by a recently reported security vulnerability, and the protocol type is a protocol commonly used for network attacks. By combining these comparison results, it can be concluded that this set of traffic is likely to be an attack from a malicious attacker. Further, corresponding security measures, such as blocking access from the source IP address, closing the vulnerable port, and strengthening the security protection of the target server, may be taken to address this potential security threat.
In this way, the preset flow detection model can help a network security team to timely find and identify abnormal flow through feature comparison, so that corresponding measures are taken to protect the security and stability of the network.

And S150, if the comparison result is determined to indicate that the flow basic feature and/or the data set attribute feature have abnormal features, determining that potential malicious software exists in the target equipment.

Specifically, after feature comparison, the model generates a result indicating whether there is an anomaly in the traffic base features (e.g., traffic size, duration, and transmission rate) or in the data set attribute features (e.g., source IP address, destination IP address, port, and protocol type). Abnormal features may mean that the traffic does not conform to normal network behavior patterns, or that it matches known malicious traffic patterns. If an abnormal feature is detected, this is typically a signal that the target device (i.e., the device that generated the traffic) may be affected by malware. Malware, such as viruses, trojans, worms, etc., may alter the normal behavior of a device, causing anomalies in network traffic.

Thus, because the method acquires and analyzes network traffic data in real time, it can discover the presence of potential malware in a timely manner and respond quickly. This real-time nature is critical for rapidly coping with cyber threats. By using a preset flow detection model to perform feature extraction and feature comparison, the method reduces the need for manual participation and raises the level of automation and intelligence of detection. This not only reduces the consumption of human resources, but also reduces human error and improves the detection accuracy. Because the method considers various flow characteristics and data set attributes, it can adapt to complex and changeable network environments. This enables it to cope with evolving malware threats and remain effective. By identifying anomalies in the traffic base features and the data set attribute features, the method can issue a warning before malware actually causes damage. This early warning helps enterprises and individuals take measures in advance and prevent malicious software from damaging systems and data. The method takes into account not only the traffic base data (traffic size, duration and transmission rate) but also the data set attribute data (source IP address, destination IP address, port and protocol type). This comprehensive analysis enables detection to cover more potential malicious behaviors and improves detection accuracy.

In one possible implementation manner, the comparison result is obtained by respectively comparing the flow basic feature and the data set attribute feature through a preset flow detection model, and specifically includes: calculating the duration corresponding to the duration characteristic; calculating a flow fluctuation difference value corresponding to the flow size characteristics; calculating the flow transmission rate corresponding to the transmission rate characteristics; judging the magnitude relation between the frequency of the flow fluctuation difference and a preset frequency threshold value and the magnitude relation between the flow transmission rate and a preset rate threshold value within the duration time, and obtaining a comparison result; if the comparison result is determined to indicate that the flow basic feature and/or the data set attribute feature have abnormal features, determining that potential malicious software exists in the target device specifically includes: if the frequency is determined to be greater than or equal to the preset frequency threshold value and the traffic transmission rate is determined to be greater than or equal to the preset rate threshold value, determining that the traffic basic feature has abnormal features so as to determine that malicious software exists in the target equipment.

Specifically, the controller first extracts the duration of the flow, i.e., the time difference from the start to the end of the flow, from the flow data. The flow size characteristics include an average flow value, a minimum flow value, and a maximum flow value, and the flow fluctuation difference may be obtained by calculating the difference between these values, for example the difference between the maximum flow value and the average flow value. The controller then calculates the number of bytes transmitted per unit time, i.e., the transmission rate of the traffic. Next, the controller checks the frequency of occurrence of the flow fluctuation difference over the duration. If the frequency is greater than or equal to the preset frequency threshold, the flow fluctuates abnormally often. At the same time, the controller checks whether the traffic transmission rate is greater than or equal to the preset rate threshold; if it is, the transmission speed of the traffic is abnormal. When both conditions are met at the same time, the controller determines that the flow basic feature has abnormal features and that malicious software exists in the target device.
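The two-threshold decision above can be sketched as follows. The application does not define how the "frequency of occurrence of the flow fluctuation difference" is measured, so this sketch makes an explicit assumption: a fluctuation event is a packet whose size differs from the average size by at least a hypothetical `fluctuation_threshold`, and the frequency is the number of such events per second of flow duration. The packet layout `(timestamp_seconds, size_bytes)` and all names are likewise assumptions.

```python
def flow_base_is_abnormal(packets, fluctuation_threshold,
                          freq_threshold, rate_threshold):
    """Return True when both the fluctuation frequency and the
    transmission rate reach their preset thresholds.

    packets: list of (timestamp_seconds, size_bytes), assumed layout.
    fluctuation_threshold: hypothetical minimum deviation from the
        average size (bytes) that counts as one fluctuation event.
    freq_threshold: preset frequency threshold, in events per second.
    rate_threshold: preset rate threshold, in bytes per second.
    """
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    duration = max(times) - min(times)
    if duration <= 0:
        return False  # no measurable duration, so no frequency or rate
    average = sum(sizes) / len(sizes)
    # Assumed definition of a fluctuation event (not given in the source).
    events = sum(1 for s in sizes if abs(s - average) >= fluctuation_threshold)
    frequency = events / duration
    rate = sum(sizes) / duration
    # Both conditions must hold, matching the "and" in the claim wording.
    return frequency >= freq_threshold and rate >= rate_threshold
```

Because the two conditions are joined with a logical "and", a flow that is fast but steady, or bursty but slow, is not flagged by this particular check.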

In one possible implementation manner, the comparison result is obtained by respectively comparing the flow basic feature and the data set attribute feature through a preset flow detection model, and specifically includes: obtaining a source IP address, a target IP address, a port and a protocol type according to the attribute characteristics of the data set; comparing the source IP address, the target IP address, the port and the protocol type with a preset feature group, and obtaining a comparison result, wherein the preset feature group comprises a preset source IP address, a preset target IP address, a preset port and a preset protocol type, and is a normal feature pre-stored in a preset flow detection model; if the comparison result is determined to indicate that the flow basic feature and/or the data set attribute feature have abnormal features, determining that potential malicious software exists in the target device specifically includes: if any one of the source IP address, the target IP address, the port and the protocol type is not consistent with the corresponding feature in the preset feature group, determining that the data set attribute feature has an abnormal feature, so as to determine that malicious software exists in the target equipment.

Specifically, the controller extracts data set attribute characteristics from the captured network traffic data, which typically include a source IP address (the address of the device sending the traffic), a destination IP address (the address of the device receiving the traffic), the port number used, and the communication protocol type. The preset feature group is a set of normal features pre-stored in the preset flow detection model, and comprises a preset source IP address, a preset target IP address, a preset port and a preset protocol type. These preset features generally represent normal, expected network behavior patterns. The comparison process checks whether the features extracted from the flow data are consistent with the features in the preset feature group, and the comparison result indicates whether the data set attribute features match the preset feature group. When at least one data set attribute feature extracted from the flow data does not match the corresponding preset normal feature, the flow data contains information about an abnormal behavior pattern. Based on anomalies in the data set attribute characteristics, the model may infer that the target device (i.e., the device that sent or received the traffic) may be affected by malware.

For example, assume that the model checks a flow whose data set attribute features, extracted from the traffic data, are as follows: source IP address: 192.168.1.100; target IP address: 10.0.0.1; port: 8080; protocol type: TCP. The model then compares these features with the preset feature group, which may include: preset source IP address range: 192.168.1.0-192.168.1.255; preset target IP address: 10.0.0.1; preset port range: 8000-9000; preset protocol type: TCP. With these values, the source IP address lies within the preset source IP address range, the port lies within the preset port range, and the target IP address and protocol type match, so the comparison result indicates no abnormality. If instead the source IP address fell outside the preset source IP address range, or the port number fell outside the preset port range, the comparison result would indicate that there is an abnormality in the attribute characteristics of the data set.
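The comparison against the preset feature group can be sketched as follows, using Python's standard `ipaddress` module. The class and function names are hypothetical, and representing the preset source address and preset port as inclusive ranges is an assumption taken from the example values above.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class PresetFeatureGroup:
    src_first: str   # first address of the preset source IP range
    src_last: str    # last address of the preset source IP range
    dst_ip: str      # preset target IP address
    port_min: int    # lower bound of the preset port range (inclusive)
    port_max: int    # upper bound of the preset port range (inclusive)
    protocol: str    # preset protocol type


def find_mismatches(src_ip, dst_ip, port, protocol, preset):
    """Return the names of the attribute features that do not match the
    preset feature group; an empty list means no abnormality."""
    mismatches = []
    src = ipaddress.ip_address(src_ip)
    first = ipaddress.ip_address(preset.src_first)
    last = ipaddress.ip_address(preset.src_last)
    if not (first <= src <= last):
        mismatches.append("source IP address")
    if ipaddress.ip_address(dst_ip) != ipaddress.ip_address(preset.dst_ip):
        mismatches.append("target IP address")
    if not (preset.port_min <= port <= preset.port_max):
        mismatches.append("port")
    if protocol.upper() != preset.protocol.upper():
        mismatches.append("protocol type")
    return mismatches


def attributes_are_abnormal(src_ip, dst_ip, port, protocol, preset):
    """Any single mismatch is enough to flag the attribute features."""
    return bool(find_mismatches(src_ip, dst_ip, port, protocol, preset))
```

With a preset group of 192.168.1.0-192.168.1.255, 10.0.0.1, ports 8000-9000 and TCP, a flow from the hypothetical source address 172.16.0.9 to 10.0.0.1 on TCP port 22 is reported with the mismatches "source IP address" and "port".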

In a possible implementation manner, referring to fig. 2, which is another flow chart of the flow-based malware detection method provided by the embodiment of the present application, the preset flow detection model is trained before it is used to extract flow basic features from flow basic data. Training the preset flow detection model specifically includes steps S210 to S250, as follows: S210, processing the historical flow basic data and the historical data set attribute data to obtain a plurality of characteristic values; S220, taking, among the plurality of characteristic values, each characteristic value that is larger than a preset threshold value as a representative characteristic value; S230, taking the historical flow basic data and historical data set attribute data corresponding to the representative characteristic values as training samples; S240, inputting the training samples into an input layer of the neural network, and using the corresponding comparison results as the target of the output layer of the neural network for supervised training, to obtain an initial flow detection model; S250, detecting the accuracy of the initial flow detection model, and selecting a model whose accuracy meets preset conditions as the preset flow detection model.

Specifically, first, a large amount of historical flow data (including flow basic data and data set attribute data) is collected, and the data is processed, e.g., cleaned and normalized, to extract a plurality of feature values. These feature values may include traffic size, duration, transmission rate, source IP address, destination IP address, port number, etc. After the data are processed, the feature values are screened. Typically, only those feature values that are greater than a certain preset threshold are considered representative, i.e., more valuable for model learning. The historical flow data corresponding to the feature values larger than the preset threshold value are selected as training samples. These samples are used to train the neural network model. Further, a neural network structure (e.g., a deep learning network) is used, with the training samples as inputs and their corresponding comparison results (i.e., desired outputs) as target outputs. Through supervised training (e.g., with the back propagation algorithm), the neural network learns how to predict the output from the input data. After enough training iterations, an initial flow detection model is obtained. Finally, the initial model needs to be evaluated. This typically involves using a separate test data set to verify the predictive capability of the model. If the evaluation metrics of the model (e.g., accuracy, recall, etc.) meet a preset condition (e.g., are greater than a certain threshold), the model is taken as the final preset flow detection model. If not, it may be necessary to adjust the structure or parameters of the model, or to collect more data for further training.
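Steps S220 to S250 can be sketched in miniature as follows. This is a deliberately small stand-in, not the network described in the application: it replaces the neural network with a single sigmoid neuron trained by gradient descent, assumes each historical record carries one scalar feature value used for screening, and assumes features are already normalized to the range 0 to 1. All function names are hypothetical.

```python
import math
import random


def select_training_samples(records, threshold):
    """S220-S230: keep records whose feature value exceeds the threshold.

    records: list of (feature_value, feature_vector, label), assumed layout;
    label is 1 for abnormal traffic and 0 for normal traffic.
    """
    return [(x, y) for value, x, y in records if value > threshold]


def _sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=200, lr=0.5, seed=0):
    """S240: supervised training of one sigmoid neuron (a stand-in for
    the neural network), using per-sample gradient descent."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in samples[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = _sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the pre-activation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x):
    weights, bias = model
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0


def accuracy(model, samples):
    return sum(predict(model, x) == y for x, y in samples) / len(samples)


def train_until_accurate(train_set, test_set, min_accuracy, max_rounds=5):
    """S250: accept the first model whose test accuracy meets the preset
    condition; return None if no round succeeds."""
    for round_index in range(max_rounds):
        model = train(train_set, seed=round_index)
        if accuracy(model, test_set) >= min_accuracy:
            return model
    return None
```

Evaluating on a separate test set, as in `train_until_accurate`, mirrors the accuracy detection of S250; a real deployment would use a multi-layer network and far more data than this sketch suggests.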

In one possible implementation, malware marking data sent by a target device is received; and storing the malicious software marking data into a preset flow detection model.

Specifically, when the target device detects malware or suspicious activity, it may generate malware marking data. Such data may include information on malware signatures, behavioral characteristics, time of infection, source IP address, etc. The target device sends this data to the controller for further analysis and processing. Once the controller receives malware marking data from the target device, it stores the data in a preset traffic detection model. The purpose of this is to update the knowledge base of the model so that it can identify new malware features or patterns of behavior. By continually updating the model, its ability to detect new threats may be improved. The malicious software marking data in the target equipment can be input by the user, so that the detection efficiency can be improved.
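One way to picture the storing of malware marking data is as a small knowledge base attached to the model. The sketch below is an assumption-laden illustration: the application does not say how the model stores this data, and the class name `MarkingStore` and the dictionary keys `source_ip` and `signature` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MarkingStore:
    """Hypothetical knowledge base holding received malware marking data."""
    malicious_ips: set = field(default_factory=set)
    signatures: set = field(default_factory=set)

    def store(self, marking: dict) -> None:
        """Record the source IP address and signature from one marking,
        skipping whichever field is absent."""
        ip = marking.get("source_ip")
        if ip:
            self.malicious_ips.add(ip)
        signature = marking.get("signature")
        if signature:
            self.signatures.add(signature)

    def is_known_malicious(self, src_ip: str) -> bool:
        """Check a source IP address against the stored markings."""
        return src_ip in self.malicious_ips
```

After `store` has been called with a marking for a given source address, later flows from that address can be flagged by `is_known_malicious` during feature comparison.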

The application further provides a flow-based malicious software detection device, and referring to fig. 3, fig. 3 is a schematic block diagram of the flow-based malicious software detection device according to an embodiment of the application. The device is a controller, and the controller comprises an acquisition module 31 and a processing module 32, wherein the acquisition module 31 acquires a network traffic data set aiming at target equipment, and the network traffic data set comprises traffic basic data and data set attribute data; the processing module 32 extracts flow basic characteristics from the flow basic data by adopting a preset flow detection model, wherein the flow basic characteristics comprise flow size characteristics, duration characteristics and transmission rate characteristics; the processing module 32 extracts data set attribute characteristics from the data set attribute data by adopting a preset flow detection model, wherein the data set attribute characteristics comprise source IP address characteristics, target IP address characteristics, port characteristics and protocol type characteristics; the processing module 32 respectively performs feature comparison on the flow basic features and the data set attribute features through a preset flow detection model to obtain comparison results; if the processing module 32 determines that the comparison indicates that there is an abnormal feature in the traffic base feature and/or the data set attribute feature, then it is determined that potential malware is present in the target device.

In one possible implementation, the acquiring module 31 acquires a network traffic data set for a target device, specifically including: the acquisition module 31 receives the original data sent by the target device; the processing module 32 performs data processing on the raw data to obtain a network traffic data set, where the data processing includes data cleansing, data deduplication, and data integration.

In one possible implementation, the processing module 32 extracts the flow basic feature from the flow basic data by using a preset flow detection model, and specifically includes: the processing module 32 calculates an average group flow value, a minimum group flow value and a maximum group flow value according to the flow basic data to obtain flow size characteristics; the processing module 32 calculates the starting time and the ending time of the flow according to the flow basic data, and obtains the duration characteristic; the processing module 32 calculates the number of bytes transmitted by the flow in unit time according to the flow basic data to obtain the transmission rate characteristic; the processing module 32 fuses the traffic size feature, the duration feature, and the transmission rate feature to obtain a traffic base feature.

In one possible implementation manner, the processing module 32 performs feature comparison on the flow basic feature and the data set attribute feature through a preset flow detection model to obtain a comparison result, which specifically includes: the processing module 32 calculates a duration corresponding to the duration feature; the processing module 32 calculates a flow fluctuation difference value corresponding to the flow size characteristic; the processing module 32 calculates a traffic transmission rate corresponding to the transmission rate characteristic; the processing module 32 judges the magnitude relation between the frequency of occurrence of the flow fluctuation difference and the preset frequency threshold value and the magnitude relation between the flow transmission rate and the preset rate threshold value within the duration time, and obtains a comparison result; if the processing module 32 determines that the comparison result indicates that the traffic base feature and/or the data set attribute feature has an abnormal feature, determining that potential malicious software exists in the target device specifically includes: if the processing module 32 determines that the frequency is greater than or equal to the preset frequency threshold and the traffic transmission rate is greater than or equal to the preset rate threshold, then it determines that the traffic base feature has an abnormal feature to determine that malware is present in the target device.

In one possible implementation manner, the processing module 32 performs feature comparison on the flow basic feature and the data set attribute feature through a preset flow detection model to obtain a comparison result, which specifically includes: the processing module 32 obtains a source IP address, a target IP address, a port, and a protocol type according to the data set attribute characteristics; the processing module 32 compares the source IP address, the target IP address, the port and the protocol type with a preset feature group, and obtains a comparison result, wherein the preset feature group comprises a preset source IP address, a preset target IP address, a preset port and a preset protocol type, and the preset feature group is a normal feature pre-stored in a preset flow detection model; if the processing module 32 determines that the comparison result indicates that the traffic base feature and/or the data set attribute feature has an abnormal feature, determining that potential malicious software exists in the target device specifically includes: if any one of the source IP address, the target IP address, the port, and the protocol type does not match the corresponding feature in the preset feature group, the processing module 32 determines that the data set attribute feature has an abnormal feature to determine that malware is present in the target device.

In one possible implementation, the preset flow detection model is trained before the flow basic features are extracted from the flow basic data by adopting the preset flow detection model; training a preset flow detection model, specifically comprising: the processing module 32 processes the historical flow basic data and the historical data set attribute data to obtain a plurality of characteristic values; the processing module 32 takes, from the plurality of characteristic values, each characteristic value greater than a preset threshold value as a representative characteristic value; the processing module 32 takes the historical flow basic data and the historical data set attribute data corresponding to the representative characteristic values as training samples; the processing module 32 inputs the training samples to an input layer of the neural network, and uses the corresponding comparison results as the target of the output layer of the neural network for supervised training, to obtain an initial flow detection model; the processing module 32 performs accuracy detection on the initial flow detection model, and selects a model whose accuracy meets a preset condition as the preset flow detection model.

In one possible implementation, the obtaining module 31 receives malware marking data sent by the target device; the processing module 32 stores the malware marking data into a preset traffic detection model.

It should be noted that when the device provided in the above embodiment implements its functions, the division into the above functional modules is only an example. In practical application, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the device embodiments and the method embodiments provided above belong to the same concept; the specific implementation processes are detailed in the method embodiments and are not repeated herein.

The application further provides an electronic device. Referring to fig. 4, fig. 4 is a schematic structural diagram of the electronic device according to the embodiment of the application. The electronic device may include: at least one processor 41, at least one network interface 44, a user interface 43, a memory 45, and at least one communication bus 42.

The communication bus 42 is used to enable connection and communication between these components.

The user interface 43 may include a display screen (Display) and a camera (Camera); optionally, the user interface 43 may further include a standard wired interface and a standard wireless interface.

The network interface 44 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).

The processor 41 may comprise one or more processing cores. The processor 41 connects the various parts of the overall server using various interfaces and lines, and performs the various functions of the server and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 45 and by invoking data stored in the memory 45. Optionally, the processor 41 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 41 may integrate one or a combination of several of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, etc. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is used for rendering and drawing the content to be displayed by the display screen; the modem is used to handle wireless communications. It will be appreciated that the modem may alternatively not be integrated into the processor 41 and may instead be implemented by a separate chip.

The memory 45 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). Optionally, the memory 45 includes a non-transitory computer-readable storage medium. The memory 45 may be used to store instructions, programs, code, a set of codes, or a set of instructions. The memory 45 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described respective method embodiments, etc.; the data storage area may store data or the like involved in the above respective method embodiments. Optionally, the memory 45 may also be at least one storage device located remotely from the aforementioned processor 41. As shown in fig. 4, the memory 45, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an application program of a traffic-based malware detection method.

In the electronic device shown in fig. 4, the user interface 43 is mainly used for providing an input interface for a user, and acquiring data input by the user; and processor 41 may be configured to invoke an application of the traffic-based malware detection method stored in memory 45, which when executed by one or more processors, causes the electronic device to perform the method as in one or more of the embodiments described above.

It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.

The application also provides a computer readable storage medium storing instructions. The instructions, when executed by one or more processors, cause an electronic device to perform the method as described in one or more of the embodiments above.

In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some service interface, device or unit, and may be electrical or take other forms.

The units described as separate units may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, comprising several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the method of the various embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk or an optical disk.

The foregoing is merely exemplary embodiments of the present disclosure and is not intended to limit the scope of the present disclosure. That is, equivalent changes and modifications are contemplated by the teachings of this disclosure, which fall within the scope of the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a scope and spirit of the disclosure being indicated by the claims.

Claims

1. A traffic-based malware detection method, the method comprising:

acquiring a network traffic data set for a target device, wherein the network traffic data set comprises flow basic data and data set attribute data;

extracting flow basic features from the flow basic data by using a preset flow detection model, wherein the flow basic features comprise a flow size feature, a duration feature and a transmission rate feature;

extracting data set attribute features from the data set attribute data by using the preset flow detection model, wherein the data set attribute features comprise a source IP address feature, a target IP address feature, a port feature and a protocol type feature;

performing feature comparison on the flow basic features and the data set attribute features respectively through the preset flow detection model to obtain a comparison result;

and if it is determined that the comparison result indicates that the flow basic features and/or the data set attribute features have abnormal features, determining that potential malware exists in the target device.

2. The traffic-based malware detection method of claim 1, wherein the obtaining a network traffic data set for a target device specifically comprises:

receiving original data sent by the target device;

and performing data processing on the original data to obtain the network traffic data set, wherein the data processing comprises data cleaning, data deduplication and data integration.
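By way of non-limiting illustration, the three data processing steps named above (cleaning, deduplication, integration) might be sketched as follows. The record fields, the validity rules and the function names are assumptions made for this sketch; the application does not specify them.

```python
# Hypothetical fields expected in each raw record reported by the target device.
REQUIRED = ("src_ip", "dst_ip", "port", "protocol", "bytes", "start", "end")


def clean(records):
    """Data cleaning: drop records with missing or inconsistent values."""
    kept = []
    for r in records:
        if any(r.get(k) is None for k in REQUIRED):
            continue  # a required field is absent
        if r["bytes"] < 0 or r["end"] < r["start"]:
            continue  # negative size or end before start
        kept.append(r)
    return kept


def deduplicate(records):
    """Data deduplication: keep the first occurrence of identical records."""
    seen = set()
    kept = []
    for r in records:
        key = tuple(r[k] for k in REQUIRED)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def integrate(records):
    """Data integration: split records into basic data and attribute data."""
    return {
        "basic": [
            {"bytes": r["bytes"], "start": r["start"], "end": r["end"]}
            for r in records
        ],
        "attributes": [
            {k: r[k] for k in ("src_ip", "dst_ip", "port", "protocol")}
            for r in records
        ],
    }


def build_dataset(raw):
    """Run cleaning, deduplication and integration in that order."""
    return integrate(deduplicate(clean(raw)))
```

Cleaning runs first in this sketch so that deduplication never has to build a key from a record with missing fields.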

3. The traffic-based malware detection method of claim 1, wherein the extracting flow basic features from the flow basic data by using a preset flow detection model specifically comprises:

computing an average group flow value, a minimum group flow value and a maximum group flow value according to the flow basic data to obtain the flow size feature;

calculating the start time and the end time of the flow according to the flow basic data to obtain the duration feature;

calculating the number of bytes transmitted by the flow per unit time according to the flow basic data to obtain the transmission rate feature;

and fusing the flow size feature, the duration feature and the transmission rate feature to obtain the flow basic features.
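By way of non-limiting illustration, the extraction and fusion steps above might be sketched as follows. The sketch reads the "group flow value" as the size in bytes of each packet of a flow and represents the fused feature as a plain dictionary; both readings, as well as the function name, are assumptions.

```python
def extract_basic_features(packets):
    """Compute size, duration and rate features for one flow.

    packets: list of (timestamp_seconds, size_bytes) pairs (assumed layout).
    """
    if not packets:
        raise ValueError("a flow needs at least one packet")
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]

    # Flow size feature: average, minimum and maximum per-packet size.
    total = sum(sizes)
    size_mean = total / len(sizes)

    # Duration feature: span between the start time and the end time.
    start, end = min(times), max(times)
    duration = end - start

    # Transmission rate feature: bytes per second; 0.0 for a zero-length flow.
    rate = total / duration if duration > 0 else 0.0

    # Fusion: here simply one record holding all three feature groups.
    return {
        "size_mean": size_mean,
        "size_min": min(sizes),
        "size_max": max(sizes),
        "duration": duration,
        "rate": rate,
    }
```

For example, the packets `[(0.0, 100), (1.0, 300), (2.0, 200)]` yield a mean size of 200 bytes, a duration of 2 seconds and a rate of 300 bytes per second.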

4. The traffic-based malware detection method according to claim 1, wherein the performing feature comparison on the flow basic features and the data set attribute features respectively through the preset flow detection model to obtain a comparison result specifically comprises:

calculating the duration corresponding to the duration feature;

calculating a flow fluctuation difference value corresponding to the flow size feature;

calculating the flow transmission rate corresponding to the transmission rate feature;

determining, within the duration, the magnitude relation between the frequency of the flow fluctuation difference value and a preset frequency threshold value, and the magnitude relation between the flow transmission rate and a preset rate threshold value, to obtain the comparison result;

the determining that potential malware exists in the target device if it is determined that the comparison result indicates that the flow basic features and/or the data set attribute features have abnormal features specifically comprises:

if it is determined that the frequency is greater than or equal to the preset frequency threshold value and the flow transmission rate is greater than or equal to the preset rate threshold value, determining that the flow basic features have abnormal features, so as to determine that the malware exists in the target device.
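By way of non-limiting illustration, the two-threshold comparison above might be sketched as follows. The claim does not define how the frequency of the flow fluctuation difference is measured; this sketch assumes it is the number of consecutive-packet size changes of at least `min_delta` bytes per second of flow duration. The parameter `min_delta`, the packet layout and the function name are hypothetical.

```python
def basic_feature_anomalous(packets, min_delta, freq_threshold, rate_threshold):
    """Return True when both the fluctuation frequency and the rate are high.

    packets: list of (timestamp_seconds, size_bytes) pairs (assumed layout).
    """
    ordered = sorted(packets)  # order by timestamp
    times = [t for t, _ in ordered]
    sizes = [s for _, s in ordered]

    # Duration corresponding to the duration feature.
    duration = times[-1] - times[0] if times else 0.0
    if duration <= 0:
        return False  # no interval over which to measure frequency or rate

    # Fluctuation differences between consecutive packet sizes.
    diffs = [abs(b - a) for a, b in zip(sizes, sizes[1:])]

    # Assumed reading: large fluctuations per second within the duration.
    frequency = sum(d >= min_delta for d in diffs) / duration

    # Flow transmission rate in bytes per second.
    rate = sum(sizes) / duration

    return frequency >= freq_threshold and rate >= rate_threshold
```

Both conditions must hold for the flow basic features to be treated as abnormal, which mirrors the "and" in the claim language.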

5. The traffic-based malware detection method according to claim 1, wherein the performing feature comparison on the flow basic features and the data set attribute features respectively through the preset flow detection model to obtain a comparison result specifically comprises:

obtaining a source IP address, a target IP address, a port and a protocol type according to the data set attribute features;

comparing the source IP address, the target IP address, the port and the protocol type with a preset feature set to obtain the comparison result, wherein the preset feature set comprises a preset source IP address, a preset target IP address, a preset port and a preset protocol type, and consists of normal features pre-stored in the preset flow detection model;

if any one of the source IP address, the target IP address, the port and the protocol type is inconsistent with the corresponding feature in the preset feature set, determining that the data set attribute features have abnormal features, so as to determine that the malware exists in the target device.
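By way of non-limiting illustration, the comparison against the preset feature set might be sketched as follows. The sketch assumes the preset feature set stores sets of allowed values for each attribute; the class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresetFeatureSet:
    """Normal attribute values assumed to be pre-stored with the model."""
    source_ips: frozenset
    target_ips: frozenset
    ports: frozenset
    protocols: frozenset


def mismatched_attributes(flow, preset):
    """List the attributes of a flow that are absent from the preset set."""
    allowed = {
        "source_ip": preset.source_ips,
        "target_ip": preset.target_ips,
        "port": preset.ports,
        "protocol": preset.protocols,
    }
    return [name for name, values in allowed.items() if flow[name] not in values]


def attribute_feature_anomalous(flow, preset):
    """A single mismatching attribute is enough to flag the flow."""
    return bool(mismatched_attributes(flow, preset))
```

Returning the list of mismatching attribute names, rather than only a boolean, lets a caller report which attribute triggered the determination.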

6. The traffic-based malware detection method of claim 1, wherein the preset flow detection model is trained before the flow basic features are extracted from the flow basic data by using the preset flow detection model; training the preset flow detection model specifically comprises:

processing historical flow basic data and historical data set attribute data to obtain a plurality of feature values;

taking each feature value, among the plurality of feature values, that is greater than a preset threshold value as a representative feature value;

taking the historical flow basic data and the historical data set attribute data corresponding to the representative feature values as training samples;

inputting the training samples to an input layer of a neural network, and performing supervised training with the corresponding comparison results at an output layer of the neural network to obtain an initial flow detection model;

and evaluating the accuracy of the initial flow detection model, and selecting a model whose accuracy meets a preset condition as the preset flow detection model.

7. The traffic-based malware detection method of claim 1, further comprising:

receiving malware marking data sent by the target device;

and storing the malware marking data in the preset flow detection model.

8. A traffic-based malware detection device, characterized in that the device comprises an acquisition module (31) and a processing module (32), wherein:

the acquisition module (31) is configured to acquire a network traffic data set for a target device, wherein the network traffic data set comprises flow basic data and data set attribute data;

the processing module (32) is configured to extract flow basic features from the flow basic data by using a preset flow detection model, wherein the flow basic features comprise a flow size feature, a duration feature and a transmission rate feature;

the processing module (32) is further configured to extract data set attribute features from the data set attribute data by using the preset flow detection model, wherein the data set attribute features comprise a source IP address feature, a target IP address feature, a port feature and a protocol type feature;

the processing module (32) is further configured to perform feature comparison on the flow basic features and the data set attribute features respectively through the preset flow detection model to obtain a comparison result;

the processing module (32) is further configured to determine that potential malware exists in the target device if it is determined that the comparison result indicates that the flow basic features and/or the data set attribute features have abnormal features.

9. An electronic device, characterized in that the electronic device comprises a processor (41), a memory (45), a user interface (43) and a network interface (44), the memory (45) being configured to store instructions, the user interface (43) and the network interface (44) being configured to communicate with other devices, and the processor (41) being configured to execute the instructions stored in the memory (45) to cause the electronic device to perform the method according to any one of claims 1 to 7.

10. A computer readable storage medium storing instructions which, when executed, perform the method of any one of claims 1 to 7.