CN114338436A - Network traffic file identification method and device, electronic equipment and medium - Google Patents

Info

Publication number
CN114338436A
Authority
CN
China
Prior art keywords
protocol
file
network traffic
data packets
data
Legal status
Pending
Application number
CN202111632893.8A
Other languages
Chinese (zh)
Inventor
黄子恒
张星
葛继声
张志良
Current Assignee
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Application filed by Sangfor Technologies Co Ltd
Priority to CN202111632893.8A
Publication of CN114338436A

Abstract

The embodiments of the present application disclose a method and a device for identifying a network traffic file, an electronic device, and a computer-readable storage medium. An acquired network traffic file is analyzed according to a set protocol analysis mode to obtain its protocol information. Because the distribution of service data is strongly correlated with protocols, the protocol information is matched with a set service database to determine the proportion of service data in the network traffic file. When the proportion is greater than or equal to a preset threshold, the network traffic file contains a large amount of service data and can be determined to be a valid file. By analyzing the protocol information in the network traffic file on the basis of this strong correlation, the solution automatically identifies whether the file contains the required service data without manual analysis, which improves the efficiency of identifying service data.

Description

Network traffic file identification method and device, electronic equipment and medium
Technical Field
The present application relates to the field of network security technologies, and in particular, to a method and an apparatus for identifying a network traffic file, an electronic device, and a computer-readable storage medium.
Background
With the development of new technologies such as 5G (fifth-generation mobile communication technology) and mobile computing, Internet of Things (IoT) applications have spread rapidly in recent years. The rapid growth in the number of IoT devices also means that, once a vulnerability appears, its range of influence and the resulting losses can be enormous. In recent years, a number of IoT vulnerabilities and attacks with a wide range of influence have been disclosed, affecting hundreds of millions of IoT devices. Against this background of frequent IoT attacks, it is necessary to identify, analyze, and catalogue IoT devices. In particular, a fine-grained classification of IoT devices allows security measures to be tailored to different devices and different vulnerabilities, thereby reducing the range of IoT devices exposed to attack.
For IoT device identification, a data collector typically captures packets from the devices at the customer site, obtains the network traffic files passing through the devices, and transmits them to an analyst. The analyst analyzes the traffic with dedicated analysis platforms or tools and extracts device fingerprints to label the devices. Each device type, or each device model, has its own unique device fingerprint. However, the device fingerprint generally resides in the service data of the network traffic file, so when a captured network traffic file contains no service data, the analyst cannot extract the device's fingerprint. Determining whether a network traffic file contains service data is therefore an important prerequisite for device identification.
At present, whether a network traffic file contains service data is usually determined by manual analysis. In practice, a collector goes to the site to capture the device's network traffic file and sends it back to an analyst; if the analyst finds no service data, the result is fed back to the collector, who must return to the site and capture the data again. This process is very time-consuming and labor-intensive. Collection takes place at the customer site, where the dedicated analysis platforms and tools, which are deployed at the analyst's company, are not available. Moreover, analysis is not a quick process, so having the analyst travel to the site to both collect and analyze the data would increase the analyst's workload and is impractical.
Therefore, how to improve the identification efficiency of the service data is a problem to be solved by those skilled in the art.
Disclosure of Invention
The embodiment of the application aims to provide a method and a device for identifying a network traffic file, electronic equipment and a computer readable storage medium, which can improve the identification efficiency of service data.
In order to solve the foregoing technical problem, an embodiment of the present application provides a method for identifying a network traffic file, including:
analyzing the acquired network traffic file according to a set protocol analysis mode to obtain protocol information of the network traffic file;
matching the protocol information with a set service database to determine the proportion of service data in the network traffic file;
and determining that the network traffic file is a valid file when the ratio is greater than or equal to a preset threshold.
Optionally, analyzing the acquired network traffic file according to the set protocol analysis mode to obtain the protocol information of the network traffic file includes:
extracting session information of each data packet in the network traffic file, wherein the session information includes a protocol name;
and summarizing the field contents belonging to the same protocol name in the network traffic file according to the field types corresponding to each protocol type.
Optionally, the service database includes a protocol white list composed of protocols to which service data belongs;
matching the protocol information with the set service database to determine the proportion of service data in the network traffic file includes:
screening out, from the network traffic file, target protocol names that match the protocol white list;
summarizing the data packets corresponding to the target protocol names to obtain the number of data packets matching the protocol white list;
and determining the proportion of service data in the network traffic file based on this number of data packets and the total number of data packets in the network traffic file.
Optionally, the method further comprises:
and, when the ratio is smaller than the preset threshold, setting a port blacklist according to the port information of the data packets that do not match the protocol white list.
Optionally, the service database further includes a service data blacklist composed of non-service data;
after the number of data packets matching the protocol white list is determined according to the protocol names in the protocol information, the method further includes:
screening out, from the data packets matching the protocol white list, target data packets that do not match the service data blacklist;
and taking the number of target data packets as the final number of data packets.
Optionally, the service database further includes an IP blacklist composed of the IP addresses associated with non-service data;
after screening out, from the data packets matching the protocol white list, the target data packets that do not match the service data blacklist, the method further includes:
screening out, from the target data packets, those that do not match the IP blacklist;
taking the number of target data packets as the final number of data packets includes:
taking the number of target data packets that do not match the IP blacklist as the final number of data packets.
Optionally, after the determining that the network traffic file is a valid file, the method further includes:
generating an identification report of the network traffic file, wherein the identification report includes the protocol names contained in the network traffic file, the field contents corresponding to each protocol name, and/or the proportion of service data in the network traffic file.
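The identification report can be illustrated with a minimal Python sketch. It assumes, hypothetically, that parsing has already produced (protocol name, field type, field content) triples; the function name `build_report`, the JSON layout, and the sample values are illustrative only and are not taken from the application.

```python
import json
from collections import defaultdict


def build_report(field_records, service_ratio):
    """Assemble a JSON identification report (hypothetical layout).

    field_records: iterable of (protocol_name, field_type, field_content) triples.
    service_ratio: proportion of service data in the traffic file, 0.0 to 1.0.
    """
    # protocol name -> field type -> list of field contents
    fields = defaultdict(lambda: defaultdict(list))
    for protocol, field_type, content in field_records:
        fields[protocol][field_type].append(content)
    report = {
        "protocols": sorted(fields),
        "fields": {proto: dict(by_type) for proto, by_type in fields.items()},
        "service_data_ratio": service_ratio,
    }
    return json.dumps(report, indent=2, sort_keys=True)


# Made-up sample records for illustration.
records = [
    ("HTTP", "Host", "pacs.local"),
    ("HTTP", "User-Agent", "ModalityClient/1.0"),
    ("DICOM", "CalledAETitle", "PACS_SERVER"),
]
print(build_report(records, 0.8))
```

Serializing to JSON is only one choice; the report could equally be rendered as HTML or a table for the collector on site.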
The embodiment of the application also provides a device for identifying a network traffic file, which includes a parsing unit, a matching unit, and a determining unit;
the parsing unit is configured to analyze the acquired network traffic file according to a set protocol analysis mode to obtain protocol information of the network traffic file;
the matching unit is used for matching the protocol information with a set service database and determining the proportion of service data in the network traffic file;
the determining unit is configured to determine that the network traffic file is a valid file when the ratio is greater than or equal to a preset threshold.
Optionally, the parsing unit includes an extracting subunit and a summarizing subunit;
the extraction subunit is configured to extract session information of each data packet in the network traffic file; wherein the session information comprises a protocol name;
and the summarizing subunit is configured to summarize the field contents belonging to the same protocol name in the network traffic file according to the field types corresponding to each protocol type.
Optionally, the service database includes a protocol white list composed of the protocols to which service data belongs; the matching unit includes a screening subunit, a summarizing subunit, and a determining subunit;
the screening subunit is configured to screen out, from the network traffic file, a target protocol name matched with the protocol white list;
the summarizing subunit is configured to summarize the data packets corresponding to the target protocol name to obtain the number of data packets matching the protocol white list;
and the determining subunit is configured to determine, based on the number of the data packets and the total number of the data packets in the network traffic file, a proportion of the service data in the network traffic file.
Optionally, the device further includes a setting unit;
the setting unit is configured to set, when the ratio is smaller than the preset threshold, a port blacklist according to the port information of the data packets that do not match the protocol white list.
Optionally, the service database further includes a service data blacklist composed of non-service data; the device further includes a data screening unit and an acting unit;
the data screening unit is configured to screen out, from the data packets matching the protocol white list, target data packets that do not match the service data blacklist;
the acting unit is configured to take the number of target data packets as the final number of data packets.
Optionally, the service database further includes an IP blacklist composed of the IP addresses associated with non-service data; the device further includes an IP screening unit;
the IP screening unit is configured to screen out, from the target data packets, those that do not match the IP blacklist;
the acting unit is configured to take the number of target data packets that do not match the IP blacklist as the final number of data packets.
Optionally, the device further includes a generating unit;
the generating unit is configured to generate an identification report of the network traffic file after the network traffic file is determined to be a valid file; the identification report includes the protocol names contained in the network traffic file, the field contents corresponding to each protocol name, and/or the proportion of service data in the network traffic file.
An embodiment of the present application further provides an electronic device, including:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the method for identifying a network traffic file as described above.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for identifying a network traffic file as described above are implemented.
According to the above technical solution, the acquired network traffic file is analyzed according to a set protocol analysis mode to obtain its protocol information, which reflects the protocol used by each data packet in the file. The distribution of service data is strongly correlated with protocols: service data is usually carried in data packets of certain specific protocols. The present application therefore builds a service database based on protocols, matches the protocol information with this service database, and determines the proportion of service data in the network traffic file. When the proportion is greater than or equal to the preset threshold, the network traffic file contains a large amount of service data and can be used for subsequent asset identification and similar work, so it can be determined to be a valid file. By analyzing the protocol information in the network traffic file on the basis of the strong correlation between service data and protocols, the solution automatically identifies whether the file contains the required service data without manual analysis, which improves the efficiency of identifying service data.
Drawings
In order to more clearly illustrate the embodiments of the present application, the drawings needed for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 is a schematic diagram of a scenario for identifying a network traffic file according to an embodiment of the present application;
Fig. 2 is a flowchart of a method for identifying a network traffic file according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a device for identifying a network traffic file according to an embodiment of the present application;
Fig. 4 is a structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the present application.
The terms "including" and "having," and any variations thereof, in the description and claims of this application and the drawings described above, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may include other steps or elements not expressly listed.
In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings.
Different types of devices produce different traffic data, from which fingerprint information can be generated. The fingerprint information is unique, so different devices can be identified by it. The key to generating fingerprint information is that the captured network traffic file must contain a large amount of service data. At present, identifying service data relies mainly on manual analysis, which is inefficient and, for network traffic files with large data volumes, error-prone.
Therefore, the embodiments of the present application provide a method and a device for identifying a network traffic file, an electronic device, and a computer-readable storage medium. Service data is strongly correlated with protocols and is usually carried in data packets of certain specific protocols, so the service database can be constructed based on protocols. By analyzing the protocol information contained in a network traffic file on the basis of this correlation, whether the file contains the required service data can be identified automatically.
Fig. 1 is a schematic diagram of a scenario for identifying a network traffic file according to an embodiment of the present application. To analyze network traffic files automatically, a service database can be constructed in advance on the server side, based on protocol information strongly correlated with service data. The network traffic file of a device can be obtained by packet capture. To identify the service data in the network traffic file by protocol, the acquired file is analyzed according to a set protocol analysis mode to obtain its protocol information, and the protocol information is matched with the set service database to determine the proportion of service data in the file. When the proportion is greater than or equal to the preset threshold, the network traffic file contains a large amount of service data and can be determined to be a valid file. This implementation automatically identifies whether the network traffic file contains the required service data without manual analysis, which improves the efficiency and accuracy of service data identification.
Next, a method for identifying a network traffic file provided in an embodiment of the present application is described in detail.
Fig. 2 is a flowchart of a method for identifying a network traffic file according to an embodiment of the present application, where the method includes:
S201: Analyze the acquired network traffic file according to a set protocol analysis mode to obtain the protocol information of the network traffic file.
The embodiments of the present application take into account the strong correlation between service data and protocols. For example, in a particular medical customer's network, service data usually appears in the application-layer HTTP (Hypertext Transfer Protocol) and in DICOM (Digital Imaging and Communications in Medicine), a protocol specific to the medical field. Some proprietary, unknown protocols are carried over the transport-layer TCP (Transmission Control Protocol) or UDP (User Datagram Protocol), so TCP and UDP traffic whose application-layer protocol cannot be identified may also contain service data. In contrast, protocols such as the application-layer DNS (Domain Name System) and the network-layer ICMP (Internet Control Message Protocol) do not carry service data.
Therefore, whether the network traffic file contains the service data or not can be identified based on the protocol information contained in the network traffic file.
A network traffic file usually contains many data packets, each with a corresponding protocol type. To make it easier to evaluate how much of the file consists of service data, the data in the network traffic file can be classified and summarized by protocol type.
In a specific implementation, the session information of each data packet in the network traffic file may be extracted. The session information may be five-tuple information of the packet, including a source IP, a destination IP, a source port, a destination port, and a protocol name.
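As a hedged illustration of five-tuple extraction, the sketch below reads the addresses, ports, and protocol from a raw IPv4 packet using only the Python standard library. It assumes the packet starts at the IPv4 header (no Ethernet header), does not handle IPv6 or fragments, and reports only the IP-level protocol name; recognizing application protocols such as HTTP or DICOM would need a full protocol dissector. The names `FiveTuple` and `extract_five_tuple` are hypothetical.

```python
import ipaddress
import struct
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: Optional[int]
    dst_port: Optional[int]
    protocol: str


# IP protocol numbers for the protocols this sketch names explicitly.
IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}


def extract_five_tuple(ip_packet: bytes) -> FiveTuple:
    """Extract the five-tuple from a raw IPv4 packet (assumed input format)."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        raise ValueError("expected a raw IPv4 packet")
    header_len = (ip_packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    proto_num = ip_packet[9]
    src_ip = str(ipaddress.IPv4Address(ip_packet[12:16]))
    dst_ip = str(ipaddress.IPv4Address(ip_packet[16:20]))
    src_port = dst_port = None
    if proto_num in (6, 17):  # TCP and UDP both start with two 16-bit ports
        src_port, dst_port = struct.unpack("!HH", ip_packet[header_len:header_len + 4])
    name = IP_PROTOCOLS.get(proto_num, f"IP-{proto_num}")
    return FiveTuple(src_ip, dst_ip, src_port, dst_port, name)


# A hand-built IPv4/TCP packet: 192.168.1.10:51000 -> 10.0.0.5:80.
ip_header = bytes([0x45, 0, 0, 40, 0, 0, 0, 0, 64, 6, 0, 0,
                   192, 168, 1, 10, 10, 0, 0, 5])
tcp_header = struct.pack("!HH", 51000, 80) + bytes(16)
print(extract_five_tuple(ip_header + tcp_header))
```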
Different protocol types have different field types. For a protocol that carries service data, the content of each field type of that protocol can often be treated as service data.
After the session information of each data packet is extracted, the field contents belonging to the same protocol name in the network traffic file can be summarized according to the field types corresponding to each protocol type.
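The per-protocol summarization can be sketched as follows, assuming each packet has already been dissected into a protocol name and a dictionary of field values. The `FIELD_TYPES` table and its entries are hypothetical examples of "the field types corresponding to each protocol type"; a real deployment would maintain this table from its own dissectors.

```python
from collections import defaultdict

# Hypothetical field types of interest per protocol name.
FIELD_TYPES = {
    "HTTP": ["Host", "User-Agent", "Server"],
    "DICOM": ["CallingAETitle", "CalledAETitle"],
}


def summarize_fields(packets):
    """Group field contents by protocol name and field type.

    packets: iterable of (protocol_name, {field_type: content}) pairs.
    Protocols without an entry in FIELD_TYPES are skipped, and repeated
    contents are kept only once per field type.
    """
    summary = defaultdict(lambda: defaultdict(list))
    for proto, fields in packets:
        for field_type in FIELD_TYPES.get(proto, []):
            content = fields.get(field_type)
            if content is not None and content not in summary[proto][field_type]:
                summary[proto][field_type].append(content)
    return {proto: dict(by_type) for proto, by_type in summary.items()}


packets = [
    ("HTTP", {"Host": "pacs.local", "User-Agent": "ModalityClient/1.0"}),
    ("HTTP", {"Host": "pacs.local"}),
    ("DNS", {"qname": "example.org"}),
]
print(summarize_fields(packets))
```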
PCAP is a common format for storing captured packets, and captured network traffic files are often provided as PCAP files. A PCAP file consists of a file header followed by a sequence of records: packet header 1, packet data 1, packet header 2, packet data 2, and so on. The service data can be parsed from the PCAP file. Each PCAP file usually contains many data packets, and the five-tuple of each data packet can be extracted from it to obtain the source IP, destination IP, source port, destination port, and protocol name. Next, the hexadecimal content of each data packet is analyzed to determine the packet length and locate its start and end positions, and the content is converted into readable text with a hexadecimal conversion tool. This is repeated for every packet, and finally the field contents of the same protocol type, that is, the same protocol name, are merged.
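The record-by-record walk through a PCAP file can be sketched with the standard `struct` module. This minimal reader assumes the classic libpcap layout (a 24-byte file header, then a 16-byte header before each packet whose third field is the captured length) and does not handle the newer pcapng format. The function names `read_pcap` and `printable` are hypothetical; `printable` stands in for the "hexadecimal conversion tool" by rendering non-printable bytes as dots.

```python
import struct
from typing import Iterator, Tuple


def read_pcap(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (link_type, packet_bytes) for each record of a classic pcap file."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):    # micro/nanosecond, little-endian
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):  # micro/nanosecond, big-endian
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    link_type = struct.unpack(endian + "I", data[20:24])[0]
    offset = 24  # skip the file header
    while offset + 16 <= len(data):
        # Packet header: seconds, sub-second part, captured length, original length.
        _sec, _frac, incl_len, _orig_len = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield link_type, data[offset:offset + incl_len]
        offset += incl_len


def printable(payload: bytes) -> str:
    """Render bytes as readable text, replacing non-printable bytes with '.'."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in payload)


# Build a one-packet pcap in memory (link type 101 = raw IP) and read it back.
payload = b"GET / HTTP/1.1\r\n"
file_header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 101)
record = struct.pack("<IIII", 0, 0, len(payload), len(payload)) + payload
for link, pkt in read_pcap(file_header + record):
    print(link, pkt.hex(), printable(pkt))
```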
S202: Match the protocol information with the set service database to determine the proportion of service data in the network traffic file.
In the embodiments of the present application, the service database can be constructed based on protocols. For example, the service data contained in historical network traffic files can be analyzed and the names of the protocols carrying it collected to obtain a protocol white list. As more historical network traffic files accumulate, new protocols carrying service data may appear, so in practice the protocol names in the white list can be continuously adjusted and updated.
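Building and updating the protocol white list could look like the sketch below. It assumes, hypothetically, that historical packets have been labeled (for example by an analyst) with whether they carried service data, and it adds a protocol once at least `min_share` of its labeled packets carried service data. The name `update_whitelist` and the `min_share` rule are illustrative assumptions; the application itself only says that the protocols carrying service data are collected.

```python
from collections import Counter


def update_whitelist(whitelist, labeled_packets, min_share=0.5):
    """Return the white list extended with protocols learned from history.

    labeled_packets: iterable of (protocol_name, carries_service_data) pairs.
    min_share: hypothetical minimum fraction of a protocol's packets that
    must carry service data before the protocol is added.
    """
    total, service = Counter(), Counter()
    for proto, is_service in labeled_packets:
        total[proto] += 1
        if is_service:
            service[proto] += 1
    learned = {proto for proto in total if service[proto] / total[proto] >= min_share}
    # Existing entries are kept, so the list only grows as history accumulates.
    return set(whitelist) | learned


history = [("DICOM", True), ("DICOM", True), ("DNS", False), ("TCP", True), ("TCP", False)]
print(sorted(update_whitelist({"HTTP"}, history)))
```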
Each data packet has its corresponding protocol information, so the data packets corresponding to each protocol name can be obtained by analyzing the network traffic file.
When evaluating the proportion of service data in the network traffic file, the target protocol names that match the protocol white list can be screened out from the file; the data packets corresponding to these target protocol names are summarized to obtain the number of data packets matching the protocol white list; and the proportion of service data is determined from this number and the total number of data packets in the network traffic file.
For example, assume that the network traffic file contains 100 data packets belonging to 10 protocol types, with 10 data packets per protocol type. If the names of 8 of these protocol types are in the protocol white list, the number of data packets matching the protocol white list is 10 × 8 = 80. With a total of 100 data packets in the file, the proportion of service data in the network traffic file is 80/100 = 80%.
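The proportion calculation and the worked example above can be reproduced with a short sketch. It assumes the protocol name of every data packet has already been extracted into a list; `service_data_ratio` and the placeholder protocol names `P0` to `P9` are hypothetical.

```python
from collections import Counter


def service_data_ratio(protocol_names, whitelist):
    """Share of packets whose protocol name is in the protocol white list.

    protocol_names: one protocol name per data packet in the traffic file.
    """
    per_protocol = Counter(protocol_names)  # summarize packets by protocol name
    matched = sum(n for proto, n in per_protocol.items() if proto in whitelist)
    total = sum(per_protocol.values())
    return matched / total if total else 0.0


# 10 protocol types with 10 packets each; 8 of the types are white-listed.
names = [f"P{i}" for i in range(10) for _ in range(10)]
whitelist = {f"P{i}" for i in range(8)}
print(service_data_ratio(names, whitelist))  # 0.8
```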
S203: Determine that the network traffic file is a valid file when the ratio is greater than or equal to a preset threshold.
The higher the proportion of service data in the network traffic file, the more service data it contains, and a network traffic file is valuable for asset identification only when it contains a large amount of service data. Therefore, in the embodiments of the present application, a threshold can be set to evaluate the proportion of service data in the network traffic file.
When the ratio is greater than or equal to the preset threshold, the network traffic file contains a large amount of service data and is worth analyzing further, so it can be determined to be a valid file.
The value of the preset threshold can be set according to actual requirements: a higher threshold when the quality requirement on the network traffic file is high, and a lower one when the requirement is not high.
For example, the preset threshold may be set to 80%. In the above example, the proportion of service data in the network traffic file is 80%, which equals the preset threshold, so the network traffic file can be determined to be a valid file and used for subsequent asset identification.
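The threshold check can be sketched as below. As a design choice (an assumption, not something the application specifies), the comparison is done on integer packet counts by cross-multiplying, so a file that sits exactly on the threshold, such as 80 of 100 packets against 80%, cannot be rejected by floating-point rounding. `is_valid_file` and `threshold_pct` are hypothetical names.

```python
def is_valid_file(matched: int, total: int, threshold_pct: int = 80) -> bool:
    """Return True when matched / total >= threshold_pct percent.

    Cross-multiplying keeps the comparison exact for integer counts.
    An empty traffic file is never considered valid.
    """
    if total <= 0:
        return False
    return matched * 100 >= threshold_pct * total


print(is_valid_file(80, 100))  # True: exactly on the 80% threshold
print(is_valid_file(79, 100))  # False
```

Raising `threshold_pct` corresponds to the stricter quality requirement mentioned above, and lowering it to the looser one.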
In this way, the acquired network traffic file is analyzed to obtain the protocol used by each data packet. Because service data is usually carried by certain specific protocols, matching this protocol information with the protocol-based service database yields the proportion of service data in the file, and a file whose proportion reaches the preset threshold contains enough service data for subsequent asset identification and is determined to be valid. Whether a network traffic file contains the required service data is thus identified automatically, without manual analysis, which improves the efficiency of identifying service data.
In practice, the ratio may also be smaller than the preset threshold, which means the network traffic file contains little service data. Each captured data packet is associated with a port, and if the traffic captured from a port contains little service data, traffic captured from that port later is highly likely to contain no service data or only a small amount of it.
Therefore, in the embodiments of the present application, to improve the success rate of capturing useful network traffic files, a port blacklist can be set according to the port information of the data packets that do not match the protocol white list when the ratio is smaller than the preset threshold.
When the ratio is smaller than the preset threshold, there are usually many data packets that do not match the protocol white list, and correspondingly many ports. Adding all of these ports to the port blacklist would greatly reduce the amount of data that subsequent captures can collect. Therefore, in practice, the ports of the data packets that do not match the protocol white list can be classified and counted to obtain how many times each port appears in the network traffic file, and the N ports that appear most often are added to the port blacklist. The value of N can be set according to actual requirements; for example, N may be 3.
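Selecting the N most frequent ports can be sketched with `collections.Counter.most_common`. The sketch assumes each packet is summarized as a (protocol name, port) pair; which port is recorded (for example the server-side port) is not specified in the application and is an assumption here, as is the name `ports_to_blacklist`.

```python
from collections import Counter


def ports_to_blacklist(packets, whitelist, n=3):
    """Return the n most frequent ports among packets outside the white list.

    packets: iterable of (protocol_name, port) pairs.
    Ties in frequency keep the order in which ports were first seen.
    """
    counts = Counter(port for proto, port in packets if proto not in whitelist)
    return [port for port, _count in counts.most_common(n)]


packets = ([("DNS", 53)] * 5 + [("NTP", 123)] * 3 + [("SSDP", 1900)] * 2
           + [("LLMNR", 5355)] + [("HTTP", 80)] * 10)
print(ports_to_blacklist(packets, {"HTTP"}))  # [53, 123, 1900]
```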
In practice, besides deriving the port blacklist from the ports of data packets that do not match the protocol white list, a separate protocol blacklist can also be set, containing the names of protocols that carry no service data.
When the ratio is smaller than the preset threshold, the protocol names in the network traffic file are compared with the protocol blacklist, and the port blacklist is set according to the port information of the data packets that match the protocol blacklist.
By setting the port blacklist, when the network traffic file is captured subsequently, the ports included in the port blacklist can be avoided, that is, the network traffic file is not captured from the ports, so that the quality of the network traffic file is improved, and the captured network traffic file contains service data as much as possible.
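The protocol-blacklist variant, and one way of avoiding blacklisted ports in later captures, can be sketched as follows. The sketch assumes (protocol name, port) pairs as input and, as an illustrative assumption, renders the port blacklist as a pcap-filter expression of the kind accepted by capture tools such as tcpdump (for example `not (port 53 or port 123)`). The function names are hypothetical.

```python
def port_blacklist_from_protocols(packets, protocol_blacklist):
    """Collect the ports of packets whose protocol is in the protocol blacklist."""
    return sorted({port for proto, port in packets
                   if proto in protocol_blacklist and port is not None})


def capture_filter(port_blacklist):
    """Build a pcap-filter expression that excludes the blacklisted ports.

    An empty string means no filter, i.e. capture everything.
    """
    if not port_blacklist:
        return ""
    return "not (" + " or ".join(f"port {p}" for p in port_blacklist) + ")"


packets = [("DNS", 53), ("NTP", 123), ("HTTP", 80), ("DNS", 53), ("ICMP", None)]
blacklist = port_blacklist_from_protocols(packets, {"DNS", "NTP", "ICMP"})
print(blacklist)                  # [53, 123]
print(capture_filter(blacklist))  # not (port 53 or port 123)
```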
In practice, a network traffic file may also contain data that is useless for asset identification, such as internal communication data within an enterprise intranet. The protocols carrying such data may nevertheless be on the protocol whitelist, so these packets inflate the calculated proportion of service data in the network traffic file.
Therefore, in this embodiment of the application, to reduce the influence of such useless data (that is, non-service data) on the proportion, a service data blacklist may be built from the useless data once it has been identified. Correspondingly, the service database may further include a service data blacklist composed of non-service data.
In a specific implementation, feature analysis may be performed on the useless data: the data is converted to hexadecimal, and a regular expression rule is generated from it. The regular expression rules for all of the useless data are then aggregated to form the service data blacklist.
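The hexadecimal-plus-regex idea could be sketched as follows. The text does not specify how rules are generated, so this sketch assumes each rule comes from a literal byte signature observed in the useless data; `build_rule`, `hits_service_blacklist`, and the `HEARTBEAT` signature are hypothetical. Only Python's built-in `bytes.hex()` and the standard `re` module are used.

```python
import re
from typing import Iterable


def build_rule(signature: bytes) -> re.Pattern:
    """Compile a regex over a payload's lowercase hex string that finds
    `signature` at any byte offset. The lazy (?:[0-9a-f]{2})*? prefix
    consumes whole bytes only, so a match never straddles two bytes."""
    # Assumption: a rule is just a literal byte signature in hex form.
    return re.compile(r"(?:[0-9a-f]{2})*?" + signature.hex())


def hits_service_blacklist(payload: bytes, rules: Iterable[re.Pattern]) -> bool:
    """True if the hex-encoded payload matches any blacklist rule."""
    payload_hex = payload.hex()  # two lowercase hex digits per byte
    return any(rule.match(payload_hex) for rule in rules)


# Hypothetical rule set: a signature taken from an intranet heartbeat message.
SERVICE_DATA_BLACKLIST = [build_rule(b"HEARTBEAT")]

print(hits_service_blacklist(b"\x01\x00HEARTBEAT v2", SERVICE_DATA_BLACKLIST))  # True
print(hits_service_blacklist(b"GET /worklist HTTP/1.1", SERVICE_DATA_BLACKLIST))  # False
```

The byte-aligned prefix matters because a plain search over the hex string could match a pattern that starts halfway through one byte and ends halfway through the next.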
After the number of data packets matching the protocol whitelist has been determined from the protocol names in the protocol information, the target data packets that do not match the service data blacklist can be screened out from the data packets that match the protocol whitelist, and the number of these target data packets is used as the final data packet count.
Besides the service data blacklist, the IP address is another factor that affects the service data: in practice, no useful service data can be obtained from some IP addresses. For example, in a medical customer scenario, some IP addresses belong to scanners, and the traffic passing through a scanner may carry service data from many different devices at once, which interferes with asset identification analysis. Therefore, in this embodiment of the application, an IP blacklist may be composed of the IP addresses associated with non-service data. Correspondingly, the service database may further include an IP blacklist composed of such IP addresses.
After the target data packets that do not match the service data blacklist have been screened out from the data packets matching the protocol whitelist, the target data packets that also do not match the IP blacklist can be screened out from them, and the number of target data packets that do not match the IP blacklist is used as the final data packet count.
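The two-stage screening in the preceding paragraphs can be sketched as below. Assumptions: packets have already been parsed into the hypothetical `Packet` record; the service data blacklist check is passed in as any predicate on the payload (for instance, one built from the hexadecimal regular-expression rules described earlier); and a packet is treated as matching the IP blacklist when either its source or destination address is listed, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set


@dataclass(frozen=True)
class Packet:
    """Hypothetical parsed packet."""
    protocol: str
    src_ip: str
    dst_ip: str
    payload: bytes


def final_packet_count(packets: Iterable[Packet],
                       protocol_whitelist: Set[str],
                       hits_service_blacklist: Callable[[bytes], bool],
                       ip_blacklist: Set[str]) -> int:
    """Whitelist match, then drop service-data-blacklist hits, then drop
    packets that touch a blacklisted IP; return how many packets survive."""
    matched = [p for p in packets if p.protocol in protocol_whitelist]
    targets = [p for p in matched if not hits_service_blacklist(p.payload)]
    # Assumption: a packet matches the IP blacklist if either endpoint does.
    targets = [p for p in targets
               if p.src_ip not in ip_blacklist and p.dst_ip not in ip_blacklist]
    return len(targets)


packets = [
    Packet("HTTP", "10.0.0.5", "10.0.0.9", b"GET /worklist"),
    Packet("HTTP", "10.0.0.5", "10.0.0.9", b"HEARTBEAT"),     # non-service data
    Packet("HTTP", "10.0.0.99", "10.0.0.9", b"GET /status"),  # scanner IP
    Packet("NBNS", "10.0.0.7", "10.0.0.255", b""),            # not whitelisted
]
count = final_packet_count(
    packets,
    protocol_whitelist={"HTTP"},
    hits_service_blacklist=lambda payload: payload.startswith(b"HEARTBEAT"),
    ip_blacklist={"10.0.0.99"},
)
print(count, count / len(packets))  # 1 0.25
```

Without the two blacklists, three of the four packets would count as service data; with them, only one does, which is the correction to the proportion that the text describes.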
It should be noted that this embodiment of the application does not limit the order in which data packets are compared with the service data blacklist and the IP blacklist. The description above uses one order as an example: target data packets that do not match the service data blacklist are first screened out from the data packets matching the protocol whitelist, and then target data packets that do not match the IP blacklist are screened out from those target data packets. In practice, the order can be reversed: target data packets that do not match the IP blacklist are first screened out from the data packets matching the protocol whitelist, and then target data packets that do not match the service data blacklist are screened out from them.
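The reason the order does not matter is that each blacklist check is a stateless keep-or-drop decision about a single packet, so the surviving packets are exactly those that pass every check, whichever check runs first. The sketch below (hypothetical names and sample data) confirms this by trying every ordering of the filters; the argument would not hold for filters that keep state across packets.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Packet = Tuple[str, str, bytes]  # hypothetical (protocol, ip, payload)


def screen(packets: Sequence[Packet],
           keep_filters: Sequence[Callable[[Packet], bool]]) -> List[Packet]:
    """Apply keep-predicates one after another."""
    result = list(packets)
    for keep in keep_filters:
        result = [p for p in result if keep(p)]
    return result


def order_independent(packets, keep_filters) -> bool:
    """True if every ordering of the filters leaves the same packets."""
    outcomes = {tuple(screen(packets, order)) for order in permutations(keep_filters)}
    return len(outcomes) == 1


def not_service_blacklisted(p: Packet) -> bool:
    return not p[2].startswith(b"HEARTBEAT")  # hypothetical rule


def not_ip_blacklisted(p: Packet) -> bool:
    return p[1] != "10.0.0.99"  # hypothetical scanner IP


packets = [
    ("HTTP", "10.0.0.5", b"GET /worklist"),
    ("HTTP", "10.0.0.99", b"GET /status"),
    ("HTTP", "10.0.0.5", b"HEARTBEAT"),
]
filters = [not_service_blacklisted, not_ip_blacklisted]
print(order_independent(packets, filters))  # True
print(screen(packets, filters))  # [('HTTP', '10.0.0.5', b'GET /worklist')]
```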
With the service data blacklist and the IP blacklist, the data content of the network traffic file is analyzed in addition to the protocol types it contains. Data packets without service data are discarded so that they do not distort the calculated proportion; this makes the proportion more accurate and allows a more reliable assessment of whether the network traffic file can be used as a valid file.
In this embodiment of the application, when the network traffic file is parsed, the data it contains can be classified and summarized by protocol type. To help operators understand how service data is distributed in the network traffic file, an identification report of the file may be generated after it is determined to be a valid file. The identification report may include the protocol names contained in the network traffic file, the field contents corresponding to each protocol name, and/or the proportion of service data in the network traffic file.
The identification report presents the parsed data visually in list form, so that an operator can see directly how service data is distributed in the network traffic file.
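A plain-text rendering of such a report could look like the following sketch. The report layout, the function name `identification_report`, and the field names (`host`, `user_agent`, `called_ae`) are illustrative assumptions; the input is a per-protocol summary of field contents together with the computed proportion.

```python
from typing import Dict, Iterable


def identification_report(fields_by_protocol: Dict[str, Dict[str, Iterable[str]]],
                          service_ratio: float) -> str:
    """Render protocol names, their field contents, and the service data
    proportion as a plain-text list (layout is illustrative)."""
    lines = [f"Service data proportion: {service_ratio:.1%}"]
    for protocol in sorted(fields_by_protocol):
        lines.append(protocol)
        for field, values in sorted(fields_by_protocol[protocol].items()):
            lines.append(f"  {field}: {', '.join(sorted(values))}")
    return "\n".join(lines)


report = identification_report(
    {"HTTP": {"host": {"pacs.local"}, "user_agent": {"curl/8.0"}},
     "DICOM": {"called_ae": {"PACS01"}}},
    service_ratio=0.8,
)
print(report)
```

This prints the proportion (`80.0%`) on the first line, followed by each protocol name in alphabetical order with its indented field contents.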
Fig. 3 is a schematic structural diagram of an apparatus for identifying a network traffic file according to an embodiment of the present application. The apparatus includes a parsing unit 31, a matching unit 32, and a determining unit 33;
the parsing unit 31 is configured to parse the acquired network traffic file according to a preset protocol parsing method to obtain protocol information of the network traffic file;
the matching unit 32 is configured to match the protocol information against a preset service database to determine the proportion of service data in the network traffic file;
the determining unit 33 is configured to determine that the network traffic file is a valid file when the proportion is greater than or equal to a preset threshold.
Optionally, the parsing unit includes an extraction subunit and a summarizing subunit;
the extraction subunit is configured to extract the session information of each data packet in the network traffic file, where the session information includes a protocol name;
and the summarizing subunit is configured to aggregate the field contents belonging to the same protocol name in the network traffic file according to the field types defined for each protocol type.
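The extraction and summarizing steps can be sketched as follows. This assumes each data packet's session information has already been decoded (by some protocol dissector the text does not name) into a dictionary with a `protocol` key plus protocol-specific fields; `FIELD_TYPES` and `summarize_fields` are hypothetical names, and the chosen field types are only examples.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

# Hypothetical field types to collect for each protocol type.
FIELD_TYPES: Dict[str, Tuple[str, ...]] = {
    "HTTP": ("host", "user_agent"),
    "DNS": ("query",),
}


def summarize_fields(sessions: Iterable[Mapping[str, str]],
                     field_types: Mapping[str, Tuple[str, ...]] = FIELD_TYPES,
                     ) -> Dict[str, Dict[str, List[str]]]:
    """Group the field contents of all sessions by protocol name, keeping
    only the field types configured for that protocol."""
    summary = defaultdict(lambda: defaultdict(set))
    for session in sessions:
        protocol = session["protocol"]  # protocol name from the session info
        fields = summary[protocol]      # registers the protocol even without fields
        for field in field_types.get(protocol, ()):
            if field in session:
                fields[field].add(session[field])
    return {proto: {f: sorted(v) for f, v in fields.items()}
            for proto, fields in summary.items()}


sessions = [
    {"protocol": "HTTP", "host": "pacs.local", "user_agent": "curl/8.0"},
    {"protocol": "HTTP", "host": "pacs.local"},
    {"protocol": "DNS", "query": "pacs.local"},
    {"protocol": "NBNS", "name": "WS01"},
]
print(summarize_fields(sessions))
```

Duplicate field values collapse into one entry per protocol, and a protocol without configured field types (here `NBNS`) still appears with an empty field list, so its name is not lost from the summary.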
Optionally, the service database includes a protocol whitelist composed of the protocols to which service data belongs; the matching unit includes a screening subunit, a summarizing subunit, and a determining subunit;
the screening subunit is configured to screen out, from the network traffic file, the target protocol names that match the protocol whitelist;
the summarizing subunit is configured to aggregate the data packets corresponding to the target protocol names to obtain the number of data packets matching the protocol whitelist;
and the determining subunit is configured to determine the proportion of service data in the network traffic file based on that number of data packets and the total number of data packets in the network traffic file.
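The three subunits map onto a simple count-and-divide computation. In this sketch the function names, the sample protocol names, and the default threshold of 0.5 are assumptions (the text leaves the threshold as a preset value); the input is the protocol name of each packet as produced by the parsing step.

```python
from collections import Counter
from typing import Iterable, Set


def service_data_ratio(packet_protocols: Iterable[str],
                       protocol_whitelist: Set[str]) -> float:
    """packet_protocols holds the protocol name of every packet in the file."""
    per_protocol = Counter(packet_protocols)  # packets per protocol name
    # Screening: protocol names in the file that are on the whitelist.
    targets = [name for name in per_protocol if name in protocol_whitelist]
    # Summarizing: number of packets matching the whitelist.
    matched = sum(per_protocol[name] for name in targets)
    total = sum(per_protocol.values())
    # Determining: proportion of service data (0.0 for an empty file).
    return matched / total if total else 0.0


def is_valid_file(packet_protocols, protocol_whitelist, threshold=0.5) -> bool:
    """Valid when the proportion reaches the preset threshold (0.5 is a placeholder)."""
    return service_data_ratio(packet_protocols, protocol_whitelist) >= threshold


protocols = ["HTTP", "HTTP", "DICOM", "NBNS", "SSDP"]
print(service_data_ratio(protocols, {"HTTP", "DICOM"}))  # 0.6
print(is_valid_file(protocols, {"HTTP", "DICOM"}))       # True
```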
Optionally, the apparatus further includes a setting unit;
the setting unit is configured to set a port blacklist according to the port information of the data packets that do not match the protocol whitelist when the proportion is smaller than the preset threshold.
Optionally, the service database further includes a service data blacklist composed of non-service data; the apparatus further includes a data screening unit;
the data screening unit is configured to screen out, from the data packets matching the protocol whitelist, the target data packets that do not match the service data blacklist;
and the unit is configured to use the number of target data packets as the final data packet count.
Optionally, the service database further includes an IP blacklist composed of the IP addresses associated with non-service data; the apparatus further includes an IP screening unit;
the IP screening unit is configured to screen out, from the target data packets, those that do not match the IP blacklist;
and the unit is configured to use the number of target data packets that do not match the IP blacklist as the final data packet count.
Optionally, the apparatus further includes a generating unit;
the generating unit is configured to generate an identification report of the network traffic file after the network traffic file is determined to be a valid file; the identification report includes the protocol names contained in the network traffic file, the field contents corresponding to each protocol name, and/or the proportion of service data in the network traffic file.
For the features of the embodiment corresponding to Fig. 3, refer to the related description of the embodiment corresponding to Fig. 2; details are not repeated here.
In the above technical solution, the acquired network traffic file is parsed according to a preset protocol parsing method to obtain its protocol information, which reflects the protocol used by each data packet in the file. The distribution of service data is strongly correlated with protocols: service data is usually carried in data packets of certain specific protocols. The application therefore builds a service database based on protocols and matches the protocol information against it to determine the proportion of service data in the network traffic file. A proportion greater than or equal to the preset threshold indicates that the file contains a large amount of service data and can be used for subsequent work such as asset identification, so in that case the network traffic file is determined to be a valid file. By exploiting the strong correlation between service data and protocols, this solution automatically identifies whether a network traffic file contains the required service data without manual analysis, which improves the efficiency of service data identification.
Fig. 4 is a structural diagram of an electronic device according to an embodiment of the present application. As shown in Fig. 4, the electronic device includes: a memory 20 for storing a computer program;
a processor 21, configured to implement the steps of the method for identifying a network traffic file as described in the above embodiments when executing the computer program.
The electronic device provided by the embodiment may include, but is not limited to, a smart phone, a tablet computer, a notebook computer, or a desktop computer.
The processor 21 may include one or more processing cores, for example a 4-core or an 8-core processor. The processor 21 may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 21 may also include a main processor and a coprocessor: the main processor, also called the CPU (Central Processing Unit), processes data in the awake state, while the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit), which renders and draws the content to be shown on the display screen. In some embodiments, the processor 21 may further include an AI (Artificial Intelligence) processor for computing operations related to machine learning.
The memory 20 may include one or more computer-readable storage media, which may be non-transitory. The memory 20 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory 20 is used at least to store the following computer program 201, which, after being loaded and executed by the processor 21, implements the relevant steps of the method for identifying a network traffic file disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 20 may include an operating system 202, data 203, and the like, stored either transiently or persistently. The operating system 202 may include Windows, Unix, Linux, and the like. The data 203 may include, but is not limited to, protocol information, a service database, and the like.
In some embodiments, the electronic device may further include a display 22, an input/output interface 23, a communication interface 24, a power supply 25, and a communication bus 26.
Those skilled in the art will appreciate that the configuration shown in fig. 4 is not intended to be limiting of electronic devices and may include more or fewer components than those shown.
It is to be understood that if the method for identifying a network traffic file in the above embodiments is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The software product is stored in a storage medium and executes all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable ROM, a register, a hard disk, a removable magnetic disk, a CD-ROM, a magnetic disk or an optical disc, and various other media capable of storing program code.
Based on this, the embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for identifying a network traffic file as described above are implemented.
The functions of the functional modules of the computer-readable storage medium according to the embodiment of the present invention may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.
The method, apparatus, electronic device, and computer-readable storage medium for identifying a network traffic file provided in the embodiments of the present application have been described in detail above. The embodiments in this specification are described progressively: each focuses on its differences from the others, and identical or similar parts can be found by referring to one another. Because the apparatus disclosed in the embodiments corresponds to the disclosed method, it is described relatively briefly; for related details, refer to the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The foregoing describes a method, an apparatus, an electronic device, and a computer-readable storage medium for identifying a network traffic file provided in the present application in detail. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present application.

Claims (10)

1. A method for identifying a network traffic file is characterized by comprising the following steps:
analyzing the acquired network traffic file according to a set protocol analysis mode to obtain protocol information of the network traffic file;
matching the protocol information with a set service database to determine the proportion of service data in the network traffic file;
and determining the network traffic file as an effective file under the condition that the ratio is greater than or equal to a preset threshold value.
2. The method for identifying a network traffic file according to claim 1, wherein the analyzing the acquired network traffic file according to a set protocol analysis manner to obtain the protocol information of the network traffic file comprises:
extracting session information of each data packet in the network traffic file, wherein the session information comprises a protocol name;
and summarizing the field contents corresponding to the same protocol name in the network traffic file according to the field types corresponding to each protocol type.
3. The method of claim 2, wherein the service database comprises a protocol whitelist consisting of protocols to which the service data belongs;
the matching the protocol information with a set service database and determining the proportion of the service data in the network traffic file comprises:
screening out, from the network traffic file, a target protocol name matching the protocol whitelist;
summarizing the data packets corresponding to the target protocol name to obtain the number of data packets matching the protocol whitelist;
and determining the proportion of the service data in the network traffic file based on the number of data packets and the total number of data packets in the network traffic file.
4. The method for identifying network traffic files according to claim 3, further comprising:
and under the condition that the ratio is smaller than a preset threshold value, setting a port blacklist according to port information corresponding to the data packet which is not matched with the protocol whitelist.
5. The method of claim 3, wherein the service database further comprises a service data blacklist comprising non-service data;
after determining the number of the data packets matched with the protocol white list according to the respective corresponding protocol names of the protocol information, the method further comprises the following steps:
screening out target data packets which are not matched with the service data blacklist from the data packets which are matched with the protocol white list;
and taking the number of the target data packets as the final number of the data packets.
6. The method of claim 5, wherein the service database further comprises an IP blacklist consisting of IPs of non-service data;
after the step of screening out the target data packets which are not matched with the service data blacklist from the data packets which are matched with the protocol whitelist, the method further comprises the following steps:
screening out target data packets which are not matched with the IP blacklist from the target data packets;
the taking the number of the target data packets as the final number of the data packets comprises:
and taking the number of the target data packets which are not matched with the IP blacklist as the final number of the data packets.
7. The method for identifying a network traffic file according to claim 2, further comprising, after the determining that the network traffic file is a valid file:
generating an identification report of the network traffic file; the identification report comprises protocol names contained in the network traffic file, field contents corresponding to the same protocol names respectively and/or the proportion of service data in the network traffic file.
8. A device for identifying a network traffic file, characterized by comprising an analysis unit, a matching unit and a determining unit;
the analysis unit is used for analyzing the acquired network traffic file according to a set protocol analysis mode to obtain protocol information of the network traffic file;
the matching unit is used for matching the protocol information with a set service database and determining the proportion of service data in the network traffic file;
the determining unit is configured to determine that the network traffic file is a valid file when the ratio is greater than or equal to a preset threshold.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing said computer program to carry out the steps of the method of identifying a network traffic file according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the method for identifying a network traffic file according to any one of claims 1 to 7.
CN202111632893.8A 2021-12-28 2021-12-28 Network traffic file identification method and device, electronic equipment and medium Pending CN114338436A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111632893.8A CN114338436A (en) 2021-12-28 2021-12-28 Network traffic file identification method and device, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN114338436A 2022-04-12

Family

ID=81014698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111632893.8A Pending CN114338436A (en) 2021-12-28 2021-12-28 Network traffic file identification method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN114338436A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination