CN114205161B

CN114205161B - Network attacker discovery and tracking method

Info

Publication number: CN114205161B
Application number: CN202111522637.3A
Authority: CN
Inventors: 全威; 谢景扬
Original assignee: Beijing Yingan Electronic Technology Co ltd
Current assignee: Beijing Yingan Electronic Technology Co ltd
Priority date: 2021-12-13
Filing date: 2021-12-13
Publication date: 2024-03-29
Anticipated expiration: 2041-12-13
Also published as: CN114205161A

Abstract

The invention belongs to the field of computers and provides a network attacker discovery and tracking method. A WEB fingerprint matching algorithm matches newly collected fingerprint data against historical attacker fingerprint data one by one to obtain fingerprint matching vectors; an attacker judging algorithm combines each fingerprint matching vector with a fingerprint data weight vector to obtain a matching score; the matching score is then compared with an attacker threshold: if the score is larger than the threshold, the attacker is judged to be an old (previously seen) attacker, and if it is smaller than the threshold, the attacker is judged to be a new attacker. The method introduces a new perspective on network security defense: network attackers can be discovered and tracked, passive defense becomes active defense, and an information system centered on attackers can be built on the basis of that discovery and tracking, which improves the efficiency of network security defense.

Description

Network attacker discovery and tracking method

Technical Field

The invention belongs to the field of computers, and particularly relates to a network attacker discovery and tracking method.

Background

The industry's traditional approach to defending against network attacks relies mainly on continuously improving the accuracy of threat event detection. When a threat event is found, a security response process (including but not limited to alerting, intercepting, logging, and countering) is triggered. With this approach, however, every attack is defended against passively, one event at a time. The accumulated threat information mainly describes the threat event (such as threat carrier, attack path, and attack surface), while for the threat subject (i.e. information about the attacker) the only real basis is the IP address. Relying on IP address information alone, the defender cannot effectively build up knowledge of an attacker, and the attacker is therefore difficult to track.

The essence of network attack and defense is a contest between people; purely passive defense can never effectively contain the source of a threat. To defend against network attacks actively, or even preemptively, the perspective must change: the defender must not only discover the attack event but also effectively discover and mark the threat subject, that is, the attacker. The attacker can then be tracked continuously by that mark and quickly recognized on reappearing, so that defense is completed before the attack achieves its aim.

Marking, identifying, and tracking WEB browsers by WEB fingerprints was first applied to targeted online advertising on e-commerce websites. The invention applies WEB fingerprints to build a model for discovering and re-identifying network attackers, and thereby achieves the discovery and tracking of network attackers.

Disclosure of Invention

The embodiment of the invention provides a network attacker discovery and tracking method that aims to solve the technical problems described above.

The embodiment of the invention is realized in such a way that the method for discovering and tracking the network attacker comprises the following steps:

acquiring original WEB fingerprint data of an attacker through a honeypot, extracting the data, and preprocessing the extracted fingerprint data;

matching the fingerprint data against the historical attacker fingerprint data one by one using a WEB fingerprint matching algorithm to obtain a fingerprint matching vector;

computing a matching score from the fingerprint matching vector and a fingerprint data weight vector using an attacker judging algorithm;

judging whether the matching score is larger than an attacker threshold: if it is larger, the attacker is judged to be an old (previously seen) attacker; if it is smaller than the threshold, the attacker is judged to be a new attacker;

regardless of the threshold determination, the fingerprint vector is saved into the historical attacker fingerprint dataset.

Further, in the step of acquiring the attacker's original WEB fingerprint data through the honeypot, extracting the data, and preprocessing the extracted fingerprint data, the fingerprint extraction and data preprocessing comprise: checking the normalization and integrity of the fingerprint data, extracting valid fingerprint data according to a preset data format, deleting useless data, and converting the data into a computer-readable form; the categories of fingerprint data include network fingerprints, software fingerprints, and hardware fingerprints.

Furthermore, the newly collected attacker fingerprint data is matched against all historical attacker fingerprint data one by one using the WEB fingerprint matching algorithm;

during execution of the WEB fingerprint matching algorithm, the fingerprint data items in a single fingerprint data set are classified by category into key fingerprints and non-key fingerprints.

Further, matching the fingerprint data against the historical attacker fingerprint data one by one specifically includes: key fingerprints are matched by identity matching, and the match succeeds only if the key fingerprints of the two fingerprint sets are identical; otherwise it fails; non-key fingerprints are matched by a fingerprint similarity matching algorithm, and the match succeeds if the string distance between the non-key fingerprints of the two fingerprint sets is smaller than a threshold; otherwise it fails; the similarity algorithm is specifically the SimHash string distance algorithm or the Levenshtein string distance algorithm.
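The two matching rules can be illustrated with a minimal Python sketch. It assumes each fingerprint attribute is available as a string and uses the Levenshtein variant of the string distance; the function names and the threshold value are hypothetical and are not taken from the source.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def match_key(new_value: str, old_value: str) -> int:
    """Key fingerprint: identity matching, 1 only if the values are identical."""
    return 1 if new_value == old_value else 0


def match_non_key(new_value: str, old_value: str, threshold: int) -> int:
    """Non-key fingerprint: 1 if the string distance is below the threshold."""
    return 1 if levenshtein(new_value, old_value) < threshold else 0
```

For example, `match_non_key("Chrome/94", "Chrome/95", threshold=3)` counts as a match because only one character differs, whereas `match_key` would reject the same pair.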

Further, in the fingerprint similarity matching algorithm for non-key fingerprints, the similarity threshold of each non-key fingerprint is assigned by a machine learning method: based on data collected in a large number of network attack and defense experiments, a model of the distance values of a specific non-key fingerprint across all fingerprint sets of the same attacker is constructed, and the threshold is obtained by tuning after training.
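The source does not specify the model used to derive the similarity threshold. As one possible reading, the sketch below collects the pairwise distances of a single non-key attribute within each attacker's fingerprint sets and picks a threshold that covers most of them. The quantile rule, the function names, and the `quantile` parameter are all assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List


def same_attacker_distances(values_by_attacker: Dict[str, List[str]],
                            distance: Callable[[str, str], int]) -> List[int]:
    """Distances between every pair of values observed for the same attacker."""
    distances = []
    for values in values_by_attacker.values():
        for a, b in combinations(values, 2):
            distances.append(distance(a, b))
    return distances


def threshold_from_distances(distances: List[int], quantile: float = 0.95) -> int:
    """Integer threshold t such that roughly `quantile` of the observed
    same-attacker distances satisfy distance < t (assumed tuning rule)."""
    if not distances:
        raise ValueError("at least one same-attacker distance is required")
    ordered = sorted(distances)
    index = max(0, min(len(ordered) - 1, math.ceil(quantile * len(ordered)) - 1))
    return ordered[index] + 1
```

Any string distance can be passed as `distance`, for instance a Levenshtein or SimHash-based function; the labelled data would come from experiments in which the true attacker behind each fingerprint set is known.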

Further, each element of the fingerprint matching vector represents the matching result of one fingerprint data item and takes the value 0 or 1; 0 represents an unsuccessful match and 1 represents a successful match.

Further, each record in the historical attacker fingerprint data set represents attacker fingerprint data that has already been collected; by matching these records one by one against the newly collected attacker fingerprint data, it can be judged whether the attacker has appeared before or is appearing for the first time, and a successful match provides the ability to track the attacker.

Further, in the attacker judging algorithm, the fingerprint data weights are assigned by a machine learning method: from the relationship between fingerprint data and attackers collected in a large number of network attack and defense experiments, a model of the contribution of each fingerprint data item to the correspondence between fingerprint and attacker is constructed, and the tuned weight values are obtained through training on the data.
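The source leaves the contribution model unspecified. The following sketch shows one simple stand-in, not the patented training procedure: an attribute receives more weight the more often it matches for same-attacker pairs than for different-attacker pairs, and the weights are normalized to sum to 1. The function name and the scoring rule are assumptions.

```python
from typing import List, Sequence, Tuple


def estimate_weights(samples: Sequence[Tuple[Sequence[int], bool]]) -> List[float]:
    """samples: (matching_vector, same_attacker) pairs from labelled experiments.

    Assumed rule: weight_i is proportional to
    P(attribute i matches | same attacker) - P(attribute i matches | different attacker),
    clipped at zero and normalized so that the weights sum to 1.
    """
    same = [vector for vector, label in samples if label]
    diff = [vector for vector, label in samples if not label]
    if not same or not diff:
        raise ValueError("both same-attacker and different-attacker samples are required")
    size = len(samples[0][0])
    raw = []
    for i in range(size):
        rate_same = sum(vector[i] for vector in same) / len(same)
        rate_diff = sum(vector[i] for vector in diff) / len(diff)
        raw.append(max(0.0, rate_same - rate_diff))
    total = sum(raw)
    if total == 0:
        return [1.0 / size] * size  # no attribute is informative: fall back to uniform
    return [value / total for value in raw]
```

Normalizing the weights keeps the resulting matching score between 0 and 1, which agrees with the score range stated for the attacker judging algorithm.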

Further, the computation on the fingerprint matching vector and the fingerprint data weight vector comprises an element-wise vector multiplication followed by a summation of the vector elements, which yields the matching score.
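As a small illustration, the score computation can be written in a few lines of Python. The weight values in the usage note are made up, and the sketch assumes the weights are non-negative and sum to 1 so that the score stays in the 0 to 1 range.

```python
from typing import Sequence


def matching_score(match_vector: Sequence[int], weights: Sequence[float]) -> float:
    """Element-wise product of the 0/1 matching vector and the weight vector,
    followed by a sum over the elements (equivalent to a dot product)."""
    if len(match_vector) != len(weights):
        raise ValueError("matching vector and weight vector must have the same length")
    products = [m * w for m, w in zip(match_vector, weights)]  # element-wise multiplication
    return sum(products)                                       # element summation
```

With the hypothetical weights `[0.4, 0.3, 0.2, 0.1]` and the matching vector `[1, 0, 1, 1]`, the score is approximately 0.7: every attribute except the second one contributed its weight.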

Further, the attacker threshold is obtained by a machine learning method: from data on the actual attribution of attacker fingerprints collected in a large number of network attack and defense experiments, a model relating the score produced by the attacker judging algorithm to the corresponding attacker value range is constructed, and a relatively objective threshold is obtained after training on the data.
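How the threshold model is trained is not detailed in the source. A minimal sketch, assuming labelled score samples are available, is to scan candidate thresholds and keep the one that classifies the most samples correctly under the rule "score larger than threshold means same attacker". The function name and the sample values in the usage note are hypothetical.

```python
from typing import Sequence, Tuple


def best_threshold(scored: Sequence[Tuple[float, bool]]) -> float:
    """scored: (matching_score, same_attacker) pairs from labelled experiments.

    Returns the candidate threshold with the highest accuracy; candidates are
    0, 1 and the midpoints between adjacent distinct scores (assumed search grid).
    """
    if not scored:
        raise ValueError("at least one labelled score is required")
    scores = sorted({score for score, _ in scored})
    candidates = [0.0] + [(a + b) / 2 for a, b in zip(scores, scores[1:])] + [1.0]

    def accuracy(threshold: float) -> float:
        correct = sum((score > threshold) == label for score, label in scored)
        return correct / len(scored)

    return max(candidates, key=accuracy)
```

For the made-up samples `[(0.95, True), (0.90, True), (0.10, False), (0.55, False)]`, the selected threshold falls between 0.55 and 0.90, which separates the two groups without error.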

The beneficial effects of the invention are as follows: a new perspective on network security defense is introduced, so that network attackers can be discovered and tracked and passive defense becomes active defense, and an information system centered on attackers can be built on the basis of that discovery and tracking, which improves the efficiency of network security defense; an automated method for discovering and tracking attackers is established, which saves the human resources of network security defenders, greatly improves the efficiency of security operations and maintenance, and reduces the labor and time that defenders spend on attribution analysis of attackers.

Drawings

FIG. 1 is a flow chart of a method for discovering and tracking network attackers;

FIG. 2 is a flow chart of fingerprint data preprocessing in a network attacker's discovery and tracking method;

FIG. 3 is a flowchart of a WEB fingerprint matching algorithm in a network attacker's discovery and tracking method;

FIG. 4 is a flowchart of the attacker judging algorithm in a network attacker discovery and tracking method;

FIG. 5 is a statistical diagram of experimental results of a network attacker discovery and tracking method.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Fig. 1 shows a technical scheme flowchart of a network attacker discovery and tracking method according to an embodiment of the invention, where the method includes:

Step 100: acquire an attacker fingerprint data set. A WEB honeypot is deployed to lure the attacker into accessing a fake service system; during the access, the attacker's fingerprint data is collected by a JavaScript script.

Step 200: extract the valid data, including network fingerprints, software fingerprints, and hardware fingerprints, from the raw data of the attacker fingerprint data set, and then preprocess it, including data normalization, filling in blank data, data de-duplication, and conversion into a computer-readable format.

Step 300: the WEB fingerprint matching algorithm divides the preprocessed attacker fingerprint attributes into key fingerprints and non-key fingerprints, applies identity matching to the former and similarity-model matching to the latter, and matches them one by one against the historical attacker fingerprint data set; each match yields a fingerprint matching vector.

Step 400: the attacker judging algorithm multiplies the fingerprint matching vector element-wise by the fingerprint attribute weight vector and sums the resulting elements to obtain a matching score, then compares the score with the attacker judging threshold; if the score exceeds the threshold, the attacker is judged to be an old attacker; otherwise, the attacker is judged to be a new attacker.

Step 500: the historical attacker fingerprint data set stores all attacker fingerprint data collected in the past. It is matched one by one against each newly collected attacker fingerprint data set, and it also serves as training data for the models built by the machine learning methods in this method, which improves the effectiveness of the method.

Fig. 2 shows a flowchart of fingerprint data preprocessing in a network attacker discovery and tracking method according to an embodiment of the invention, where the fingerprint data preprocessing specifically includes the following steps:

Step 210: extract the valid fingerprints, including network fingerprints, software fingerprints, and hardware fingerprints, from the raw data of the attacker fingerprint data set. Because the raw fingerprint data contains a large amount of invalid information, the fields that can provide fingerprint uniqueness need to be extracted from it; in addition, some fingerprint attributes only indicate whether a certain software function is enabled, so their value should be 0 or 1 and is reassigned by a logical test.

Step 220: data preprocessing reshapes the attacker's fingerprint data for the convenience of the subsequent algorithms: for example, some large values need to be rescaled to a range that is convenient to compute with, fingerprint attributes that were not collected need to be filled with blank data, redundant data needs to be de-duplicated, and finally the data needs to be converted into a format supported by the subsequent algorithms.
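Steps 210 and 220 can be sketched roughly as follows. The source does not list concrete fingerprint fields, so the attribute names, the rescaling bound, and the blank placeholder below are hypothetical choices made only to show the shape of the processing.

```python
from typing import Any, Dict, List

# Hypothetical attribute names; the source does not enumerate the fields.
EXPECTED_FIELDS = ("ip", "user_agent", "canvas_hash", "screen_width", "cookies_enabled")
MAX_SCREEN_WIDTH = 10000  # assumed upper bound, used only for rescaling


def preprocess(raw: Dict[str, Any]) -> Dict[str, str]:
    """Keep only the expected fields and bring each one into a uniform string form."""
    record = {}
    for name in EXPECTED_FIELDS:
        value = raw.get(name)
        if value is None or value == "":
            record[name] = ""  # attribute not collected: fill with blank data
        elif name == "cookies_enabled":
            # on/off attribute: reassign to "1" or "0" by a logical test
            record[name] = "1" if value in (True, "true", "1", 1) else "0"
        elif name == "screen_width":
            # rescale a large numeric value into the range 0..1
            scaled = min(float(value), MAX_SCREEN_WIDTH) / MAX_SCREEN_WIDTH
            record[name] = f"{scaled:.4f}"
        else:
            record[name] = str(value).strip().lower()  # simple normalization
    return record


def deduplicate(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop records whose expected fields are all identical to an earlier record."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[name] for name in EXPECTED_FIELDS)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Fields that are not in `EXPECTED_FIELDS` are discarded, which corresponds to deleting the invalid information mentioned in step 210.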

Fig. 3 shows a flowchart of a WEB fingerprint matching algorithm in a method for discovering and tracking network attackers according to an embodiment of the present invention, where the WEB fingerprint matching algorithm specifically includes:

Step 310: key fingerprints are matched by identity matching. If a given key fingerprint has the same value in the two fingerprint data sets, the match is judged successful and the element of the fingerprint matching vector corresponding to that fingerprint attribute is set to 1; otherwise, the match is judged failed and the element is set to 0.

Step 320: non-key fingerprints are matched by a fingerprint similarity matching model, which compares the string distance (i.e. string similarity) of a given non-key fingerprint in the two fingerprint data sets; the string distance algorithms used include SimHash, Levenshtein, and others. Whether a specific non-key fingerprint matches is decided by comparing the distance with the similarity threshold: if the distance is smaller than the similarity threshold, the match is judged successful and the element of the fingerprint matching vector corresponding to that fingerprint attribute is set to 1; otherwise, the match is judged failed and the element is set to 0.

Step 330: the fingerprint similarity threshold is assigned by machine learning training: based on data collected in a large number of network attack and defense experiments, a model of the distance values of a specific non-key fingerprint across all fingerprint sets of the same attacker is constructed, and the threshold is obtained by tuning after training.

Step 340: the fingerprint matching vector is obtained by setting the element for each fingerprint attribute to 1 or 0 according to that attribute's matching result (1 for a successful match, 0 for an unsuccessful match).

Step 510: each record in the historical attacker fingerprint data set represents attacker fingerprint data that has already been collected; by matching these records one by one against the newly collected attacker fingerprint data, it can be judged whether the current visitor is an attacker who has appeared before or a newly appearing attacker, and a successful match provides the ability to track the attacker.
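Steps 310 to 340 can be illustrated with the SimHash variant of the string distance. The sketch below is one common way to build a SimHash (character shingles hashed with SHA-256, Hamming distance between the resulting bit strings); the shingle size, the hash function, the field names, and the thresholds are assumptions and are not specified by the source.

```python
import hashlib
from typing import Dict, List, Sequence


def simhash(text: str, bits: int = 64, shingle: int = 3) -> int:
    """SimHash over character shingles; `bits` must be a multiple of 8, at most 256."""
    tokens = [text[i:i + shingle] for i in range(max(1, len(text) - shingle + 1))]
    counts = [0] * bits
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        for bit in range(bits):
            counts[bit] += 1 if (value >> bit) & 1 else -1
    # A bit of the fingerprint is set when most shingle hashes have it set.
    return sum(1 << bit for bit in range(bits) if counts[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits, used as the distance between two SimHash values."""
    return bin(a ^ b).count("1")


def matching_vector(new: Dict[str, str], old: Dict[str, str],
                    key_fields: Sequence[str],
                    non_key_thresholds: Dict[str, int]) -> List[int]:
    """Build the 0/1 fingerprint matching vector for two fingerprint records."""
    vector = []
    for name in key_fields:  # step 310: identity matching
        vector.append(1 if new[name] == old[name] else 0)
    for name, threshold in non_key_thresholds.items():  # step 320: similarity matching
        distance = hamming(simhash(new[name]), simhash(old[name]))
        vector.append(1 if distance < threshold else 0)
    return vector
```

The order of elements in the returned vector follows `key_fields` first and then the insertion order of `non_key_thresholds`, so the weight vector used later must follow the same order.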

Fig. 4 shows a flowchart of the attacker judging algorithm in a network attacker discovery and tracking method according to an embodiment of the invention, wherein the attacker judging algorithm specifically comprises:

Step 410: fingerprint attribute weight assignment. The weights form a vector in which each element represents the weight of one fingerprint attribute in deciding whether the two fingerprint sets belong to the same attacker.

Step 420: the fingerprint attribute weights are tuned by machine learning training: from the relationship between fingerprint data and attackers collected in a large number of network attack and defense experiments, a model of the contribution of each fingerprint attribute to the correspondence between fingerprint and attacker is constructed, and the tuned weight values are obtained through training on the data.

Step 430: matching score calculation. The fingerprint matching vector is multiplied element-wise by the fingerprint attribute weight vector, and the elements of the resulting vector are summed. The score lies between 0 and 1; the closer it is to 1, the higher the confidence that the fingerprints match.

Step 440: the attacker judging threshold is assigned by machine learning training: from data on the actual attribution of attacker fingerprints collected in a large number of network attack and defense experiments, a model relating the score produced by the attacker judging algorithm to the corresponding attacker value range is constructed, and a relatively objective threshold is obtained after training on the data.

Step 450: decide whether the matching score is larger than the attacker judging threshold. The matching score is compared with the threshold obtained in step 440; if the score is larger than the threshold, the attacker is judged to be an old attacker; if it is smaller than the threshold, the attacker is judged to be a new attacker.

Step 510: regardless of the matching outcome in step 450, the newly collected attacker fingerprint data set is added to the historical attacker fingerprint data set once matching is complete.
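Steps 430, 450, and 510 can be combined into a small tracking loop. In the sketch below, the class name, the pluggable `match` callback, and the choice to report the best-scoring historical record are assumptions; the source only states that the new fingerprint is matched against the history one by one and is stored afterwards in every case.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class AttackerTracker:
    weights: Sequence[float]                       # fingerprint attribute weights
    threshold: float                               # attacker judging threshold
    match: Callable[[dict, dict], Sequence[int]]   # returns a 0/1 matching vector
    history: List[dict] = field(default_factory=list)

    def observe(self, fingerprint: dict) -> Tuple[bool, Optional[int], float]:
        """Return (is_old_attacker, index of the matched record, best score)."""
        best_index, best_score = None, 0.0
        for index, old in enumerate(self.history):
            vector = self.match(fingerprint, old)
            score = sum(m * w for m, w in zip(vector, self.weights))  # step 430
            if best_index is None or score > best_score:
                best_index, best_score = index, score
        is_old = best_index is not None and best_score > self.threshold  # step 450
        self.history.append(fingerprint)  # step 510: stored regardless of the outcome
        return is_old, (best_index if is_old else None), best_score
```

The first fingerprint ever observed is always reported as a new attacker, because the history is empty at that point; later observations are compared against everything stored so far, including fingerprints that were themselves judged to be new.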

Test examples

From September to October 2021, under the name of a penetration test, several WEB penetration testing vendors were invited to provide penetration testers to attack the honeypot environment provided by the inventors. Before the experiment started, each tester was assigned an account, so that the collected fingerprint data carried account labels; the fingerprint data collected over the month could then be processed by the method and compared with the account labels to assess how effectively the method discovers and tracks attackers.

The test was divided into three cases: 1. fingerprint data of the same attacker from different periods (within one month) matched against that attacker's own fingerprints; 2. fingerprints of different attackers matched against each other; 3. the fingerprint of one attacker matched against both that attacker's own fingerprints and the fingerprints of other attackers.

The discovery and tracking method was applied to the experimental data of the three cases; the resulting statistics are shown in FIG. 5, where the names are coded to protect the identity of the experimenters.

Experiment 1 corresponds to experimental result 1. The result shows that after the fingerprint matching algorithm and the attacker judging algorithm were applied, the matching score was as high as 0.99, and the ground truth confirmed that the fingerprints belonged to the same attacker.

Experiment 2 corresponds to experimental result 2. The result shows that when fingerprints of different attackers were matched against each other, the highest matching score was only 0.079, so the probability of a false match with this method is extremely low.

Experiment 3 corresponds to experimental result 3. The mixed experiment shows that in the first row the matching score reaches 0.921, while the score for matching against other attackers' fingerprints is no higher than 0.8, and in the second row the score for matching against other attackers' fingerprints is only 0.561, so the method remains useful for guiding decisions.

It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in the various embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware, where the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.

The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims

1. A method of network attacker discovery and tracking, the method comprising:

regardless of the threshold decision, the fingerprint vector is saved into the historical attacker fingerprint dataset;

the WEB fingerprint matching algorithm specifically comprises the following steps:

step 310, key fingerprints are matched by identity matching: if a given key fingerprint has the same value in the two fingerprint data sets, the match is judged successful and the element of the fingerprint matching vector corresponding to that fingerprint attribute is set to 1; otherwise, the match is judged failed and the element is set to 0;

step 320, non-key fingerprints are matched by a fingerprint similarity matching model, which compares the string distance of a given non-key fingerprint in the two fingerprint data sets, the string distance algorithms used including the SimHash and Levenshtein algorithms; whether a specific non-key fingerprint matches is decided by comparing the distance with the similarity threshold; if the distance is smaller than the similarity threshold, the match is judged successful and the element of the fingerprint matching vector corresponding to that fingerprint attribute is set to 1; otherwise, the match is judged failed and the element is set to 0;

step 330, the fingerprint similarity threshold is assigned by machine learning training: based on data collected in a large number of network attack and defense experiments, a model of the distance values of a specific non-key fingerprint across all fingerprint sets of the same attacker is constructed, and the threshold is obtained by tuning after training;

step 340, the fingerprint matching vector is obtained by setting the element for each fingerprint attribute to 1 or 0 according to that attribute's matching result, wherein 1 indicates a successful match and 0 indicates an unsuccessful match;

the attacker judging algorithm specifically comprises:

step 410: fingerprint attribute weight assignment, wherein the weights form a vector in which each element represents the weight of one fingerprint attribute in deciding whether the two fingerprint sets belong to the same attacker;

step 420: the fingerprint attribute weights are tuned by machine learning training: from the relationship between fingerprint data and attackers collected in a large number of network attack and defense experiments, a model of the contribution of each fingerprint attribute to the correspondence between fingerprint and attacker is constructed, and the tuned weight values are obtained through training on the data;

step 430: matching score calculation, wherein the fingerprint matching vector is multiplied element-wise by the fingerprint attribute weight vector and the elements of the resulting vector are summed; the score lies between 0 and 1, and the closer it is to 1, the higher the confidence that the fingerprints match;

step 440: the attacker judging threshold is assigned by machine learning training: from data on the actual attribution of attacker fingerprints collected in a large number of network attack and defense experiments, a model relating the score produced by the attacker judging algorithm to the corresponding attacker value range is constructed, and a relatively objective threshold is obtained after training on the data.

2. The network attacker discovery and tracking method of claim 1, wherein, in the step of acquiring the attacker's original WEB fingerprint data through the honeypot, extracting the data, and preprocessing the extracted fingerprint data, the fingerprint extraction and data preprocessing comprise: checking the normalization and integrity of the fingerprint data, extracting valid fingerprint data according to a preset data format, deleting useless data, and converting the data into a computer-readable form; the categories of fingerprint data include network fingerprints, software fingerprints, and hardware fingerprints.

3. The network attacker discovery and tracking method of claim 1, wherein the newly collected attacker fingerprint data is matched against all historical attacker fingerprint data one by one using the WEB fingerprint matching algorithm.

4. The network attacker discovery and tracking method of claim 3, wherein matching the fingerprint data against the historical attacker fingerprint data one by one specifically comprises: key fingerprints are matched by identity matching, and the match succeeds only if the key fingerprints of the two fingerprint sets are identical; otherwise it fails; non-key fingerprints are matched by a fingerprint similarity matching algorithm, and the match succeeds if the string distance between the non-key fingerprints of the two fingerprint sets is smaller than a threshold; otherwise it fails; the similarity algorithm is specifically the SimHash string distance algorithm or the Levenshtein string distance algorithm.

5. The network attacker discovery and tracking method of claim 3, wherein, in the fingerprint similarity matching algorithm for non-key fingerprints, the similarity threshold of each non-key fingerprint is assigned by a machine learning method: based on data collected in a large number of network attack and defense experiments, a model of the distance values of a specific non-key fingerprint across all fingerprint sets of the same attacker is constructed, and the threshold is obtained by tuning after training.

6. The network attacker discovery and tracking method of claim 1, wherein each element of the fingerprint matching vector represents the matching result of one fingerprint data item and takes the value 0 or 1; 0 represents an unsuccessful match and 1 represents a successful match.

7. The network attacker discovery and tracking method of claim 1, wherein each record in the historical attacker fingerprint data set represents attacker fingerprint data that has already been collected; by matching these records one by one against the newly collected attacker fingerprint data, it can be judged whether the attacker has appeared before or is newly appearing, and a successful match provides the ability to track the attacker.

8. The network attacker discovery and tracking method of claim 1, wherein, in the attacker judging algorithm, the fingerprint data weights are assigned by a machine learning method: from the relationship between fingerprint data and attackers collected in a large number of network attack and defense experiments, a model of the contribution of each fingerprint data item to the correspondence between fingerprint and attacker is constructed, and the tuned weight values are obtained through training on the data.

9. The network attacker discovery and tracking method of claim 1, wherein the computation on the fingerprint matching vector and the fingerprint data weight vector comprises an element-wise vector multiplication followed by a summation of the vector elements, which yields the matching score.

10. The network attacker discovery and tracking method of any one of claims 1-9, wherein the attacker threshold is obtained by a machine learning method: from data on the actual attribution of attacker fingerprints collected in a large number of network attack and defense experiments, a model relating the score produced by the attacker judging algorithm to the corresponding attacker value range is constructed, and a relatively objective threshold is obtained after training on the data.