Disclosure of Invention
The invention provides a network threat intelligence sharing platform based on blockchain smart contracts. Relying on the Ethereum underlying architecture, the platform treats network threat intelligence data as a data asset, processes the data through a trusted threat intelligence processing module, uses the distributed data storage of the blockchain system, and, through smart contract design and development, enables data providers and data buyers to share network threat intelligence on an autonomous peer-to-peer data platform.
Optionally, the trusted threat intelligence processing module is configured to screen trusted threat intelligence and provide a credibility judgment interface for the user.
Optionally, the judgment method of the trusted threat intelligence processing module includes the following steps:
S1: the user is skeptical of threat intelligence shared by an open source and needs to know the quality of that intelligence;
S2: data from a plurality of threat intelligence sharing platforms and suppliers is aggregated, threat intelligence is acquired from multiple sources in real time, and the acquired threat intelligence is layered and converged according to its type;
S3: the threat intelligence collected in layers is preprocessed and stored as local threat intelligence in the form of triples;
S4: enrichment information is collected from multiple sources, and feature indexes are extracted along the time, content, and domain knowledge dimensions, based on the factors that influence threat intelligence credibility;
S5: the multi-dimensional credibility features are taken as the description of the original threat intelligence, the threat intelligence is pre-judged, and credible threat intelligence is preliminarily screened out;
S6: a DBN-based threat intelligence credibility judgment method further identifies whether the threat intelligence is credible, and the result is fed back to the user through an interactive interface.
Optionally, the step S6 includes:
S61: DNS traffic acquisition and preliminary processing: by configuring port mirroring on the switches, the network traffic of every switch is copied out-of-band directly to the local network card; a capture program developed with the libpcap library captures all DNS traffic, performs fast preliminary processing, and extracts the required effective information;
S62: threat intelligence-based detection: based on step S5, the preliminarily screened effective information of the DNS messages is rapidly checked using the distributed framework Spark, and a Risk value is calculated to judge whether an obviously malicious message exists;
S63: detection based on domain name features: the DNS messages that have passed the threat intelligence detection are further checked using Python or Spark;
S64: threat intelligence updating: based on the STIX standard and the corresponding library, the intelligence in the threat intelligence base is continuously updated, further improving the detection rate and accuracy of the detection system.
Further, the data sharing platform provides a platform interface for threat intelligence sharing for intelligence providers and intelligence acquirers.
The invention innovatively provides a network threat intelligence sharing mechanism based on blockchain smart contracts. The network threat intelligence data is treated as a data asset and, based on the Ethereum underlying architecture and through smart contract development, the network threat intelligence data is shared securely in a unified manner and the credibility of threat intelligence in cyberspace is evaluated. This addresses the low quality of multi-source data integration in threat intelligence sharing, informs the user of the data quality, increases the satisfaction of sharing platform users, and reduces the false alarm rate of threat intelligence security applications.
The invention addresses the problems that the quality of threat intelligence in cyberspace is unclear and that credibility evaluation indexes are difficult to select and quantify. A multi-dimensional analysis model of threat intelligence credibility is provided. The model comprehensively considers the credibility attributes of timeliness, usability, universality, authenticity, verifiability, and comprehensiveness, and gives a formal definition of multi-source heterogeneous threat intelligence. Feature indexes are extracted from the three dimensions of time, content verification, and domain knowledge to analyze and describe the threat intelligence, and a deep belief network is introduced to learn further features that influence credibility evaluation. Potential credibility evaluation factors are learned from a large number of threat intelligence samples through the deep belief network, and the dimensionality of high-dimensional complex features is gradually reduced through layer-by-layer abstraction, so that the information contained in the threat intelligence is better represented.
Aiming at the APT attack chain model, the invention selects DNS (Domain Name System) traffic as the raw data for overall APT detection. Based on a detailed understanding and in-depth study of the characteristics of DNS messages and of the attack techniques used in different APT cases, including DGA, C&C, Fast Flux, and the like, a threat intelligence system is introduced, and the Risk value of a domain name is calculated based on that system, so that obviously malicious domain names can be rapidly detected and alerted on.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention, are within the scope of the invention.
The invention provides a network threat intelligence sharing platform based on blockchain smart contracts and built on the Ethereum underlying architecture. As shown in figure 1, the platform treats network threat intelligence data as a data asset, processes the data through a trusted threat intelligence processing module, uses the distributed data storage of the blockchain system, and, through smart contract design and development, enables data providers and data buyers to share network threat intelligence on an autonomous peer-to-peer data platform.
1. Trusted threat intelligence screening based on multi-source integration
Analysis of the threat intelligence on multiple mainstream sharing platforms shows that untrustworthy threat intelligence arises in regular patterns. The factors that make intelligence untrustworthy generally include the following:
(1) Recording errors: the recorder of the threat intelligence introduces garbled characters or other errors, including manual errors, so that the information is distorted during data entry or data transmission;
(2) Unprofessional or erroneous analysis: in a given analysis, subjective factors on the part of the analyst lead to a wrong conclusion;
(3) Incomplete information: the security intelligence analysis gives only a bare conclusion without evidence, so that users find it lacking in credibility and it frequently leads to reporting errors;
(4) Outdated information: threat intelligence is highly time-sensitive; many attacks currently last only weeks or days, and once the intelligence is outdated it no longer has value;
(5) Errors compounded in relay: the information is distorted as it is passed on by those who relay or transcribe it;
(6) Malicious counterfeiting: organizations or businesses maliciously fabricate false intelligence for their own benefit.
In summary, the causes of untrustworthy threat intelligence can be broadly divided into objective causes and subjective causes. Intelligence that is untrustworthy for subjective reasons may be mixed with real information or real data to deceive users. When a user tries to assess the quality of such intelligence objectively, discrimination is difficult: a professional security analyst with security analysis tools is often needed to pinpoint the problem, and the process is hard to quantify. Therefore, the model established and designed here mainly targets threat intelligence that is untrustworthy for objective reasons, and only a rough credibility analysis is performed for threat intelligence that is untrustworthy for subjective reasons.
In the context of multi-source threat intelligence sharing, intelligence data is massive, comes from many sources, and is of unclear quality, so judging the credibility of heterogeneous intelligence remains a challenge for security analysis. Threat intelligence is a technology that can change the balance between attack and defense: it cannot guarantee that an asset will not be attacked, but when any point is attacked it allows the whole security system to coordinate quickly, greatly raising the attacker's cost. The method extracts credibility features from the three dimensions of time, domain knowledge, and content, and then characterizes the credibility of the intelligence from these multi-dimensional features, which is essentially an evaluation of information quality. The relationships among the multi-dimensional credibility features are mined through a Deep Belief Network (DBN), and higher-order credibility factors are extracted. Deep learning is an important technique for data mining; it can uncover hidden details bearing on the credibility of intelligence comprehensively and in depth, and reduce the false alarm rate of threat intelligence.
For the mass of unverified threat intelligence produced by multi-source integration, the invention provides a credibility analysis method for open-source shared threat intelligence that helps users effectively screen credible threat intelligence, and ultimately provides users with a third-party credibility judgment interface for threat intelligence. The credible threat intelligence judgment flow is shown in figure 2, and the specific process is as follows:
(1) The user is skeptical of threat intelligence shared by an open source and urgently needs to know the quality of that intelligence.
(2) An aggregation model is used to aggregate data from a plurality of threat intelligence sharing platforms and suppliers, threat intelligence is acquired from multiple sources in real time, and the acquired threat intelligence is layered and converged according to its type.
The invention collects intelligence from the current mainstream threat intelligence sharing platforms, which basically covers five types of threat events.
Each record includes the time, IP, threat type, description, and evidence (which may be a Trojan sample, the Trojan's MD5 hash, or a security analysis report, security log, etc. related to the malicious domain name).
(3) The threat intelligence collected in layers is preprocessed and stored in a local threat intelligence base in the form of triples, so that threat intelligence expressed in different ways can be analyzed uniformly.
(4) Enrichment information is collected from multiple sources, and feature indexes are extracted along the time, content, and domain knowledge dimensions, based on the factors that influence threat intelligence credibility.
(5) The multi-dimensional credibility features are taken as the description of the original threat intelligence, a DBN-based threat intelligence credibility judgment algorithm identifies whether the threat intelligence is credible, and the result is fed back to the user through an interactive interface.
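The triple storage of step (3) can be illustrated with a minimal sketch. It assumes that the triple consists of release time, source, and content description, which are the three factors used in the similarity calculation later in this description. The `IntelTriple` type, the `to_triple` helper, and the raw record keys `time`, `source`, and `description` are hypothetical names chosen for illustration only.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class IntelTriple(NamedTuple):
    """Hypothetical local representation: (release time, source, description)."""
    time: datetime
    source: str
    description: str


def to_triple(record: dict) -> IntelTriple:
    """Normalize one raw record (assumed keys: time, source, description)."""
    # Assumption: timestamps arrive as ISO 8601 strings; naive ones are treated as UTC.
    released = datetime.fromisoformat(record["time"])
    if released.tzinfo is None:
        released = released.replace(tzinfo=timezone.utc)
    # Collapse whitespace and lowercase so that descriptions formatted
    # differently by each platform can be compared uniformly.
    description = " ".join(record["description"].lower().split())
    return IntelTriple(released, record["source"].strip(), description)
```

Storing every record in one normalized shape is what allows intelligence expressed in different ways by different platforms to be compared field by field in the later steps.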
The threat intelligence credibility judgment method based on deep learning is described in detail below.
Threat intelligence content has the credibility attributes of authenticity and verifiability. The two attributes complement each other: content yields an authentic result through verification, and the authenticity of a result must be verified. Sampling analysis of open-source shared threat intelligence shows that most threat intelligence has multiple sources and that the threat intelligence of different platforms partially overlaps, so the authenticity of the content can be verified through multi-source threat intelligence, which also demonstrates its verifiability. A common problem in intelligence retrieval and verification is how to compare intelligence with different attribute types.
From the perspective of the intelligence source, the credibility of the source or carrier of threat intelligence largely reflects the credibility of the intelligence itself. The credibility of an intelligence source is usually measured by its authority, that is, whether the source is well known and influential in the field of information security. Content from an authoritative intelligence source or propagation channel is of high quality and credibility; users place relatively more trust in threat intelligence issued by an authoritative security institution or a well-known website. Current intelligence sources include government security agencies, security organizations or teams, vendor websites, well-known sites or forums, personal blogs, and other unknown sources. The authority of the different sources is denoted Au, and scores are assigned according to expert recommendations as follows:
Equation 1
In the formula, Au denotes the authority of the different open-source intelligence sources; S denotes the intelligence source, for example Microstep Online and 360 in China, AlienVault, the IBM intelligence center, and so on; r denotes the intelligence release time.
From the perspective of multi-source verification of content, the authenticity and verifiability attributes described above mean that content can be cross-checked against the overlapping intelligence of multiple platforms. The remaining problem is how to compare intelligence with different attribute types. To solve this problem, the similarity between threat intelligence items is used for comparison during multi-source verification to measure authenticity: the higher the similarity, the more consistent the multi-source verification and the more authentic the threat intelligence.
For a target threat intelligence item $v_t$, the invention collects threat intelligence for multi-source verification and, considering the three factors of time, source, and content description, provides a method for calculating the similarity between two threat intelligence items $v_t$ and $v_i$. The formula is as follows:
$$S(v_t, v_i) = \theta_1 \times S_{time} + \theta_2 \times S_{source} + (1 - \theta_1 - \theta_2) \times S_{desc}$$
Equation 2
In the formula, $\theta_1$ and $\theta_2$ are parameters set in machine learning.
Three factors determine the accuracy of threat intelligence: time, source, and content description. For example, if a malicious domain name was published in November 2020, its accuracy is high. Conversely, if the publishing time is July 2010, the malicious domain name probably no longer exists because so much time has passed, even if the source and content are highly credible; the parameter set for this intelligence during machine learning is 0.6.
$S_{desc}$: the inter-word distance of the threat intelligence descriptions.
$S_{time}$: the distance between threat intelligence release times.
$S_{source}$: the distance between threat intelligence sources.
For intelligence sources, the distance between two sources is the absolute value of the difference between their authority scores:
$$S_{source}(v_t, v_i) = |Au(v_t) - Au(v_i)|$$
Equation 3
where $Au(v_t)$ denotes the authority of threat intelligence $v_t$ and $Au(v_i)$ denotes the authority of threat intelligence $v_i$.
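Equations 2 and 3 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names are hypothetical, the component values $S_{time}$ and $S_{desc}$ are taken as inputs, and the constraint that the weights are non-negative and sum to at most 1 is an assumption that keeps the third weight $(1 - \theta_1 - \theta_2)$ non-negative.

```python
def source_distance(au_t: float, au_i: float) -> float:
    """Equation 3: absolute difference between two authority scores."""
    return abs(au_t - au_i)


def similarity(s_time: float, s_source: float, s_desc: float,
               theta1: float, theta2: float) -> float:
    """Equation 2: weighted combination of the time, source, and description terms."""
    # Assumption: weights are non-negative and theta1 + theta2 <= 1.
    if theta1 < 0 or theta2 < 0 or theta1 + theta2 > 1:
        raise ValueError("weights must be non-negative and sum to at most 1")
    return theta1 * s_time + theta2 * s_source + (1 - theta1 - theta2) * s_desc
```

For example, with the hypothetical authority scores 0.9 and 0.4, `source_distance(0.9, 0.4)` gives the source term, which is then passed to `similarity` together with the time and description terms.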
To evaluate the distance between the real-world release times of two threat intelligence items, Equation 4 is used. If $v_t$ and $v_i$ have the same release time, their distance is 1.0. In general, the value lies in the range (0, 1.0].
Equation 4
$S_{time}$ denotes the distance between threat intelligence release times, $t$ denotes time, $t(v_i)$ denotes the release time of threat intelligence $v_i$, and $t(v_t)$ denotes the release time of threat intelligence $v_t$.
The description of threat intelligence is abstracted as a bag of words (BOW); given how each platform presents descriptions, word order can be ignored. When calculating the similarity of the threat descriptions of two threat intelligence items, the descriptions are compared in a word vector space, and the similarity is calculated from word distance. In the invention, the inter-word distance of the threat text is calculated with a Wikipedia-based word similarity algorithm, expressed by Equation 5:
Equation 5
where $X_t$ and $X_i$ are the vector spaces of the description content of a pair of website threat intelligence items $v_t$ and $v_i$ respectively; in particular, $X_t$ and $X_i$ may be vector spaces obtained by training the threat descriptions on a Wikipedia corpus. When two threat intelligence descriptions agree, the distance $S_{desc}$ is 1.
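Equation 5 is not reproduced in this text, so the following sketch substitutes a plain cosine similarity over bag-of-words count vectors. It is not the Wikipedia-based word similarity algorithm named above; it is only a simple stand-in that shares two stated properties: word order is ignored, and two identical descriptions score 1. The function name `bow_cosine` is hypothetical.

```python
import math
from collections import Counter


def bow_cosine(desc_t: str, desc_i: str) -> float:
    """Cosine similarity between bag-of-words count vectors (word order ignored)."""
    x_t = Counter(desc_t.lower().split())
    x_i = Counter(desc_i.lower().split())
    # Dot product over the words the two descriptions share.
    dot = sum(x_t[w] * x_i[w] for w in x_t.keys() & x_i.keys())
    norm = (math.sqrt(sum(c * c for c in x_t.values()))
            * math.sqrt(sum(c * c for c in x_i.values())))
    # An empty description has no direction, so define its similarity as 0.
    return dot / norm if norm else 0.0
```

A word-embedding model trained on a Wikipedia corpus would additionally give credit to descriptions that use different but related words, which this count-based stand-in cannot do.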
Based on the above similarity calculation method, the result of verifying threat intelligence against another source takes one of two values, support or objection. With the optimal parameter K obtained by experiment, when $S(v_t, v_i) \geq K$ the verification result is support; otherwise the verification result is objection.
Suppose that the number of all verifiable intelligence sources that can be collected for the target threat intelligence $v_t$ is $m$. The ratio of supporting intelligence sources is the number of sources that support $v_t$ after verification divided by $m$. If this ratio meets the acceptance condition, the attitude $A_{atti}$ after multi-source verification is trusted, i.e. $A_{atti} = 1$; otherwise it is untrusted, i.e. $A_{atti} = 0$.
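The multi-source verification vote can be sketched as below. The threshold K is the experimentally obtained parameter described above. The cut-off on the supporting ratio is not given in this text, so `min_ratio` and its default value 0.5 are purely hypothetical, as are the function names.

```python
from typing import Sequence


def support_ratio(similarities: Sequence[float], k: float) -> float:
    """Fraction of the m verification sources whose similarity to v_t is >= K."""
    if not similarities:
        return 0.0  # no verifiable source, so nothing supports the target
    supporting = sum(1 for s in similarities if s >= k)
    return supporting / len(similarities)


def attitude(similarities: Sequence[float], k: float,
             min_ratio: float = 0.5) -> int:
    """Return A_atti: 1 (trusted) or 0 (untrusted).

    min_ratio is a hypothetical cut-off; the text does not state its value.
    """
    return 1 if support_ratio(similarities, k) >= min_ratio else 0
```

Here `similarities` holds one value $S(v_t, v_i)$ per collected source, so its length plays the role of $m$.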
2. APT detection based on DNS traffic
Most APT attacks use C&C servers as part of the attack, and these are often domain-name-based C&C servers. The invention therefore provides a detection model based on DNS traffic and, to improve efficiency, also applies the currently popular threat intelligence approach, improving both detection capability and detection efficiency. The overall detection flow is shown in fig. 3:
(1) DNS traffic acquisition and preliminary processing: by configuring port mirroring on the switches, the network traffic of every switch is copied out-of-band directly to the local network card; a capture program developed with the libpcap library captures all DNS traffic, performs fast preliminary processing, and extracts the required effective information.
(2) Threat intelligence-based detection: the preliminarily screened effective information of the DNS messages is rapidly checked against prior knowledge (the threat intelligence base) using the distributed framework Spark, and a Risk value is calculated to judge whether an obviously malicious message exists.
(3) Detection based on domain name features: the DNS messages that have passed the threat intelligence detection are further checked using Python or Spark. This stage extracts several features of the domain name; after the features are normalized and converted into coordinates, domain names are classified as legitimate or illegitimate using the classification criteria discovered by an unsupervised machine learning algorithm. Fig. 5 shows a flow chart of the domain-name-feature-based detection method of the invention.
(4) Threat intelligence updating: based on the STIX standard and the corresponding library, the intelligence in the threat intelligence base is continuously updated, further improving the detection rate and accuracy of the detection system.
By analyzing a number of verified C&C server domain name intelligence items and DGA-generated domain names, the following domain name features are selected for unsupervised clustering:
(1) Total length of the domain name: verified DGA and C&C domain threat intelligence shows that algorithmically generated domain names tend to be longer than ordinary domain names. Several reasons combine to make the total length large. First, short domain names have usually been registered long ago, and even those not used by any company or person are often taken by domain-squatting organizations. Second, to hide themselves and reduce cost, attackers often use a second-level domain name provider, which makes the whole domain name longer than one registered directly with a top-level domain provider; moreover, to evade detection algorithms, a prefix label is now often added before the random label as cover, lengthening the domain name further. Finally, to avoid generating many domain names that collide with already registered ones, which would prevent the C&C function from working, attackers need to generate relatively long domain names, which reduces the chance of communication failure. The invention therefore uses the total length of the domain name as a feature for detecting malicious domain names.
(2) Overall hierarchy of the domain name: as described for the previous feature, malicious domain names often differ from ordinary ones and are typically composed as "(prefix).random label.suffix", where the prefix is optional and the suffix has one to two levels. Although the overall number of levels does not vary much and usually falls in the range of 3 to 5, a series of generated random domain names often share the same number of levels, and combined with other features this distinguishes them well from unrelated domain names. This feature is therefore adopted as one of the detection features.
(3) Length of the random label in the domain name: the randomly generated label of a malicious domain name is generally produced by a specific algorithm, has a fixed length, and is longer than the labels of ordinary domain names, so the random label length is often a good clustering feature. Because a malicious domain name may be disguised with a prefix label before the random label, the invention takes the longest of the non-TLD labels as the random label.
(4) Level of the random label in the domain name: the random labels of malicious domain names generated by the same algorithm generally sit at the same level, and no attack that disguises this aspect has been found, so this feature can distinguish a specific attack behavior from other attack behaviors or from normal behavior.
(5) Number of distinct letters in the random label: legitimate domain names generally use meaningful words or letter abbreviations, whereas malicious domain names are generated by algorithms, some of which even produce uniformly distributed characters after a hash operation. The invention therefore uses the number of distinct letters in the domain name as a detection feature, which helps distinguish legitimate from illegitimate domain names.
(6) Number of distinct digits in the random label: apart from a few legitimate domain names that begin with digits such as 51 or 52, ordinary domain names rarely contain digits. Malicious domain names differ: some contain many digits which, like the letters discussed above, are uniformly distributed after certain processing. The number of distinct digits in the label can therefore also serve as a detection feature and effectively separates different clusters during clustering.
In summary, the invention selects six features for cluster analysis: domain name length, overall domain name hierarchy, random label length, random label level, number of distinct letters in the random label, and number of distinct digits in the random label. To normalize the different data, a weight is assigned to each feature in turn.
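The six features can be extracted with a short routine such as the sketch below. The function name and dictionary keys are hypothetical. Two simplifications are assumed and marked in the comments: only the last label is treated as the TLD, and the random label's level is counted from the right with the TLD at level 1. The per-feature weights mentioned above are not given in this text and are therefore not applied.

```python
from typing import Dict


def domain_features(domain: str) -> Dict[str, int]:
    """Extract the six clustering features from a domain name."""
    name = domain.lower().rstrip(".")
    labels = name.split(".")
    # Assumption: only the last label is the TLD. Two-level suffixes such as
    # "co.uk" would need a public-suffix list, which this sketch omits.
    candidates = labels[:-1] if len(labels) > 1 else labels
    # The longest non-TLD label is taken as the random label (first one on ties).
    random_label = max(candidates, key=len)
    # Assumption: levels are counted from the right, with the TLD at level 1.
    random_level = len(labels) - labels.index(random_label)
    return {
        "total_length": len(name),
        "levels": len(labels),
        "random_label_length": len(random_label),
        "random_label_level": random_level,
        "distinct_letters": len({c for c in random_label if c.isalpha()}),
        "distinct_digits": len({c for c in random_label if c.isdigit()}),
    }
```

The resulting integers would still have to be normalized and weighted before clustering, so that a feature with a large range, such as total length, does not dominate the distance calculation.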
The K-Means algorithm is used to perform unsupervised clustering on the domain names of the training set. FIG. 4 shows the K-Means clustering process, which is as follows:
1) K cluster centers are selected at random.
2) The distance from each data point to each cluster center is calculated, and each data point is assigned to the nearest cluster.
Equation 6
where $x_i$ denotes the coordinate values of a sample and $c_i$ denotes the coordinate values of a cluster center.
3) The cluster center of each cluster is recalculated.
Equation 7
where $n$ denotes the number of samples and $center_i$ denotes the coordinate values of the recalculated cluster center.
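The three K-Means steps can be sketched in plain Python as follows. Equations 6 and 7 are not reproduced in this text, so the sketch assumes the standard choices: Euclidean distance for the assignment step and the arithmetic mean of the assigned samples for the center update. The function names, the iteration cap, and the fixed random seed are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Assumed form of the sample-to-center distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points: List[Point], k: int, iterations: int = 100,
            seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Return (cluster centers, cluster index of each point)."""
    rng = random.Random(seed)
    # Step 1: pick K distinct samples at random as the initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    assignment: List[int] = []
    for _ in range(iterations):
        # Step 2: assign every sample to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: euclidean(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the centers will not move again
        assignment = new_assignment
        # Step 3: move each center to the mean of the samples assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers, assignment
```

Each point passed to `k_means` would be the normalized, weighted six-feature vector of one training domain name.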
After the screening and threshold calculation processes, the domain-name-based detection system has completed training. For a domain name to be detected, it is only necessary to transform it into normalized coordinates and calculate the minimum distance to the cluster centers in order to judge whether it is a malicious domain name. Domain names judged malicious can be uploaded to the threat intelligence base as appropriate, realizing automatic threat intelligence generation.
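The detection step can be sketched as below, under two assumptions that this text does not spell out: the trained model keeps the centers of the clusters identified as malicious during screening, and a domain name is flagged when its minimum distance to those centers does not exceed the calculated threshold. The function names and the threshold value used in the usage note are hypothetical.

```python
import math
from typing import Sequence


def min_center_distance(point: Sequence[float],
                        centers: Sequence[Sequence[float]]) -> float:
    """Smallest Euclidean distance from the point to any cluster center."""
    return min(math.dist(point, c) for c in centers)


def is_malicious(point: Sequence[float],
                 malicious_centers: Sequence[Sequence[float]],
                 threshold: float) -> bool:
    """Flag the domain when it lies within `threshold` of a malicious cluster."""
    # Assumption: `malicious_centers` and `threshold` come from the training
    # stage (screening and threshold calculation) described above.
    return min_center_distance(point, malicious_centers) <= threshold
```

For instance, `is_malicious((0.1, 0.1), [(0.0, 0.0), (5.0, 5.0)], 0.5)` flags the point because it lies close to the first center, while a point midway between the two centers is not flagged.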
3. Information sharing
The flow of the network threat intelligence sharing mechanism based on blockchain smart contracts is as follows. All network threat intelligence sharing runs on an Ethereum blockchain data sharing platform. On this platform, network threat intelligence data is treated as a valuable data asset and, after being standardized and unified by the network threat intelligence data processing module, can be traded and shared. The intelligence provider uploads the network threat intelligence it has produced to the data sharing platform and labels the data type and value of the intelligence data; the intelligence acquirer obtains network threat intelligence data by purchasing it through the data sharing platform. The sharing mechanism mainly comprises the following three parts:
(1) Network threat intelligence data processing. In network security, the classification of threat intelligence data is associated with the type of network attack; a network attack indicator often represents one class of threat intelligence data. Combining existing methods for classifying and representing network threat intelligence, the invention standardizes the data on the blockchain sharing platform through the network threat intelligence processing procedure, improving the efficiency of network threat intelligence sharing.
(2) Data sharing method. Two major roles are defined in the network threat intelligence sharing platform: intelligence providers and intelligence acquirers.
Intelligence providers are typically existing, relatively specialized security companies or security analysis organizations that are able to collect and receive network attack information and to process, analyze, and evaluate it by technical means to generate network threat intelligence. Intelligence acquirers are typically enterprises, organizations, or individuals who generally cannot generate network threat intelligence themselves but need such threat intelligence data for network defense or scientific research. After obtaining threat intelligence data, the acquirer can give feedback on the value of the data and evaluate the quality of the provider's threat intelligence.
(3) Construction of the data sharing platform. On the basis of the design of the two modules above, the data sharing platform of the blockchain system is built on the Ethereum underlying architecture. Through smart contract design and development, the data types and user types of network threat intelligence on the platform are determined, and the triggering conditions and final triggered states of the smart contracts are preset, so that network threat intelligence is treated as a digital asset and intelligence sharing transactions between intelligence providers and intelligence acquirers are realized.
In the data sharing platform, the user's address is the unique identifier, and the address is authenticated after consensus is reached through broadcast over the P2P network. A new user who wants to enter the data sharing platform can log in only after registering and obtaining a public key address through an algorithm. Logging in requires a password, namely the private key set by the user. User information is stored in the blockchain as the account information of the node.
Fig. 6 shows the overall flow of intelligence sharing communication of the present invention.
User registration: this is the first step in using the data sharing platform. After the smart contract is successfully deployed, user addresses correspond one-to-one with user information, because each node has only one account public key. The fields filled in at registration include the user name, user password, and user ID. User registration is a transaction operation initiated by the user; once registration is complete and consensus has been reached, the blockchain stores the user identity information in the form of a transaction, so that it can be used for authentication when logging in to the data platform.
User login: after registering successfully, the user must enter the user name and user password correctly when logging in to the platform. If the user is not registered, the platform indicates this and asks the user to register first; if the user's private key (password) is entered incorrectly, the platform indicates that the password is wrong.
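The registration and login checks can be modeled off-chain with the in-memory sketch below. It is not a smart contract and does not use Ethereum: the `UserRegistry` class and `derive_address` function are hypothetical, and the address derivation is a SHA-256 stand-in, since real Ethereum addresses are derived from a secp256k1 public key hashed with Keccak-256.

```python
import hashlib
from typing import Dict


def derive_address(private_key: str) -> str:
    """Stand-in derivation only; not the real Ethereum address algorithm."""
    return "0x" + hashlib.sha256(private_key.encode()).hexdigest()[:40]


class UserRegistry:
    """In-memory model of the account store; one address per user name."""

    def __init__(self) -> None:
        self._address_by_name: Dict[str, str] = {}

    def register(self, user_name: str, private_key: str) -> str:
        if user_name in self._address_by_name:
            raise ValueError("user name already registered")
        address = derive_address(private_key)
        # Only the derived address is stored, never the private key itself.
        self._address_by_name[user_name] = address
        return address

    def login(self, user_name: str, private_key: str) -> str:
        address = self._address_by_name.get(user_name)
        if address is None:
            return "unregistered user, please register first"
        if derive_address(private_key) != address:
            return "wrong password"
        return "ok"
```

The two failure messages correspond to the two prompts described for the login step: an unregistered user and a wrongly entered private key.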
Uploading network threat intelligence data by the intelligence provider is a data transfer on the data sharing platform: the data is recorded on the blockchain and stored using the distributed ledger property of the blockchain. After data is uploaded to the sharing platform, the upload is attested by the consensus of the system nodes, which ensures the security of the uploaded data. After the network threat intelligence data has been uploaded successfully, feedback is returned to the intelligence provider, who obtains the storage information of the intelligence data in the blockchain and the transaction ID of the upload transaction.
After logging in to the platform, the intelligence provider uploads network threat intelligence data on the data platform. The essential problem is how to store the intelligence data securely in the blockchain so that the upload record can be queried, while the uploaded data itself can be queried and obtained only when the contract conditions are met.
The intelligence data upload process is divided into two parts:
first, the upload operation by the intelligence provider, which is at the user level; second, the storage of the uploaded intelligence data and the feedback to the user, which informs the intelligence provider of the storage information of the uploaded intelligence data on the blockchain platform.
At the user level, the intelligence provider first logs in to the data sharing platform and, after logging in successfully, clicks the intelligence data upload button to enter the data upload interface. In the initial design of the data platform, the intelligence content that a provider can upload mainly includes the intelligence title, intelligence type, intelligence serial number, intelligence level, intelligence selling price, and the specific intelligence content. The provider fills in this information directly on the data upload interface and clicks the upload button to upload the data once it has been entered.
The data sharing platform stores the uploaded intelligence data in three steps:
first, after the intelligence provider uploads intelligence data, the back end automatically invokes the smart contract. The smart contract has already been deployed successfully on the data sharing platform, and the provider's upload operation triggers the contract's execution condition, which completes the invocation of the smart contract on the data sharing platform;
second, the upload of intelligence data by the provider is equivalent to a transaction request issued on the blockchain platform. The transaction request is broadcast to all network nodes through the P2P network, the nodes reach consensus on it, and the uploaded data, once authenticated by the blockchain data sharing platform, becomes data that can be queried, purchased, and obtained;
third, after recording the intelligence data, the data sharing platform generates an upload record that includes the intelligence registration time, the intelligence ID, the block in which the intelligence is recorded, and so on.
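The upload record produced in the third step can be illustrated with a toy hash-chained ledger. This sketch runs in a single process and involves no Ethereum node, smart contract, or consensus; the `ToyLedger` class, its field names, and the way the intelligence ID is derived from the content hash are all assumptions made for illustration.

```python
import hashlib
import json
import time
from typing import Dict, List


class ToyLedger:
    """Append-only, hash-chained list of upload records (illustration only)."""

    def __init__(self) -> None:
        self.blocks: List[Dict] = []

    def upload(self, provider: str, intel: Dict) -> Dict:
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        payload = json.dumps(intel, sort_keys=True)
        body = {
            "block": len(self.blocks),
            "provider": provider,
            "registered_at": time.time(),
            # Assumption: the intelligence ID is derived from the content hash.
            "intel_id": hashlib.sha256(payload.encode()).hexdigest()[:16],
            "payload": payload,
            "prev_hash": prev_hash,
        }
        body["hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.blocks.append(body)
        # Upload record fed back to the provider: time, ID, block, block hash.
        return {key: body[key]
                for key in ("registered_at", "intel_id", "block", "hash")}

    def verify(self) -> bool:
        """Recompute every hash and check that each block links to the previous one."""
        prev = "0" * 64
        for block in self.blocks:
            content = {k: v for k, v in block.items() if k != "hash"}
            digest = hashlib.sha256(
                json.dumps(content, sort_keys=True).encode()).hexdigest()
            if block["prev_hash"] != prev or block["hash"] != digest:
                return False
            prev = block["hash"]
        return True
```

Because each block's hash covers the previous block's hash, altering a stored payload makes `verify()` return `False`, which is the tamper-evidence property the platform relies on when it stores intelligence on the blockchain.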
While the best mode for carrying out the invention has been described in detail and illustrated in the accompanying drawings, it is to be understood that the same is by way of illustration and example only and is not to be taken by way of limitation, the scope of the invention should be determined by the appended claims and any changes or modifications which fall within the true spirit and scope of the invention should be construed as broadly described herein.