CN109284317B

CN109284317B - Time sequence directed graph-based stolen information clue extraction and segmented evaluation method

Info

Publication number: CN109284317B
Application number: CN201811259183.3A
Authority: CN
Inventors: 李兴国; 苗功勋; 郑传义; 王蒙; 崔新安; 张庆亮
Original assignee: BEIJING ZHONGFU TAIHE TECHNOLOGY DEVELOPMENT CO LTD; Zhongfu Information Co Ltd; Zhongfu Safety Technology Co Ltd
Current assignee: Zhongfu Safety Technology Co Ltd
Priority date: 2018-10-26
Filing date: 2018-10-26
Publication date: 2021-07-06
Anticipated expiration: 2038-10-26
Also published as: CN109284317A

Abstract

The invention provides a time-series directed graph-based method for stolen information clue extraction and segmented evaluation. Log information is acquired from the whole intranet, including the massive log data generated by the various protection and supervision devices in the intranet, and is cleaned and labeled during acquisition to form normalized (formatted) clue data. The normalized clue data includes at least four items of information: the clue subject, the associated attributes, the clue stage, and the clue time. Each item of clue data is then taken as a vertex and the associated attributes as edges, and all clue data within the selected time range are connected in directed series to form a directed graph composed of intranet clues and associated attributes. The directed graph is then traversed to extract information-stealing clue chains, and an information-stealing evaluation function is established to evaluate each clue chain according to the number of clues and the completeness of the clue stages. Clue chains with higher evaluated risk values are extracted and an alarm is raised.

Description

Time sequence directed graph-based stolen information clue extraction and segmented evaluation method

Technical Field

The invention relates to the technical field of internet information security, in particular to a method for extracting stolen information clues and evaluating the stolen information clues in a segmented mode based on a time sequence directed graph.

Background

As the level of domestic informatization continues to improve, government organs and organizations have gradually established internal office networks or industry private networks, and for various reasons many of these office networks cannot be connected to the internet. Networks isolated from the internet are often used to transmit and process sensitive information and thus become sensitive networks. Information security protection in such networks is a crucial issue.

At present, information security protection solutions for intranets lack variety and are basically designed around traditional internet information security measures. Such solutions mainly address traditional internet protection needs within the intranet, such as viruses, malicious software and system vulnerabilities, and harden and protect the equipment and systems in the intranet. They do not provide targeted hardening and protection against information stealing and leakage in a sensitive intranet. During operation, these traditional supervision measures generate large volumes of management logs, operation logs and alarm information; because of the sheer data volume, they cannot provide accurate clues for judging information stealing and leakage.

Disclosure of Invention

In order to overcome the deficiencies in the prior art, the invention provides a stolen information clue extraction and segmentation evaluation method based on a time sequence directed graph, so as to solve the technical problems.

The technical scheme of the invention is as follows:

A method for extracting stolen information clues and evaluating them in segments based on a time-series directed graph comprises the following steps:

acquiring log information and extracting clues from the log information during acquisition;

performing directed series connection and segmentation on the acquired log clues, so that all clues in the network within a determined time range form a finite directed graph;

extracting information-stealing clue chains from the directed graph;

and establishing an information-stealing evaluation function to evaluate each clue chain.

Preferably, the step of acquiring log information in the whole intranet and performing clue extraction on the log information in the acquisition process includes:

acquiring mass log data generated by various protection and supervision equipment in an intranet;

cleaning and labeling the log data during acquisition to form normalized clue data;

dividing the acquired log data into stages according to the process characteristics of stealing and leakage events; the attack chain model is optimized according to the behavior patterns of intranet attacks, and the log data are divided into: intention stage A, preparation stage B, action stage C, masking stage D.

Preferably, in the step of cleaning and labeling the log data during acquisition to form normalized clue data, the normalized clue data includes at least four items of information: the clue subject, the associated attributes, the clue stage and the clue time.

Preferably, the step of cleaning and labeling the log data during acquisition to form the normalized clue data specifically includes:

receiving the logs according to the SYSLOG log transmission standard;

parsing and normalizing the received logs by configuring a parsing template;

storing the normalized data to form log clue sets for the different stages, with event alarms and logs as clues,

where the event clue sets A, B, C, D respectively represent the sets of log clues belonging to the four different stages, and E represents all log clues in the network and user environment.

Preferably, in the step of storing the normalized data to form log clue sets for the different stages with event alarms and logs as clues, each clue in a set should include at least two associated attributes and one time attribute.
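The normalized clue record described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `Clue`, `Stage` and `group_by_stage` and the field layout are hypothetical and do not come from the patent; only the four information items and the "at least two associated attributes and one time attribute" constraint are taken from the text.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Stage(Enum):
    """The four stages named in the text (labels A to D)."""
    INTENTION = "A"
    PREPARATION = "B"
    ACTION = "C"
    MASKING = "D"


@dataclass(frozen=True)
class Clue:
    """Hypothetical normalized clue record with the four required items."""
    subject: str           # clue subject, e.g. a user or a host
    attributes: frozenset  # associated attributes, later used to link clues
    stage: Stage           # clue stage
    time: datetime         # clue time

    def __post_init__(self):
        # The text requires at least two associated attributes per clue.
        if len(self.attributes) < 2:
            raise ValueError("a clue needs at least two associated attributes")


def group_by_stage(clues):
    """Split all clues (the set E) into the stage sets A, B, C, D."""
    sets = {stage: [] for stage in Stage}
    for clue in clues:
        sets[clue.stage].append(clue)
    return sets
```

A clue built with fewer than two associated attributes is rejected at construction time, which keeps malformed records out of the later graph-building step.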

Preferably, the step of performing directed series connection and segmentation on the acquired log clues, so that all clues in the network within a determined time range form a finite directed graph, is specifically as follows:

taking each item of clue data as a vertex and the associated attributes as edges, all clue data within the selected time range are connected in directed series to form a directed graph composed of intranet clues and associated attributes.

Preferably, in the step of taking each item of clue data as a vertex and the associated attributes as edges and connecting all clue data within the selected time range in directed series to form a directed graph composed of intranet clues and associated attributes, the chronological order in which the log clues occur determines the direction of the arrow on the edge connecting two log clues.

Once the directed graph is formed within the network, filtering along the time direction of the graph reduces the amount of data to be analyzed and saves a large amount of computing resources. When analyzing very large data sets, this gives a marked efficiency advantage over traditional correlation analysis. The original information-stealing clues captured by filtering in the time direction conform better to the patterns of information-stealing activity, which improves the matching degree of the clues.
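The graph construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_directed_graph`, the tuple layout `(clue_id, time, attributes)` and the use of integer timestamps are assumptions. It also assumes that two clues are linked when they share at least one associated attribute, and that clues with identical times are left unlinked because no order can be assigned.

```python
from collections import defaultdict
from itertools import combinations


def build_directed_graph(clues, start, end):
    """Return an adjacency mapping {clue_id: [later clue_ids]}.

    clues: list of (clue_id, time, attributes) tuples, attributes being a set.
    Only clues whose time lies in [start, end] become vertices.
    """
    window = [c for c in clues if start <= c[1] <= end]
    edges = defaultdict(set)
    for (id_a, t_a, attrs_a), (id_b, t_b, attrs_b) in combinations(window, 2):
        # No shared associated attribute means no edge; equal times give no direction.
        if not (attrs_a & attrs_b) or t_a == t_b:
            continue
        # The arrow points from the earlier clue to the later one.
        if t_a < t_b:
            edges[id_a].add(id_b)
        else:
            edges[id_b].add(id_a)
    return {cid: sorted(edges[cid]) for cid, _, _ in window}
```

Because every edge points strictly forward in time, a graph built this way cannot contain a cycle, so any later path traversal is guaranteed to terminate.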

Preferably, in the step of extracting the information-stealing clue chains from the directed graph, each path is an information-stealing clue composed of log clues in chronological order, and the paths together form an information-stealing clue set {L_i}, where each element L_i of the set {L_i} is a log clue chain;

T_i is a log clue chain in the set {L_i} that can be evaluated as an information-stealing clue.
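One way to enumerate the paths is a depth-first walk over the directed graph. The sketch below is an assumption-laden illustration: the name `extract_chains` is hypothetical, the graph is assumed to be an acyclic adjacency mapping (edges follow increasing time), and only maximal paths are collected, running from vertices with no incoming edge to vertices with no outgoing edge. The patent text itself only says that all paths are traversed along the direction of the graph.

```python
def extract_chains(graph):
    """Collect every maximal path of an acyclic adjacency mapping.

    graph: {vertex: [successor vertices]}.
    Returns a list of chains, each a list of vertices in time order.
    """
    has_incoming = {v for targets in graph.values() for v in targets}
    sources = [v for v in graph if v not in has_incoming]
    chains = []

    def walk(vertex, path):
        path = path + [vertex]  # copy so sibling branches stay independent
        successors = graph.get(vertex, [])
        if not successors:
            chains.append(path)  # reached the end of a chain
            return
        for nxt in successors:
            walk(nxt, path)

    for source in sources:
        walk(source, [])
    return chains
```

An isolated clue is returned as a chain of length one. The number of paths can grow quickly on dense graphs, so a real deployment would likely bound the time window or the path length.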

Preferably, the step of establishing an information stealing assessment function to perform thread assessment on each thread chain includes:

establishing segmented evaluation functions FA(L_i), FB(L_i), FC(L_i), FD(L_i), which respectively evaluate the stage performance of the information-stealing clue T_i;

T_i is a log clue chain in the set {L_i} that can be evaluated as an information-stealing clue; FA(L_i) represents the risk performance evaluation value of the clue chain T_i in the intention stage A. The evaluation of this stage is scored on two aspects, the number of clue logs within the stage and the risk level of the logs: FA(L_i) = ValCnt(L_i) + ValLel(L_i), where ValCnt(L_i) is a piecewise function, as follows:

CntA(L_i) is the number of clues of the clue chain L_i in stage A.

ValLel(L_i) = 10*(ExLow(L_i) + ExMid(L_i) + ExHig(L_i)) + 15*(ExMid(L_i) + ExHig(L_i)) + 25*ExHig(L_i),

where ExLow(L_i), ExMid(L_i), ExHig(L_i) respectively indicate whether L_i contains a low-risk-level log, a medium-risk-level log, and a high-risk-level log, each taking a value in {0, 1}.

Similarly, FB(L_i), FC(L_i), FD(L_i) respectively represent the risk performance evaluation values for the preparation stage B, the action stage C and the masking stage D.
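The risk-level term ValLel can be worked through directly from its formula. In the sketch below the function name `val_lel` and the string labels "low", "mid" and "high" are hypothetical; the coefficients 10, 15 and 25 are those given above. ValCnt is left out because its piecewise definition is not reproduced in this text.

```python
def val_lel(levels):
    """Risk-level score for the clues of one chain within one stage.

    levels: iterable of risk-level labels ("low", "mid", "high"),
    one per log clue (the labels are an assumption of this sketch).
    """
    present = set(levels)
    ex_low = int("low" in present)   # ExLow: 1 if a low-risk log exists
    ex_mid = int("mid" in present)   # ExMid: 1 if a medium-risk log exists
    ex_hig = int("high" in present)  # ExHig: 1 if a high-risk log exists
    return (10 * (ex_low + ex_mid + ex_hig)
            + 15 * (ex_mid + ex_hig)
            + 25 * ex_hig)
```

With these coefficients a single low-risk log scores 10, a single medium-risk log 25, a single high-risk log 50, and a stage containing all three levels scores 85, so the term depends on which levels are present rather than on how many logs there are.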

Setting weights WA, WB, WC and WD occupied by the segmented evaluation function in the information stealing event;

wherein the weight setting of the segment evaluation is obtained according to historical data analysis, and adjustment is allowed according to data in an application scene;

establishing an information-stealing comprehensive evaluation function P(L_i) in terms of the evaluation functions FA(L_i), FB(L_i), FC(L_i), FD(L_i) and the weights WA, WB, WC and WD, performing clue evaluation on the extracted information-stealing clues, and calculating an evaluation value;

wherein an adjusting function evaluates how well the clue chain covers the stages and adjusts upward the clue chains that cover all stages;

CoverAll(L_i) indicates whether the clues in L_i cover all stages, taking a value in {0, 1}.

The adjusting function rewards log clues that match the whole-process risk, so that accuracy can be tuned according to specific data and services.

Segmented evaluation of the stealing clues greatly improves the accuracy of the captured clues and effectively reduces both the number and the proportion of false alarms. Establishing the model and introducing the segmented evaluation method allows information-stealing behavior to be quantified through logs, and provides quantitative indicators and algorithm support for subsequent big data mining and artificial intelligence analysis.

Preferably, the method further comprises:

performing service-related operations on the information-stealing clues, including sorting them by evaluation value, extracting the clue chains with higher evaluated risk values, and raising an alarm.
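The sorting and alarm step can be sketched in a few lines. The function name `rank_and_alert`, the `(chain, score)` pair layout and the strict "greater than threshold" rule are assumptions of this sketch; the text only states that chains are sorted by evaluation value and that those with higher risk values trigger an alarm.

```python
def rank_and_alert(scored_chains, threshold):
    """Sort chains by evaluation value and pick those that should alarm.

    scored_chains: list of (chain, score) pairs.
    threshold: hypothetical cut-off; chains scoring above it raise an alarm.
    Returns (ranked pairs, chains to alarm on), both highest score first.
    """
    ranked = sorted(scored_chains, key=lambda item: item[1], reverse=True)
    alerts = [chain for chain, score in ranked if score > threshold]
    return ranked, alerts
```

For example, `rank_and_alert([(["A1", "B1"], 35.0), (["A3", "D2"], 72.5)], 60)` ranks the second chain first and returns it as the only alarm.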

The invention provides a time sequence directed graph-based stolen information clue extraction and segmented evaluation method, which can carry out targeted clue mining and clue accuracy evaluation on information stealing behaviors in a sensitive intranet.

According to the above technical scheme, the invention has the following advantages: the ability to discover and sense information stealing and leakage in the intranet in advance is greatly enhanced. The method mainly uses data mining technology to mine possible stealing and leakage key information from the massive log information generated by the various protection and supervision devices in the intranet, and evaluates the stealing and leakage risk of that key information on the basis of research into the psychology of stealing and leakage, so as to find the key information with large hidden danger of stealing and leakage.

Filtering along the time direction of the directed graph reduces the amount of data to be analyzed and saves a large amount of computing resources. When analyzing very large data sets, this gives a marked efficiency advantage over traditional correlation analysis. The original information-stealing clues captured by filtering in the time direction conform better to the patterns of information-stealing activity, which improves the matching degree of the clues. Segmented evaluation of the stealing clues greatly improves the accuracy of the captured clues and effectively reduces both the number and the proportion of false alarms. Establishing the model and introducing the segmented evaluation method allows information-stealing behavior to be quantified through logs, and provides quantitative indicators and algorithm support for subsequent big data mining and artificial intelligence analysis.

In addition, the invention has reliable design principle, simple structure and very wide application prospect.

Therefore, compared with the prior art, the invention has prominent substantive features and remarkable progress, and the beneficial effects of the implementation are also obvious.

Drawings

FIG. 1 is a flow chart of a method for extracting and evaluating stolen information clues in sections based on a time-series directed graph;

FIG. 2 is a diagram of the directed series connection and segmentation of log clues.

Detailed Description

The invention provides a time-series directed graph-based method for stolen information clue extraction and segmented evaluation. Log information is acquired from the whole intranet, including the massive log data generated by the various protection and supervision devices in the intranet, and is cleaned and labeled during acquisition to form normalized (formatted) clue data. The normalized clue data includes at least four items of information: the clue subject, the associated attributes, the clue stage and the clue time. Each item of clue data is then taken as a vertex and the associated attributes as edges, and all clue data within the selected time range are connected in directed series to form a directed graph composed of intranet clues and associated attributes. The directed graph is then traversed to extract information-stealing clue chains, and an information-stealing evaluation function is established to evaluate each clue chain according to the number of clues and the completeness of the clue stages. Clue chains with higher evaluated risk values are extracted and an alarm is raised. Targeted clue mining and clue accuracy evaluation can thus be performed on information-stealing behavior in a sensitive intranet, and the ability to discover and sense information stealing and leakage in the intranet in advance is greatly enhanced. The method mainly uses data mining technology to mine possible stealing and leakage key information from the massive log information generated by the various protection and supervision devices in the intranet, and evaluates the stealing and leakage risk of that key information on the basis of research into the psychology of stealing and leakage, so as to find the key information with large hidden danger of stealing and leakage.

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Example one

The embodiment provides a method for extracting and evaluating stolen information clues based on a time sequence directed graph, which comprises the following steps:

S1: acquiring log information and extracting clues from the log information during acquisition;

This step is implemented as follows:

S11: acquiring the massive log data generated by the various protection and supervision devices in the intranet;

S12: cleaning and labeling the log data during acquisition to form normalized clue data; the normalized clue data in this step includes at least four items of information: the clue subject, the associated attributes, the clue stage and the clue time;

S13: dividing the acquired log data into stages according to the process characteristics of stealing and leakage events; the attack chain model is optimized according to the behavior patterns of intranet attacks, and the log data are divided in chronological order into: intention stage A, preparation stage B, action stage C, masking stage D;

The characteristics of each stage of a stealing and leakage event are different:

Intention stage: subjective awareness, change of attitude, change of temperament, abnormal daily behavior;

Preparation stage: preparatory activities, social activities, information gathering, attempted breakthroughs;

Action stage: substantive actions, scanning and penetration, system intrusion, tool deployment;

Masking stage: aftermath handling, erasing traces, uninstalling tools, transferring data.

It should be further explained that the clue extraction process receives logs according to the SYSLOG transmission standard, parses and normalizes the received logs by configuring a parsing template, and stores the normalized data to form log clue sets for the different stages, with event alarms and logs as clues,

where the event clue sets A, B, C, D respectively represent the sets of log clues belonging to the four different stages, each clue in a set should contain at least two associated attributes and one time attribute, and E represents all log clues in the network and user environment. The log clues mentioned here originate from network devices, security devices, SOC platforms, application systems and other operation and maintenance systems in the intranet.
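Template-based parsing and normalization of a received syslog line might look like the sketch below. It is only an illustration: the regular expressions, the `key=value` message body and the names `TEMPLATE`, `KEY_VALUE` and `normalize` are assumptions, since the patent does not describe the template format. The only standard fact relied on is that the syslog PRI value encodes facility times 8 plus severity.

```python
import re

# Hypothetical parsing template for a BSD-style syslog line such as:
#   <134>Oct 26 09:30:00 fw01 user=u17 src=10.0.0.5 action=login_fail
TEMPLATE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<body>.*)$"
)
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def normalize(line):
    """Parse one log line into a normalized record, or None if it does not match."""
    match = TEMPLATE.match(line)
    if match is None:
        return None
    pri = int(match["pri"])
    return {
        "facility": pri // 8,  # PRI = facility * 8 + severity
        "severity": pri % 8,
        "time": match["timestamp"],
        "host": match["host"],
        # key=value pairs become candidate associated attributes
        "attributes": dict(KEY_VALUE.findall(match["body"])),
    }
```

Lines that do not match the template return `None`, so a caller can route them to a different template or discard them during cleaning.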

S2: performing directed series connection and segmentation on the acquired log clues, so that all clues in the network within a determined time range form a finite directed graph; specifically, each item of clue data is taken as a vertex and the associated attributes as edges, and all clue data within the selected time range are connected in directed series to form a directed graph composed of intranet clues and associated attributes. The subsequent clue extraction and clue evaluation are both based on this directed graph;

The edges between log clues are established through the associated attributes of the log clues in E, and the chronological order in which the log clues occur determines the direction of the arrow on the edge connecting two log clues.

S3: extracting information-stealing clue chains from the directed graph; all paths are traversed along the direction of the directed graph, and each path is queried and stored as a list, so that each path in the list is an information-stealing clue composed of log clues in chronological order; the paths together form an information-stealing clue set {L_i}, where each element L_i of the set {L_i} is a log clue chain;

S4: establishing an information-stealing evaluation function to evaluate each clue chain;

This step is implemented as follows:

S41: establishing segmented evaluation functions FA(L_i), FB(L_i), FC(L_i), FD(L_i), which respectively evaluate the stage performance of the information-stealing clue T_i;

CntA(L_i) is the number of clues of the clue chain L_i in stage A.

S42: setting the weights WA, WB, WC and WD that the segmented evaluation functions carry in an information-stealing event; the weights of the segmented evaluation are obtained from historical data analysis and may be adjusted according to the data in the application scenario;

S43: establishing an information-stealing comprehensive evaluation function P(L_i) in terms of the evaluation functions FA(L_i), FB(L_i), FC(L_i), FD(L_i) and the weights WA, WB, WC and WD, performing clue evaluation on the extracted information-stealing clues, and calculating an evaluation value, wherein

CoverAll(L_i) indicates whether the clues in L_i cover all stages, taking a value in {0, 1}.

The method further comprises the following steps:

S5: performing service-related operations on the information-stealing clues, including sorting them by evaluation value, extracting the clue chains with higher evaluated risk values, and raising an alarm.

Example two

As shown in fig. 1, the present embodiment provides a stolen information clue extraction and segmentation evaluation method based on a time-series directed graph, which includes the following steps:

it should be noted that, the implementation process of this step is as follows:

As shown in FIG. 2, A_i, B_i, C_i, D_i respectively represent clue data in the four stages A, B, C and D. A_1 has an association relation with A_2 and B_1 through associated attributes, and the arrow direction represents the temporal order of the time attributes in the clue data; A_3 has an association relation with A_4 and B_1 through associated attributes, and the arrow direction likewise represents the temporal order of the time attributes in the clue data. By analogy, the directed relationship graph shown in the figure is formed.

it should be noted that, the implementation of this step is as follows:

CntA(L_i) is the number of clues of the clue chain L_i in stage A.

The evaluation value of each stage takes into account the number of clues found in the stage and the hazard level of the clues found in the stage, and lies in the range [0, 100];

In this embodiment, analysis of historical data and cases in an intranet shows that the influence weights of clues appearing in the four stages on the risk of an information-stealing case are in the ratio 1:3:4:12, so WA is set to 5%, WB to 15%, WC to 20% and WD to 60%; in actual implementation, these weights may be adjusted according to the data in the application scenario;
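The step from the ratio 1:3:4:12 to the percentage weights is a simple normalization, which the sketch below works through. The second function is a loud assumption: the formula for P(L_i) is not reproduced in this text, so `weighted_score` merely assumes a plain weighted sum of the four stage scores and omits the adjusting function entirely. Both function names are hypothetical.

```python
from fractions import Fraction


def weights_from_ratio(ratio):
    """Normalize stage influence ratios so that the weights sum to 1."""
    total = sum(ratio.values())
    return {stage: Fraction(share, total) for stage, share in ratio.items()}


def weighted_score(stage_scores, weights):
    """ASSUMED form only: weighted sum of stage scores, no adjusting function."""
    return float(sum(weights[s] * Fraction(stage_scores[s]) for s in weights))


weights = weights_from_ratio({"A": 1, "B": 3, "C": 4, "D": 12})
# weights: A = 1/20 (5%), B = 3/20 (15%), C = 1/5 (20%), D = 3/5 (60%)
```

Under this assumed weighted-sum form, and with each stage score capped at 100, a chain with no stage-D clues can reach at most 5 + 15 + 20 = 40, which shows how heavily the masking stage dominates the result.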

wherein

The method further comprises the following steps:

For example, a risk evaluation value above 60 indicates that at least one behavior-masking (stage D) clue has been found. The appearance of a clue at this stage indicates behavior such as deleting audit logs or operation traces, so the corresponding event and the personnel involved should be investigated further. Clue chains with high evaluation values therefore generate an alarm to remind service personnel to check and handle the event.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A time-series directed graph-based stolen information clue extraction and segmentation evaluation method is characterized by comprising the following steps:

extracting information-stealing clue chains from the directed graph;

establishing an information-stealing evaluation function to evaluate each clue chain;

the step of acquiring the log information and extracting clues from the log information during acquisition specifically includes: acquiring the massive log data generated by the various protection and supervision devices in an intranet; cleaning and labeling the log data during acquisition to form normalized clue data; dividing the acquired log data into stages according to the process characteristics of stealing and leakage events; the attack chain model is optimized according to the behavior patterns of intranet attacks, and the log data are divided into: intention stage, preparation stage, action stage, masking stage;

in the step of cleaning and labeling the log data during acquisition to form normalized clue data, the normalized clue data includes at least four items of information: the clue subject, the associated attributes, the clue stage and the clue time; the specific steps are: receiving the logs according to the SYSLOG log transmission standard; parsing and normalizing the received logs by configuring a parsing template; storing the normalized data to form log clue sets for the different stages, with event alarms and logs as clues,

wherein the event clue sets A, B, C, D respectively represent the sets of log clues belonging to the four different stages, and E represents all log clues in the network and user environment;

the step of extracting the information-stealing clue chains from the directed graph specifically comprises: each path is an information-stealing clue composed of log clues in chronological order, and the paths together form an information-stealing clue set {L_i}, where each element L_i of the set {L_i} is a log clue chain;

the step of establishing an information-stealing evaluation function to evaluate each clue chain specifically comprises:

establishing an information-stealing comprehensive evaluation function P(L_i) in terms of the evaluation functions FA(L_i), FB(L_i), FC(L_i), FD(L_i) and the weights WA, WB, WC and WD, performing clue evaluation on the extracted information-stealing clues, and calculating an evaluation value.

2. The method as claimed in claim 1, wherein in the step of storing the normalized data to form log clue sets for the different stages with event alarms and logs as clues, each clue in a set includes at least two associated attributes and one time attribute.

3. The method as claimed in claim 2, wherein the step of performing directed series connection and segmentation on the acquired log clues, so that all clues in the network within a determined time range form a finite directed graph, is specifically as follows:

4. The method according to claim 3, wherein in the step of taking each item of clue data as a vertex and the associated attributes as edges and connecting all clue data within the selected time range in directed series to form a directed graph composed of intranet clues and associated attributes, the chronological order in which the log clues occur determines the direction of the arrow on the edge connecting two log clues.

5. The stolen information clue extraction and segmentation evaluation method based on the time-series directed graph as claimed in claim 4, wherein the step of establishing an information stealing evaluation function to perform clue evaluation on each line chain comprises:

establishing segmented evaluation functions FA(L_i), FB(L_i), FC(L_i), FD(L_i), which respectively evaluate the stage performance of the information-stealing clue T_i; wherein

FA(L_i) = ValCnt(L_i) + ValLel(L_i), where ValCnt(L_i) is a piecewise function,

CntA(L_i) is the number of clues of the clue chain L_i in stage A;

ExLow(L_i), ExMid(L_i), ExHig(L_i) respectively indicate whether L_i contains a low-risk-level log, a medium-risk-level log, and a high-risk-level log, each taking a value in {0, 1};

setting the weights WA, WB, WC and WD that the segmented evaluation functions carry in an information-stealing event, wherein the weights of the segmented evaluation are obtained from historical data analysis;

establishing an information-stealing comprehensive evaluation function P(L_i) in terms of the evaluation functions FA(L_i), FB(L_i), FC(L_i), FD(L_i) and the weights WA, WB, WC and WD, performing clue evaluation on the extracted information-stealing clues, and calculating an evaluation value.

6. The stolen information cue extraction and segmentation evaluation method based on the time series directed graph as claimed in claim 1, wherein the method further comprises:

performing business operations on the information-stealing clues, including sorting them by evaluation value, extracting the clue chains with higher evaluated risk values, and raising an alarm.