Disclosure of Invention
To overcome the deficiencies of the prior art, the invention provides a method for extracting information stealing clues and evaluating them in segments, based on a time sequence directed graph, so as to solve the above technical problems.
The technical scheme of the invention is as follows:
a method for extracting and evaluating information stealing clues in segments based on a time sequence directed graph comprises the following steps:
acquiring log information and extracting clues from the log information during acquisition;
connecting the acquired log clues in directed series, so that all clues in the network within a determined time range form a finite directed graph;
extracting information stealing clue chains from the directed graph;
and establishing an information stealing evaluation function to evaluate each clue chain.
Preferably, the step of acquiring log information in the whole intranet and extracting clues from the log information during acquisition includes:
acquiring the mass log data generated by the various protection and supervision devices in the intranet;
cleaning and labeling the log data during acquisition to form canonical (normalized) clue data;
dividing the acquired log data into stages according to the process characteristics of information theft and leakage events; the attack chain model is optimized according to the behavior habits of intranet attacks, and the log data are divided into: intention stage A, preparation stage B, action stage C and covering stage D.
Preferably, the canonical clue data formed by cleaning and labeling the log data during acquisition includes at least four kinds of information: the clue body, the associated attributes, the clue stage and the clue time.
Preferably, the step of cleaning and labeling the log data during acquisition to form the canonical clue data specifically includes:
receiving logs according to the SYSLOG log transmission standard;
parsing and normalizing the received logs by means of configured parsing templates;
storing the normalized data to form a log clue set E, divided by stage, in which event alarms and logs serve as clues,
where A, B, C and D denote the sets of log clues belonging to the four stages, and E denotes all log clues in the network and user environment.
Preferably, in the log clue set formed by storing the normalized data, each clue should contain at least two associated attributes and one time attribute.
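The canonical clue record described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `Stage`, `Clue` and `group_by_stage` and the attribute strings are hypothetical and do not come from the source, which fixes only the four kinds of information and the minimum of two associated attributes and one time attribute.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Stage(Enum):
    INTENTION = "A"
    PREPARATION = "B"
    ACTION = "C"
    COVERING = "D"


@dataclass(frozen=True)
class Clue:
    body: str              # clue body (normalized event or alarm text)
    attributes: frozenset  # associated attributes, e.g. {"user:u1", "host:h1"} (hypothetical format)
    stage: Stage           # clue stage A-D
    time: datetime         # clue time

    def __post_init__(self):
        # The source requires at least two associated attributes per clue.
        if len(self.attributes) < 2:
            raise ValueError("a clue needs at least two associated attributes")


def group_by_stage(clues):
    """Split all clues E into the stage sets A, B, C, D."""
    sets = {stage: [] for stage in Stage}
    for clue in clues:
        sets[clue.stage].append(clue)
    return sets
```

Rejecting a record at construction time is one possible design choice; a real pipeline might instead route incomplete records to a separate queue for re-parsing.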
Preferably, the step of connecting the acquired log clues in directed series so that all clues in the network within a determined time range form a finite directed graph is as follows:
taking each piece of clue data as a vertex and the associated attributes as edges, and connecting all clue data within the selected time in directed series to form a directed graph composed of the intranet clues and their associated attributes.
Preferably, the order in which two log clues occurred determines the arrow direction of the edge joining them.
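A minimal sketch of this graph construction follows, assuming each clue is given as a hypothetical `(clue_id, time, attributes)` tuple and that two clues are joined whenever they share at least one associated attribute. The function name, the tuple layout and the handling of equal timestamps are assumptions, not part of the source.

```python
from collections import defaultdict
from itertools import combinations


def build_clue_graph(clues, start, end):
    """Build {clue_id: set of successor clue_ids} from (clue_id, time, attributes) tuples."""
    # Keep only clues inside the selected time range.
    selected = [c for c in clues if start <= c[1] <= end]
    # Index clues by each associated attribute they carry.
    by_attr = defaultdict(list)
    for cid, t, attrs in selected:
        for attr in attrs:
            by_attr[attr].append((t, cid))
    graph = {cid: set() for cid, _, _ in selected}
    # Clues sharing an attribute are joined; the edge points from earlier to later.
    for members in by_attr.values():
        for (t1, c1), (t2, c2) in combinations(members, 2):
            if t1 < t2:
                graph[c1].add(c2)
            elif t2 < t1:
                graph[c2].add(c1)
            # Equal timestamps get no edge: an assumption, the source does not cover ties.
    return graph
```

Because every edge runs strictly forward in time, a graph built this way cannot contain a cycle, which keeps the later path traversal finite.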
Because the edges of the directed graph follow the time direction, the volume of data that has to be analyzed is reduced, which saves a large amount of computing resources. Compared with traditional correlation analysis, this gives a marked efficiency advantage when analyzing very large data sets. The candidate information stealing clues captured by filtering along the time direction also conform better to the way information stealing activity actually unfolds, which improves their matching degree.
Preferably, in the step of extracting the information stealing clue chains from the directed graph, each path is an information stealing clue composed of log clues in time order, and the paths together form an information stealing clue set $\{L_i\}$, where each element $L_i$ of the set is a log clue chain, and $T_i$ is a log clue chain in $\{L_i\}$ that can be evaluated as an information stealing clue.
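The path extraction can be sketched as a depth-first traversal. The sketch assumes the graph is a dictionary mapping each vertex to its set of successors, that it is acyclic (edges follow time order), and that a chain is a maximal path from a vertex with no predecessor to a vertex with no successor. The source says only that all paths are traversed along the direction of the graph, so the choice of maximal paths and the name `extract_chains` are assumptions.

```python
def extract_chains(graph):
    """Return every maximal path of an acyclic {vertex: set of successors} graph as a list."""
    has_pred = {v for succs in graph.values() for v in succs}
    sources = [v for v in graph if v not in has_pred]
    chains = []

    def walk(path):
        succs = sorted(graph.get(path[-1], ()))
        if not succs:
            # No successor: the current path is a complete chain.
            chains.append(list(path))
            return
        for nxt in succs:
            path.append(nxt)
            walk(path)
            path.pop()

    for source in sorted(sources):
        walk([source])
    return chains
```

An isolated clue yields a one-element chain here; whether such chains should be kept or dropped before evaluation is left open by the source.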
Preferably, the step of establishing an information stealing evaluation function to evaluate each clue chain includes:
establishing piecewise evaluation functions $F_A(L_i)$, $F_B(L_i)$, $F_C(L_i)$, $F_D(L_i)$, which evaluate the stage performance of the information stealing clue $T_i$, where $T_i$ is a log clue chain in the set $\{L_i\}$ that can be evaluated as an information stealing clue;
wherein $F_A(L_i)$ represents the risk performance of the clue chain $T_i$ in the intention stage A. The stage is scored from two aspects, the number of clue logs within the stage and the risk level of those logs:
$$F_A(L_i) = \mathrm{ValCnt}(L_i) + \mathrm{ValLel}(L_i)$$
where $\mathrm{ValCnt}(L_i)$ is a piecewise function of $\mathrm{Cnt}_A(L_i)$, the number of clues of the clue chain $L_i$ in stage A, and
$$\mathrm{ValLel}(L_i) = 10\,\bigl(\mathrm{ExLow}(L_i) + \mathrm{ExMid}(L_i) + \mathrm{ExHig}(L_i)\bigr) + 15\,\bigl(\mathrm{ExMid}(L_i) + \mathrm{ExHig}(L_i)\bigr) + 25\,\mathrm{ExHig}(L_i)$$
where $\mathrm{ExLow}(L_i)$, $\mathrm{ExMid}(L_i)$ and $\mathrm{ExHig}(L_i)$ indicate whether $L_i$ contains a low risk level log, a medium risk level log and a high risk level log respectively, each taking a value in $\{0, 1\}$.
Similarly, $F_B(L_i)$, $F_C(L_i)$ and $F_D(L_i)$ are the risk performance evaluation values of the preparation stage B, the action stage C and the covering stage D respectively.
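The risk level term ValLel can be computed directly from the formula above. In the sketch below the labels `"low"`, `"mid"` and `"high"` and the function names are hypothetical. Because the piecewise definition of ValCnt is not given in the text, it is passed in as a caller-supplied function rather than guessed.

```python
def val_lel(risk_levels):
    """ValLel for one stage; risk_levels is an iterable of 'low'/'mid'/'high' labels."""
    levels = set(risk_levels)
    ex_low = int("low" in levels)    # ExLow: a low risk level log exists
    ex_mid = int("mid" in levels)    # ExMid: a medium risk level log exists
    ex_hig = int("high" in levels)   # ExHig: a high risk level log exists
    return 10 * (ex_low + ex_mid + ex_hig) + 15 * (ex_mid + ex_hig) + 25 * ex_hig


def stage_score(risk_levels, val_cnt):
    """Stage score F = ValCnt + ValLel; val_cnt maps the clue count to a score (assumed interface)."""
    risk_levels = list(risk_levels)
    return val_cnt(len(risk_levels)) + val_lel(risk_levels)
```

The formula depends only on which risk levels are present, not on how many logs carry each level; the count enters through ValCnt alone.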
setting the weights $W_A$, $W_B$, $W_C$ and $W_D$ that the piecewise evaluation functions carry in an information stealing event;
wherein the weights of the segmented evaluation are obtained from analysis of historical data and may be adjusted according to the data of the application scenario;
establishing an information stealing comprehensive evaluation function $P(L_i)$ from the evaluation functions $F_A(L_i)$, $F_B(L_i)$, $F_C(L_i)$, $F_D(L_i)$ and the weights $W_A$, $W_B$, $W_C$, $W_D$, evaluating the extracted information stealing clues and calculating an evaluation value;
wherein the evaluation uses an adjusting function, which evaluates how many stages the clue chain covers and applies a positive adjustment to clue chains that cover all stages;
$\mathrm{CoverAll}(L_i)$ indicates whether the clues in $L_i$ cover all stages, taking a value in $\{0, 1\}$.
The adjusting function rewards log clues that match the whole-process risk pattern, so that accuracy can be tuned to specific data and services.
Segmented evaluation of the information stealing clues greatly improves the accuracy of the captured clues and reduces both the number and the proportion of false alarms. Establishing the model and introducing the segmented evaluation method allows information stealing behavior to be quantified from logs, and provides quantitative indicators and algorithm support for subsequent big data mining and artificial intelligence analysis.
Preferably, the method further comprises:
performing service-related operations on the information stealing clues, including sorting them by evaluation value, extracting the clue chains with higher evaluated risk values and raising an alarm for them.
The invention provides a time sequence directed graph-based stolen information clue extraction and segmented evaluation method, which can carry out targeted clue mining and clue accuracy evaluation on information stealing behaviors in a sensitive intranet.
According to the above technical scheme, the invention has the following advantages: the ability to discover and sense information theft and leakage in the intranet in advance is greatly enhanced. The method mainly uses data mining technology to mine possible theft and leakage key information from the mass log information generated by the various protection and supervision devices in the intranet, and assesses the theft and leakage risk of that key information on the basis of research into the psychology of theft and leakage, so as to find the key information that carries a high hidden risk of theft and leakage.
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
Therefore, compared with the prior art, the invention has prominent substantive features and remarkable progress, and the beneficial effects of the implementation are also obvious.
Detailed Description
The invention provides a method for extracting information stealing clues and evaluating them in segments based on a time sequence directed graph. Log information is acquired across the whole intranet, including the mass log data generated by the various protection and supervision devices in the intranet, and is cleaned and labeled during acquisition to form canonical (normalized) clue data. The canonical clue data includes at least the clue body, the associated attributes, the clue stage and the clue time. Each piece of clue data is then taken as a vertex and the associated attributes as edges, and all clue data within the selected time are connected in directed series to form a directed graph composed of the intranet clues and their associated attributes. The directed graph is then traversed to extract information stealing clue chains, and an information stealing evaluation function is established to evaluate each clue chain according to the number of clue points and the completeness of the clue stages. Clue chains with higher evaluated risk values are extracted and an alarm is raised for them. In this way, targeted clue mining and clue accuracy evaluation can be performed on information stealing behavior in a sensitive intranet, which greatly enhances the ability to discover and sense information theft and leakage in the intranet in advance. The method mainly uses data mining technology to mine possible theft and leakage key information from the mass log information generated by the various protection and supervision devices in the intranet, and assesses the theft and leakage risk of that key information on the basis of research into the psychology of theft and leakage, so as to find the key information that carries a high hidden risk of theft and leakage.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Example one
The embodiment provides a method for extracting and evaluating stolen information clues based on a time sequence directed graph, which comprises the following steps:
S1: acquiring log information and extracting clues from the log information during acquisition;
this step is implemented as follows:
S11: acquiring the mass log data generated by the various protection and supervision devices in the intranet;
S12: cleaning and labeling the log data during acquisition to form canonical clue data; the canonical clue data in this step includes at least four kinds of information: clue body, associated attributes, clue stage and clue time;
S13: dividing the acquired log data into stages according to the process characteristics of information theft and leakage events; the attack chain model is optimized according to the behavior habits of intranet attacks, and the log data are divided in time order into: intention stage A, preparation stage B, action stage C and covering stage D;
each stage of an information theft and leakage event has different characteristics:
intention stage: subjective awareness, change of attitude, change of temperament, abnormal daily behavior;
preparation stage: preparatory activities, social activities, information collection, break-in attempts;
action stage: substantive actions, scanning and penetration, system intrusion, tool deployment;
covering stage: aftermath handling, trace erasing, tool removal, data transfer.
It should be further explained that the clue extraction process receives logs according to the SYSLOG transmission standard, parses and normalizes the received logs by means of configured parsing templates, and stores the normalized data to form a log clue set E, divided by stage, in which event alarms and logs serve as clues.
Here A, B, C and D denote the sets of log clues belonging to the four stages, each clue in these sets should contain at least two associated attributes and one time attribute, and E denotes all log clues in the network and user environment. These log clues originate from the network devices, security devices, SOC platforms, application systems and other operation and maintenance systems in the intranet.
S2: connecting the acquired log clues in directed series, so that all clues in the network within a determined time range form a finite directed graph; specifically, each piece of clue data is taken as a vertex and the associated attributes as edges, and all clue data within the selected time are connected in directed series to form a directed graph composed of the intranet clues and their associated attributes. The subsequent clue extraction and clue evaluation are both performed on this directed graph;
the log clues in E are joined by edges through their associated attributes, and the order in which two log clues occurred determines the arrow direction of the edge joining them.
S3: extracting information stealing clue chains from the directed graph; all paths are traversed in the direction of the directed graph, and each path is queried and stored in a list, so that each path in the list is an information stealing clue composed of log clues in time order; the paths together form an information stealing clue set $\{L_i\}$, where each element $L_i$ of the set is a log clue chain, and $T_i$ is a log clue chain in $\{L_i\}$ that can be evaluated as an information stealing clue.
S4: establishing an information stealing evaluation function to evaluate each clue chain;
this step is implemented as follows:
S41: establishing piecewise evaluation functions $F_A(L_i)$, $F_B(L_i)$, $F_C(L_i)$, $F_D(L_i)$, which evaluate the stage performance of the information stealing clue $T_i$, where $T_i$ is a log clue chain in the set $\{L_i\}$ that can be evaluated as an information stealing clue;
$F_A(L_i)$ represents the risk performance of the clue chain $T_i$ in the intention stage A. The stage is scored from two aspects, the number of clue logs within the stage and the risk level of those logs:
$$F_A(L_i) = \mathrm{ValCnt}(L_i) + \mathrm{ValLel}(L_i)$$
where $\mathrm{ValCnt}(L_i)$ is a piecewise function of $\mathrm{Cnt}_A(L_i)$, the number of clues of the clue chain $L_i$ in stage A, and
$$\mathrm{ValLel}(L_i) = 10\,\bigl(\mathrm{ExLow}(L_i) + \mathrm{ExMid}(L_i) + \mathrm{ExHig}(L_i)\bigr) + 15\,\bigl(\mathrm{ExMid}(L_i) + \mathrm{ExHig}(L_i)\bigr) + 25\,\mathrm{ExHig}(L_i)$$
where $\mathrm{ExLow}(L_i)$, $\mathrm{ExMid}(L_i)$ and $\mathrm{ExHig}(L_i)$ indicate whether $L_i$ contains a low risk level log, a medium risk level log and a high risk level log respectively, each taking a value in $\{0, 1\}$.
Similarly, $F_B(L_i)$, $F_C(L_i)$ and $F_D(L_i)$ are the risk performance evaluation values of the preparation stage B, the action stage C and the covering stage D respectively.
S42: setting the weights $W_A$, $W_B$, $W_C$ and $W_D$ that the piecewise evaluation functions carry in an information stealing event; the weights of the segmented evaluation are obtained from analysis of historical data and may be adjusted according to the data of the application scenario;
S43: establishing an information stealing comprehensive evaluation function $P(L_i)$ from the evaluation functions $F_A(L_i)$, $F_B(L_i)$, $F_C(L_i)$, $F_D(L_i)$ and the weights $W_A$, $W_B$, $W_C$, $W_D$, evaluating the extracted information stealing clues and calculating an evaluation value;
the evaluation uses an adjusting function, which evaluates how many stages the clue chain covers and applies a positive adjustment to clue chains that cover all stages;
$\mathrm{CoverAll}(L_i)$ indicates whether the clues in $L_i$ cover all stages, taking a value in $\{0, 1\}$.
The adjusting function rewards log clues that match the whole-process risk pattern, so that accuracy can be tuned to specific data and services.
The method further comprises the following steps:
S5: performing service-related operations on the information stealing clues, including sorting them by evaluation value, extracting the clue chains with higher evaluated risk values and raising an alarm for them.
Segmented evaluation of the information stealing clues greatly improves the accuracy of the captured clues and reduces both the number and the proportion of false alarms. Establishing the model and introducing the segmented evaluation method allows information stealing behavior to be quantified from logs, and provides quantitative indicators and algorithm support for subsequent big data mining and artificial intelligence analysis.
Example two
As shown in FIG. 1, the present embodiment provides a method for extracting and evaluating information stealing clues in segments based on a time sequence directed graph, which includes the following steps:
S1: acquiring log information and extracting clues from the log information during acquisition;
this step is implemented as follows:
S11: acquiring the mass log data generated by the various protection and supervision devices in the intranet;
S12: cleaning and labeling the log data during acquisition to form canonical clue data; the canonical clue data in this step includes at least four kinds of information: clue body, associated attributes, clue stage and clue time;
S13: dividing the acquired log data into stages according to the process characteristics of information theft and leakage events; the attack chain model is optimized according to the behavior habits of intranet attacks, and the log data are divided in time order into: intention stage A, preparation stage B, action stage C and covering stage D;
each stage of an information theft and leakage event has different characteristics:
intention stage: subjective awareness, change of attitude, change of temperament, abnormal daily behavior;
preparation stage: preparatory activities, social activities, information collection, break-in attempts;
action stage: substantive actions, scanning and penetration, system intrusion, tool deployment;
covering stage: aftermath handling, trace erasing, tool removal, data transfer.
It should be further explained that the clue extraction process receives logs according to the SYSLOG transmission standard, parses and normalizes the received logs by means of configured parsing templates, and stores the normalized data to form a log clue set E, divided by stage, in which event alarms and logs serve as clues.
Here A, B, C and D denote the sets of log clues belonging to the four stages, each clue in these sets should contain at least two associated attributes and one time attribute, and E denotes all log clues in the network and user environment. These log clues originate from the network devices, security devices, SOC platforms, application systems and other operation and maintenance systems in the intranet.
S2: connecting the acquired log clues in directed series, so that all clues in the network within a determined time range form a finite directed graph; specifically, each piece of clue data is taken as a vertex and the associated attributes as edges, and all clue data within the selected time are connected in directed series to form a directed graph composed of the intranet clues and their associated attributes. The subsequent clue extraction and clue evaluation are both performed on this directed graph;
the log clues in E are joined by edges through their associated attributes, and the order in which two log clues occurred determines the arrow direction of the edge joining them.
As shown in FIG. 2, $A_i$, $B_i$, $C_i$ and $D_i$ denote clue data in the four stages A, B, C and D respectively. $A_1$ is associated with $A_2$ and $B_1$ through an associated attribute, and the arrow direction represents the time order of the time attributes in the clue data; $A_3$ is associated with $A_4$ and $B_1$ through an associated attribute, and the arrow direction again represents the time order of the time attributes in the clue data. By analogy, the directed relationship graph shown in FIG. 2 is formed.
S3: extracting information stealing clue chains from the directed graph; all paths are traversed in the direction of the directed graph, and each path is queried and stored in a list, so that each path in the list is an information stealing clue composed of log clues in time order; the paths together form an information stealing clue set $\{L_i\}$, where each element $L_i$ of the set is a log clue chain, and $T_i$ is a log clue chain in $\{L_i\}$ that can be evaluated as an information stealing clue.
S4: establishing an information stealing evaluation function to evaluate each clue chain;
this step is implemented as follows:
S41: establishing piecewise evaluation functions $F_A(L_i)$, $F_B(L_i)$, $F_C(L_i)$, $F_D(L_i)$, which evaluate the stage performance of the information stealing clue $T_i$, where $T_i$ is a log clue chain in the set $\{L_i\}$ that can be evaluated as an information stealing clue;
$F_A(L_i)$ represents the risk performance of the clue chain $T_i$ in the intention stage A. The stage is scored from two aspects, the number of clue logs within the stage and the risk level of those logs:
$$F_A(L_i) = \mathrm{ValCnt}(L_i) + \mathrm{ValLel}(L_i)$$
where $\mathrm{ValCnt}(L_i)$ is a piecewise function of $\mathrm{Cnt}_A(L_i)$, the number of clues of the clue chain $L_i$ in stage A, and
$$\mathrm{ValLel}(L_i) = 10\,\bigl(\mathrm{ExLow}(L_i) + \mathrm{ExMid}(L_i) + \mathrm{ExHig}(L_i)\bigr) + 15\,\bigl(\mathrm{ExMid}(L_i) + \mathrm{ExHig}(L_i)\bigr) + 25\,\mathrm{ExHig}(L_i)$$
where $\mathrm{ExLow}(L_i)$, $\mathrm{ExMid}(L_i)$ and $\mathrm{ExHig}(L_i)$ indicate whether $L_i$ contains a low risk level log, a medium risk level log and a high risk level log respectively, each taking a value in $\{0, 1\}$.
Similarly, $F_B(L_i)$, $F_C(L_i)$ and $F_D(L_i)$ are the risk performance evaluation values of the preparation stage B, the action stage C and the covering stage D respectively.
The evaluation value of each stage takes into account the number of clues found in the stage and the hazard level of those clues, and lies in the range [0, 100];
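As a consistency check on the stated range, the largest value the risk level term can take follows directly from the ValLel formula when all three indicators equal 1. If the range [0, 100] is read as applying to the stage score ValCnt + ValLel, which is an interpretation rather than something the text states explicitly, the count term would be limited to at most 15 points:

```latex
\max \mathrm{ValLel}(L_i) = 10\,(1 + 1 + 1) + 15\,(1 + 1) + 25 \cdot 1 = 85
\quad\Longrightarrow\quad
0 \le \mathrm{ValCnt}(L_i) \le 100 - 85 = 15
```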
S42: setting the weights $W_A$, $W_B$, $W_C$ and $W_D$ that the piecewise evaluation functions carry in an information stealing event; the weights of the segmented evaluation are obtained from analysis of historical data and may be adjusted according to the data of the application scenario;
in this embodiment, analysis of historical data and cases in an intranet shows that clues appearing in the four stages influence the risk of an information theft case in the ratio 1:3:4:12, so the weights are set to $W_A$ = 5%, $W_B$ = 15%, $W_C$ = 20% and $W_D$ = 60%; in actual implementation these weights may be adjusted according to the data of the application scenario;
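The step from the ratio 1:3:4:12 to the percentage weights is a plain normalization: each part is divided by the total of 20. A small sketch, with the hypothetical helper name `weights_from_ratio` and exact fractions used to avoid rounding:

```python
from fractions import Fraction


def weights_from_ratio(ratio):
    """Normalize a {stage: ratio part} mapping so that the weights sum to 1."""
    total = sum(ratio.values())
    return {stage: Fraction(part, total) for stage, part in ratio.items()}


weights = weights_from_ratio({"A": 1, "B": 3, "C": 4, "D": 12})
percent = {stage: w * 100 for stage, w in weights.items()}
```

With this input, `percent` holds 5, 15, 20 and 60 for stages A to D, matching the weights of this embodiment.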
S43: establishing an information stealing comprehensive evaluation function $P(L_i)$ from the evaluation functions $F_A(L_i)$, $F_B(L_i)$, $F_C(L_i)$, $F_D(L_i)$ and the weights $W_A$, $W_B$, $W_C$, $W_D$, evaluating the extracted information stealing clues and calculating an evaluation value;
the evaluation uses an adjusting function, which evaluates how many stages the clue chain covers and applies a positive adjustment to clue chains that cover all stages;
$\mathrm{CoverAll}(L_i)$ indicates whether the clues in $L_i$ cover all stages, taking a value in $\{0, 1\}$.
The adjusting function rewards log clues that match the whole-process risk pattern, so that accuracy can be tuned to specific data and services.
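The exact expression of the comprehensive evaluation function and of the adjusting function is not reproduced in this text, so the following is only a sketch under an explicit assumption: the evaluation value is taken to be the weighted sum of the four stage scores plus an additive bonus when the chain covers all stages. The additive form, the `bonus` parameter and the function name are hypothetical and should not be read as the formula of the invention.

```python
STAGES = ("A", "B", "C", "D")


def comprehensive_score(stage_scores, weights, cover_all, bonus=0.0):
    """Assumed form: sum of weight * stage score, plus bonus * CoverAll."""
    base = sum(weights[s] * stage_scores[s] for s in STAGES)
    # cover_all plays the role of CoverAll(L_i) and takes the value 0 or 1.
    return base + bonus * int(cover_all)


# Weights of this embodiment: 5%, 15%, 20%, 60%.
weights = {"A": 0.05, "B": 0.15, "C": 0.20, "D": 0.60}
no_covering = comprehensive_score({"A": 100, "B": 100, "C": 100, "D": 0}, weights, cover_all=False)
```

Under this assumed form, a chain with no covering stage clue scores at most 40 with the weights of this embodiment, since stages A to C together carry 40% of the weight.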
The method further comprises the following steps:
S5: performing service-related operations on the information stealing clues, including sorting them by evaluation value, extracting the clue chains with higher evaluated risk values and raising an alarm for them;
for example, a risk evaluation value higher than 60 indicates that at least one clue of the covering stage D has been found. A clue at this stage indicates that an audit log or operation trace has been deleted, so the corresponding event and the personnel involved should be investigated further. An alarm is therefore generated for clue chains with a high evaluation value, reminding service personnel to check and handle the event.
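The sorting and alarm step can be sketched as follows, assuming the evaluation values have already been computed and are held in a mapping from a hypothetical chain identifier to its value. The threshold of 60 is the one from the example above; the function name and data layout are assumptions.

```python
def rank_and_alert(evaluations, threshold=60):
    """Sort clue chains by evaluation value (highest first) and pick those above the threshold."""
    ranked = sorted(evaluations.items(), key=lambda item: item[1], reverse=True)
    alerts = [chain_id for chain_id, value in ranked if value > threshold]
    return ranked, alerts


ranked, alerts = rank_and_alert({"L1": 35.0, "L2": 72.5, "L3": 60.0})
```

Here only `L2` raises an alarm, because the example speaks of values higher than 60 and `L3` sits exactly on the threshold.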
Segmented evaluation of the information stealing clues greatly improves the accuracy of the captured clues and reduces both the number and the proportion of false alarms. Establishing the model and introducing the segmented evaluation method allows information stealing behavior to be quantified from logs, and provides quantitative indicators and algorithm support for subsequent big data mining and artificial intelligence analysis.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.