CN114297046A - Event obtaining method, device, equipment and medium based on log - Google Patents


Publication number
CN114297046A
Authority
CN
China
Prior art keywords
rule
node
text
log
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111658789.6A
Other languages
Chinese (zh)
Inventor
谢泳
邓博仁
汪来富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202111658789.6A
Publication of CN114297046A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide a log-based event acquisition method, apparatus, device, and medium. A rule tree file is acquired and a rule tree is constructed from it; each node of the rule tree is constructed to correspond to one rule, and the node relationships among the nodes correspond to the dependency relationships among the rules. Log data is input at the root node of the rule tree and matched against the corresponding rule at each node along a matching path, and an event is acquired according to the matching results of the nodes. Because the log data passes through the nodes along a matching path whose node relationships mirror the rule dependencies, efficiency is improved compared with the existing approach of traversing the whole rule set. In addition, when the rule tree is constructed, its structure can be optimized through node equalization and by splitting regular-expression text into multiple nodes, which improves matching efficiency; node types can also be set according to rule requirements so that the corresponding methods are called during matching to meet those requirements.

Description

Event obtaining method, device, equipment and medium based on log
Technical Field
The present application relates to the field of network security technologies, and in particular, to a method, an apparatus, a device, and a medium for event acquisition based on a log.
Background
Security event identification is the basis for prediction and alerting in the security domain. Security researchers distill rules into regular expressions, which are matched against security logs to identify and output alarm events. The current method for matching security logs is generally as follows: each log traverses all rules and is matched against each of them, and the event information of the matching rules is output. This method cannot solve the rule-matching problem for security logs efficiently, specifically because:
1. the entire rule set is traversed for every match, so the time complexity is high;
2. if rules have dependency relationships, they must be matched in order, and the result of a successor rule depends on both the result of its predecessor rule and its own matching result; rule-by-rule traversal cannot handle such dependencies;
3. the current method cannot handle rules that need the matching results of multiple logs to determine an event.
Summary of the Invention
In view of the above-mentioned shortcomings of the prior art, the present application aims to provide a log-based event acquisition method, apparatus, device and medium, which solve the problems of the prior art by constructing a path-optimized rule tree from the rules and using it for matching.
A first aspect of the present application provides a log-based event obtaining method, including: acquiring a rule tree file, and constructing a rule tree based on the rule tree file; wherein each node of the rule tree is constructed corresponding to a rule; the node relation among all the nodes corresponds to the dependency relation among the rules; inputting log data into a root node of the rule tree and matching corresponding rules at each node along a matching path; and acquiring an event according to the matching result of the node.
In some embodiments, the method for generating a rule tree file includes: traversing each rule, analyzing the content of the rule to obtain rule information, and determining a node relation based on the dependency relation among the rules; the rule information includes: rule attributes and methods, the rule attributes including regularized text for matching; constructing a first rule tree according to the rule information and the dependency relationship of each rule; wherein, each node correspondingly stores the rule information of a rule; equalizing the dependency relationship of each node in the first rule tree to obtain a second rule tree; and forming a rule tree file according to the second rule tree.
In some embodiments, before determining the corresponding node relationship in the rule tree based on the dependency relationship between the rules, further comprising rule preprocessing; the rule preprocessing comprises at least one of: 1) in response to a downstream rule relying on a first number of upstream rules, creating a downstream rule having the same content as the downstream rule such that the number of downstream rules reaches the first number; 2) splitting a regular text contained in a rule into a plurality of sub regular texts, and forming a rule according to each sub regular text; 3) splitting a regular text contained in a downstream rule into a plurality of sub regular texts, and forming a rule according to each sub regular text to form a rule set corresponding to the downstream rule; in response to the downstream rule relying on a first number of upstream rules, a rule set is created that has the same content as the rule set such that the rule set reaches the first number.
In some embodiments, equalizing the dependency relationships of the nodes in the first rule tree includes: generating a text feature matrix based on the regular text of each node; determining, based on the text feature matrix, a preset number of target feature dimensions with the largest information amount; determining the text segments corresponding to the preset number of target feature dimensions; constructing an intermediate layer between the current layer and the previous layer according to the text segments, wherein the intermediate layer includes the preset number of first nodes, each constructed to correspond to one text segment, and a second node; and determining the dependency relationship between each current node in the current layer and the first nodes and the second node based on the matching relationship between the regular text of that current node and the text segments, wherein a current node whose regular text matches a text segment forms a dependency relationship with the corresponding first node, and a current node that matches none of the first nodes forms a dependency relationship with the second node.
In some embodiments, the text segment is a word; generating the text feature matrix based on the regular text of each node includes: mapping the regular text of each node to a text feature vector through a bag-of-words model, and stacking the vectors to form the text feature matrix.
In some embodiments, each node has a node type that is associated with the requirements of the corresponding rule.
In some embodiments, the node types include: a specific type and a general type; the specific types include: a frequency related type, a keyword group related type, and a frequency and keyword group related type.
In some embodiments, the node further has a rule method corresponding to the node, and the rule method is used for being called to acquire an event according to a matching result.
In some embodiments, each of the nodes caches the respective generated match results.
A second aspect of the present application provides an event acquiring apparatus based on a log, including: the rule tree building module is used for obtaining a rule tree file and building a rule tree based on the rule tree file; wherein each node of the rule tree is constructed corresponding to a rule; the node relation among all the nodes corresponds to the dependency relation among the rules; the rule tree matching module is used for inputting the log data into a root node of the rule tree and matching corresponding rules at each node along a matching path; and the event acquisition module is used for acquiring an event according to the matching result of the node.
A third aspect of the present application provides a computer device comprising: a communicator, a memory, and a processor; the communicator is used for communicating with the outside; the memory is to store program instructions; the processor is configured to execute the program instructions to perform the log-based event acquisition method according to any one of the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium storing program instructions that, when executed, perform the log-based event acquisition method according to any one of the first aspects.
As described above, embodiments of the present application provide a log-based event acquisition method, apparatus, device, and medium. A rule tree file is acquired and a rule tree is constructed from it; each node of the rule tree is constructed to correspond to one rule, and the node relationships among the nodes correspond to the dependency relationships among the rules. Log data is input at the root node of the rule tree and matched against the corresponding rule at each node along a matching path, and an event is acquired according to the matching results of the nodes. Because the log data passes through the nodes along a matching path whose node relationships mirror the rule dependencies, efficiency is improved compared with the existing approach of traversing the whole rule set. In addition, when the rule tree is constructed, its structure can be optimized through node equalization and by splitting regular-expression text into multiple nodes, which improves matching efficiency; node types can also be set according to rule requirements so that the corresponding methods are called during matching to meet those requirements.
Drawings
Fig. 1 shows a flowchart of a log-based event acquisition method in an embodiment of the present application.
Fig. 2 shows a flowchart of a method for generating a rule tree file according to an embodiment of the present application.
Fig. 3 shows a class diagram and an inheritance relationship diagram of each node type in an embodiment of the present application.
Fig. 4a shows a schematic structural diagram of a rule tree in an embodiment of the present application.
Fig. 4b shows a schematic structural diagram of the rule tree after equalization according to an embodiment of the present application.
Fig. 5 shows a flowchart of an event obtaining method in an application example of the present application.
Fig. 6 shows a schematic diagram of an exemplary code of 4 rules in an application example of the present application.
Fig. 7 shows a schematic diagram of the regular texts of a plurality of rules in an application example of the present application.
Fig. 8 shows a block diagram of a log-based event acquisition device according to an embodiment of the present application.
Fig. 9 shows a schematic structural diagram of a computer device in an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present application. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings so that those skilled in the art to which the present application pertains can easily carry out the present application. The present application may be embodied in many different forms and is not limited to the embodiments described herein.
Reference throughout this specification to "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," or the like, means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. Furthermore, the particular features, structures, materials, or characteristics shown may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of different embodiments or examples presented in this application can be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first", "second" are used merely to denote an object and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the expressions of the present application, "plurality" means two or more unless specifically defined otherwise.
In order to clearly explain the present application, components that are not related to the description are omitted, and the same reference numerals are given to the same or similar components throughout the specification.
Throughout the specification, when a device is referred to as being "connected" to another device, this includes not only the case of being "directly connected" but also the case of being "indirectly connected" with another element interposed therebetween. In addition, when a device "includes" a certain component, unless otherwise stated, the device does not exclude other components, but may include other components.
Although the terms first, second, etc. may be used herein to refer to various elements in some examples, these elements should not be limited by these terms. These terms are only used to distinguish one element from another; for example, a first interface and a second interface are so named only to tell them apart. Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, modules, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, modules, items, species, and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition occurs only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the singular forms "a", "an" and "the" include plural forms as long as the words do not expressly indicate a contrary meaning. The term "comprises/comprising" when used in this specification is taken to specify the presence of stated features, regions, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of other features, regions, integers, steps, operations, elements, and/or components.
Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. Terms defined in commonly used dictionaries are to be interpreted as having meanings consistent with the related technical literature and the present disclosure, and are not to be interpreted in an idealized or overly formal sense unless so defined.
In the related art, each log is generally matched by traversing all of the regular expressions, which is very inefficient.
In view of this, embodiments of the present application provide a log-based event acquisition method that performs matching through a rule tree generated from the rules, thereby effectively improving matching efficiency.
Fig. 1 shows a schematic flowchart of a log-based event acquisition method in an embodiment of the present application. The log-based event acquisition method comprises the following steps:
Step S101: acquire a rule tree file, and construct a rule tree based on the rule tree file.
Each node of the rule tree is constructed to correspond to one rule, and the node relationships among the nodes correspond to the dependency relationships among the rules.
In some embodiments, the rule tree file may be established in advance: the rule tree file is generated from a rule tree, and the rule tree is built from the rules. The dependency relationships between the rules are related to the matching path that log data takes through the rules.
For example, when the rule tree is used for the first time, it may be generated and compiled, then run, and afterwards serialized and stored as a rule tree file. Here, compiling is the process of translating a program language, and serialization is the process of converting the state information of an object into a form that can be stored or transmitted. Thereafter, as long as the rules have not changed, the rule tree can be restored by directly loading the content of the generated rule tree file, which helps improve matching efficiency.
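The first-use behaviour described above might be sketched as follows. This is only an illustration under stated assumptions: Python's `pickle` module stands in for the unspecified serialization format, and `load_or_build`, `build_demo_tree` and the file name `rule_tree.bin` are hypothetical names that do not appear in the application.

```python
import os
import pickle
import tempfile


def load_or_build(path, build_fn):
    """Return (tree, was_loaded): restore the tree from `path` if the file
    exists, otherwise build it with `build_fn` and serialize it to `path`."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f), True
    tree = build_fn()
    with open(path, "wb") as f:
        pickle.dump(tree, f)  # binary serialization of the whole tree
    return tree, False


def build_demo_tree():
    # Hypothetical stand-in for the real construction steps (S201 to S203).
    return {"id": "root", "children": [{"id": "5710", "children": []}]}


tmp_dir = tempfile.mkdtemp()
tree_path = os.path.join(tmp_dir, "rule_tree.bin")
first_tree, first_loaded = load_or_build(tree_path, build_demo_tree)
second_tree, second_loaded = load_or_build(tree_path, build_demo_tree)
```

The sketch does not detect rule changes; a real system would need some way (for example, comparing a hash of the rule definitions) to decide when the stored file is stale and must be rebuilt.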
Fig. 2 is a flowchart illustrating a method for generating a rule tree file according to an embodiment of the present application. Optionally, the flow in Fig. 2 may be performed when the rule tree is first used for matching. The process comprises the following steps:
Step S201: traverse each rule, parse the content of the rule to obtain rule information, and determine node relationships based on the dependency relationships between the rules.
In some embodiments, there may be a preset rule set, such as a rule set for network security, and the rules may belong to that rule set.
In some embodiments, the rule information comprises rule attributes and methods, and the rule attributes include the regular text used for matching. The rule attributes may include, for example, the rule id, the regular text (i.e., the regular expression string), and the upstream rule id (i.e., the id of the upstream rule in the matching path); if the rule has a requirement on the frequency of statistical information, the rule attributes may further include a frequency, a time window, and the like. After the rule attributes of each rule are obtained by parsing, they can be organized into a list in which each row corresponds to one rule (rule 1, rule 2, and so on) and the columns hold the fields: rule id, regular text, upstream rule id, and so on.
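A minimal sketch of such a rule list is given below. The `RuleInfo` class, its field names, the input dictionary keys, and the sample rules are all assumptions made for illustration; the application does not specify a concrete data layout, and the unit of the time window is not stated in the source.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RuleInfo:
    rule_id: str
    regex_text: str
    upstream_ids: List[str] = field(default_factory=list)
    frequency: Optional[int] = None    # only set for rules with a frequency requirement
    time_window: Optional[int] = None  # unit assumed to be seconds


def parse_rules(raw_rules):
    """Turn raw rule definitions (dicts) into one RuleInfo row per rule."""
    rows = []
    for raw in raw_rules:
        rows.append(RuleInfo(
            rule_id=str(raw["id"]),
            regex_text=raw["regex"],
            upstream_ids=[str(u) for u in raw.get("upstream", [])],
            frequency=raw.get("frequency"),
            time_window=raw.get("time_window"),
        ))
    return rows


raw_rules = [
    {"id": 1, "regex": r"sshd"},
    {"id": 2, "regex": r"Failed password", "upstream": [1],
     "frequency": 5, "time_window": 60},
]
rule_list = parse_rules(raw_rules)
```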
In some embodiments, some node relationships may be constructed by preprocessing the rules, including at least one of:
1) In response to a downstream rule relying on a first number of upstream rules, downstream rules having the same content as the downstream rule are created such that the number of downstream rules reaches the first number. For example, if a downstream rule has a upstream rules (a being their number), a child nodes corresponding to the downstream rule are generated, one under each of the a upstream nodes corresponding to the upstream rules, and each child node contains one upstream rule id.
2) The regular text contained in a rule is split into a plurality of sub regular texts, and a rule is formed according to each sub regular text. For example, for a rule containing the regular text "A|B", where "|" is the OR operator, the regular text is processed, redundant symbols are removed, and A and B are extracted and formed into separate nodes.
Alternatively, 1) and 2) may be combined as 3):
3) splitting a regular text contained in a downstream rule into a plurality of sub regular texts, and forming a rule according to each sub regular text to form a rule set corresponding to the downstream rule; in response to the downstream rule relying on a first number of upstream rules, a rule set is created that has the same content as the rule set such that the rule set reaches the first number.
For example, if the regular text of a downstream rule is split into b sub regular texts and the rule has a upstream rules, the downstream rule forms a rule subset of a × b rules sharing the same id, which yields a × b child nodes in total under the a upstream nodes, with b child nodes under each upstream node.
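The a × b expansion can be illustrated with a short sketch. The function name `expand_rule`, the dictionary keys, and the rule ids "R1" to "R4" are hypothetical; the choice to number the copies by the position of the sub regular text is also an assumption.

```python
def expand_rule(rule_id, sub_texts, upstream_ids):
    """Create one node record per (upstream rule, sub regular text) pair.

    With a upstream rules and b sub regular texts this yields a * b records
    that all share the same node_id.
    """
    records = []
    for upstream_id in upstream_ids:
        for index, text in enumerate(sub_texts):
            records.append({
                "node_id": rule_id,       # same id for every copy
                "index": index,           # distinguishes the split parts
                "regex_text": text,
                "upstream_id": upstream_id,
            })
    return records


# a = 3 upstream rules, b = 2 sub regular texts
expanded = expand_rule("R3", ["A", "B"], ["R1", "R2", "R4"])
```

Here each of the three upstream nodes receives two child nodes, one for "A" and one for "B".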
Step S202: construct a first rule tree according to the rule information and the dependency relationships of the rules.
Each node stores the rule information of one rule.
In some embodiments, each node has a node type that is associated with the requirements of the corresponding rule. For example, a rule may require counting the frequency of information, or may require classification by keyword.
In some embodiments, a rule tree object may be created, a list of rules traversed, rule information retrieved, and a node established for each rule.
In some embodiments, the node types include a specific type and a general type (Base); the specific types include a frequency related type (Frequency), a keyword grouping related type (Key), and a frequency and keyword grouping related type (KeyFrequency). The class diagrams and inheritance relationships are shown, for example, in Fig. 3. In each box in Fig. 3, the top is the name of the class; below it, the part above the dotted line lists the parameters of the class and the part below the dotted line lists the methods of the class.
It should be noted that new node types may also be developed according to actual needs; the node types are not limited to these 4 examples.
For a rule with a frequency requirement, a FrequencyNode instance is created; for a rule that needs to extract a certain substring as the key for grouping results, the position of the key in the regular text must be specified, and a KeyNode instance is created; for a rule with both characteristics, a KeyFrequencyNode instance is created; for a general rule, a BaseNode instance is created. Optionally, because one rule may become several nodes when its regular text is split, those nodes may share the same node id (node_id) but carry different indices, so that they can be identified and traced back to their origin.
The nodes in the rule tree are then connected according to the upstream rule information of each downstream rule, which completes the modeling of rule dependency relationships as node relationships. The structure of the rule tree is shown, for example, in Fig. 4a: each node corresponds to a rule and stores the corresponding rule attributes. Optionally, a node also has a corresponding rule method that is called to acquire an event according to the matching result; for example, the Frequency method of a FrequencyNode is called to count how often information occurs according to the matching result.
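The node types and the linking step might look like the sketch below. The class names BaseNode, FrequencyNode, KeyNode and KeyFrequencyNode come from the description; everything else (constructor parameters, the `make_node` and `build_tree` helpers, the inheritance of KeyFrequencyNode from KeyNode, the sample rules, and the restriction to a single upstream rule per rule) is assumed for illustration, since Fig. 3 is not reproduced here.

```python
class BaseNode:
    def __init__(self, node_id, regex_text, index=0):
        self.node_id = node_id
        self.index = index
        self.regex_text = regex_text
        self.children = []


class FrequencyNode(BaseNode):
    def __init__(self, node_id, regex_text, frequency, time_window, index=0):
        super().__init__(node_id, regex_text, index)
        self.frequency = frequency
        self.time_window = time_window


class KeyNode(BaseNode):
    def __init__(self, node_id, regex_text, key_position, index=0):
        super().__init__(node_id, regex_text, index)
        self.key_position = key_position


class KeyFrequencyNode(KeyNode):  # inheritance choice is an assumption
    def __init__(self, node_id, regex_text, key_position,
                 frequency, time_window, index=0):
        super().__init__(node_id, regex_text, key_position, index)
        self.frequency = frequency
        self.time_window = time_window


def make_node(rule):
    """Pick the node type from the rule's requirements."""
    has_freq = rule.get("frequency") is not None
    has_key = rule.get("key_position") is not None
    if has_freq and has_key:
        return KeyFrequencyNode(rule["id"], rule["regex"], rule["key_position"],
                                rule["frequency"], rule["time_window"])
    if has_freq:
        return FrequencyNode(rule["id"], rule["regex"],
                             rule["frequency"], rule["time_window"])
    if has_key:
        return KeyNode(rule["id"], rule["regex"], rule["key_position"])
    return BaseNode(rule["id"], rule["regex"])


def build_tree(rules):
    """Create one node per rule, then attach each node under its upstream node."""
    root = BaseNode("root", "")
    nodes = {rule["id"]: make_node(rule) for rule in rules}
    for rule in rules:
        upstream = rule.get("upstream")
        parent = nodes[upstream] if upstream in nodes else root
        parent.children.append(nodes[rule["id"]])
    return root, nodes


demo_rules = [
    {"id": "A", "regex": r"sshd"},
    {"id": "B", "regex": r"Failed password", "upstream": "A",
     "frequency": 5, "time_window": 60},
    {"id": "C", "regex": r"user (\S+)", "upstream": "A", "key_position": 1},
    {"id": "D", "regex": r"root", "upstream": "C", "key_position": 1,
     "frequency": 3, "time_window": 30},
]
root, nodes = build_tree(demo_rules)
```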
Step S203: equalize the dependency relationships of the nodes in the first rule tree to obtain a second rule tree.
In some embodiments, equalizing the dependency relationships of the nodes in the first rule tree includes: generating a text feature matrix based on the regular text of each node; determining, based on the text feature matrix, a preset number of target feature dimensions with the largest information amount; determining the text segments corresponding to the preset number of target feature dimensions; constructing an intermediate layer between the current layer and the previous layer according to the text segments, wherein the intermediate layer includes the preset number of first nodes, each constructed to correspond to one text segment, and a second node; and determining the dependency relationship between each current node in the current layer and the first nodes and the second node based on the matching relationship between the regular text of that current node and the text segments, wherein a current node whose regular text matches a text segment forms a dependency relationship with the corresponding first node, and a current node that matches none of the first nodes forms a dependency relationship with the second node.
In some embodiments, the text segment may be a word, or may be a phrase or a longer text; generating the text feature matrix based on the regular text of each node includes: mapping the regular text of each node to a text feature vector through a bag-of-words model, and stacking the vectors to form the text feature matrix.
Equalization is illustrated below. If one level of the rule tree has a large number of nodes, such as level 2 of the rule tree in Fig. 4a, a matching path may have to pass through many nodes during matching. Illustratively, natural language processing can be used to analyze the text segments (words) in the regular texts and extract intermediate-layer nodes, so that the many nodes are dispersed and given dependency relationships with the intermediate-layer nodes.
In a possible implementation, after all regular texts (which may be regular texts that have already been split) are preprocessed (for example, non-word characters may be removed), the regular text of each node may be represented using a bag-of-words model. The regular text of each rule is thus quantized into a d-dimensional text feature vector; the vectorization algorithm may use term frequency, TF-IDF, or the like. Each feature dimension corresponds to one text segment, which in this example is a word. Assuming there are n regular texts in total, stacking their feature vectors yields an n × d text feature matrix.
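As a rough sketch of the bag-of-words step, the code below builds a term-frequency matrix from a few regular texts using only the standard library. The tokenization rule (drop escape sequences such as `\S`, then keep alphabetic words) and the names `tokenize` and `bag_of_words` are assumptions; the application only states that non-word characters may be removed.

```python
import re
from collections import Counter


def tokenize(regex_text):
    """Drop escape sequences (e.g. \\S, \\d), then keep lowercase alphabetic words."""
    without_escapes = re.sub(r"\\.", " ", regex_text)
    return [w.lower() for w in re.findall(r"[A-Za-z]+", without_escapes)]


def bag_of_words(regex_texts):
    """Return (vocabulary, matrix): one term-frequency row per regular text."""
    counts = [Counter(tokenize(text)) for text in regex_texts]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[word] for word in vocabulary] for count in counts]
    return vocabulary, matrix


texts = [r"illegal user \S+", r"invalid user \S+", r"^\d+$"]
vocabulary, matrix = bag_of_words(texts)
```

With n = 3 regular texts and d = 3 distinct words, the result is a 3 × 3 matrix; the third text contains no words, so its row is all zeros.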
In a possible example, in order to disperse the nodes forming the intermediate layer as much as possible, the vector of information amounts over the feature dimensions may be calculated according to formula (1), following the principle of maximum entropy:
$$H = \big[\,H(X_1), \dots, H(X_d)\,\big], \qquad H(X_j) = -\sum_{x \in X_j} p(x)\,\log p(x) \tag{1}$$
where $H$ is the vector of information amounts of the feature dimensions, $X_j$ denotes the feature values of the text feature matrix in the $j$-th column (that is, the feature dimension of the corresponding word), and $p(x)$ is the distribution of the feature values $x$ of that feature dimension.
The larger the information amount, the more evenly the lower-layer nodes are distributed among the intermediate-layer nodes once the dependency relationships are rebuilt. As shown in formula (2), the words corresponding to the m dimensions with the largest information amount are taken as intermediate-layer nodes, and each lower-layer node is linked to the first node of the intermediate layer with which it is associated (that is, whose word, corresponding to a target feature dimension, it contains). To handle regular texts that contain only non-word character strings, a second node (a default node) is added to the intermediate layer, and any node that contains none of the m words is linked to this default node.
$$H_m = \tilde{H}_{1:m} \tag{2}$$
where $\tilde{H}$ is $H$ sorted in descending order.
Reference may be made to the variation of fig. 4a to 4b, where fig. 4b adds individual nodes of the middle tier and reassigns dependencies to the middle tier nodes.
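The selection of intermediate-layer words can be sketched as below, assuming that the "information amount" of a feature dimension is the Shannon entropy of the empirical distribution of its column values. The function names, the tie-breaking by vocabulary order, the rule that a node containing several selected words is linked to the first of them, and the sample data are all assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(values):
    """Shannon entropy (base 2) of the empirical distribution of one column."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def top_m_words(vocabulary, matrix, m):
    """Words of the m feature dimensions with the largest entropy."""
    entropies = [column_entropy([row[j] for row in matrix])
                 for j in range(len(vocabulary))]
    ranked = sorted(range(len(vocabulary)),
                    key=lambda j: entropies[j], reverse=True)
    return [vocabulary[j] for j in ranked[:m]]


def assign_to_middle_layer(node_words, selected):
    """Link each lower node to the first selected word it contains,
    or to the 'default' (second) node if it contains none."""
    layer = {word: [] for word in selected}
    layer["default"] = []
    for node_name, words in node_words.items():
        target = next((w for w in selected if w in words), "default")
        layer[target].append(node_name)
    return layer


vocab = ["failed", "login", "root", "user"]
feature_matrix = [
    [1, 1, 0, 0],  # n1: failed login
    [1, 0, 1, 0],  # n2: failed root
    [0, 1, 0, 1],  # n3: user login
    [0, 0, 0, 0],  # n4: symbols only
]
node_words = {
    "n1": {"failed", "login"},
    "n2": {"failed", "root"},
    "n3": {"user", "login"},
    "n4": set(),
}
selected = top_m_words(vocab, feature_matrix, m=2)
layer = assign_to_middle_layer(node_words, selected)
```

In this toy data the columns for "failed" and "login" split the four nodes evenly (entropy 1 bit), so they are chosen over "root" and "user", and node n4, which contains no words, falls to the default node.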
Step S204: form a rule tree file according to the second rule tree.
In some embodiments, the rule tree may be serialized and stored in binary form into a rule tree file for use in the next read.
Returning to Fig. 1, step S102: input log data into the root node of the rule tree and match the corresponding rule at each node along a matching path.
In some embodiments, each node caches the matching results it generates.
In a specific example, step S102 may be implemented by reading log data from, for example, a distributed log storage system and inputting the logs one by one into the root node of the rule tree. If the regular text of the current node matches the log, the index of the log, rather than the log content itself, is recorded in the node cache, which reduces the use of storage resources. Optionally, the node type may also affect the matching path, so node types can be set according to actual requirements. For example, if the node type is KeyNode, a key value for grouping is extracted at the position specified by the key, the index of the log is recorded in the key value list of the corresponding group, and the log is then passed to the child nodes of the current node for further matching. If the current node does not match the log, the matching task on the current matching path is terminated, and subsequent nodes need not perform matching for that log.
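The matching walk described above can be sketched as follows. The `Node` and `KeyNode` classes here are simplified stand-ins written for this example; treating the "position specified by the key" as a regular-expression capture group number is an assumption, as are the sample logs.

```python
import re


class Node:
    """General node: caches the indices of the logs it matched."""

    def __init__(self, node_id, regex_text):
        self.node_id = node_id
        self.pattern = re.compile(regex_text)
        self.children = []
        self.cache = []  # log indices only, never the log text

    def record(self, found, log_index):
        self.cache.append(log_index)

    def match(self, log_index, log_text):
        found = self.pattern.search(log_text)
        if found is None:
            return  # path terminated: descendants never see this log
        self.record(found, log_index)
        for child in self.children:
            child.match(log_index, log_text)


class KeyNode(Node):
    """Additionally groups matched log indices by an extracted key value."""

    def __init__(self, node_id, regex_text, key_group=1):
        super().__init__(node_id, regex_text)
        self.key_group = key_group  # assumption: key position = capture group
        self.groups = {}

    def record(self, found, log_index):
        super().record(found, log_index)
        key = found.group(self.key_group)
        self.groups.setdefault(key, []).append(log_index)


root = Node("root", "")  # the empty pattern matches every log
sshd = Node("sshd", r"sshd")
failed = KeyNode("failed", r"Failed password for (\S+)")
root.children.append(sshd)
sshd.children.append(failed)

logs = [
    "sshd[1]: Failed password for root",
    "sshd[2]: Accepted password for alice",
    "cron[3]: Failed password for bob",
    "sshd[4]: Failed password for root",
]
for index, text in enumerate(logs):
    root.match(index, text)
```

The third log contains "Failed password" but is never tested against that node, because it already failed at the upstream "sshd" node; this is the saving over traversing every rule for every log.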
Step S103: acquire an event according to the matching results of the nodes.
In some embodiments, the cached results of the nodes of the rule tree are read and analyzed to obtain events.
Nodes of different node types obtain events in different ways.
For example, for a node of the FrequencyNode type, the logs hit by matching are counted, and a sliding window is used to check whether there is a group of log indexes that meets the required count and whose maximum and minimum times differ by no more than the time window; if such a group can be found, an event satisfying the frequency requirement is output; otherwise, no event is output. For nodes of other types, the node information and the cached log indexes may be combined according to specific service requirements, and an event may be output for each log. Optionally, the output events may be persisted to storage.
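A sliding-window check of this kind might be sketched as below. The function name `find_frequency_event`, the representation of hits as (log index, timestamp) pairs, and the use of an inclusive bound on the time window are assumptions; the timestamps are made-up sample values.

```python
def find_frequency_event(hits, frequency, time_window):
    """Find the first group of `frequency` hits whose timestamps span at most
    `time_window`. `hits` is a list of (log_index, timestamp) pairs.
    Returns the log indices of that group, or None if no group qualifies."""
    ordered = sorted(hits, key=lambda hit: hit[1])
    for start in range(len(ordered) - frequency + 1):
        window = ordered[start:start + frequency]
        # max time minus min time inside the window
        if window[-1][1] - window[0][1] <= time_window:
            return [log_index for log_index, _ in window]
    return None


hits = [(0, 100), (3, 130), (4, 150), (7, 155), (9, 158)]
event = find_frequency_event(hits, frequency=3, time_window=10)
no_event = find_frequency_event(hits, frequency=4, time_window=10)
```

Here three hits at times 150, 155 and 158 fall within a 10-unit window, so their log indices form the event; no four hits are that close together, so the second call yields nothing.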
After the rule tree is used for the first time, the compiled rule tree can also be stored in a binary form in a rule tree file after being serialized so as to be used for the next reading.
Referring to fig. 5, an application example is shown to show the above principle of the event acquisition method, taking a certain security log matching system as an example.
In fig. 5, the specific process may include:
Step S501: judging whether the rule tree file is being used for the first time. If yes, the process goes to step S502; otherwise, the process goes to step S509: reading the rule tree file.
Step S502: rule content parsing and complex regular expression splitting.
In some embodiments, the content of each rule in the rule definition file may be parsed, and complex regularized text may be split.
For example, the attributes in the read rule definition file include id, regular expression string, upstream rule id, frequency, time window, rule description, alarm level, whether to alarm, and the like.
In fig. 6, (a) to (d) schematically show exemplary code for 4 rules: (a) is rule 5710, (b) is rule 5719, (c) is rule 5716, and (d) is rule 5720, each with different requirements.
For the rule shown in (a), the regular expression is split at its outermost OR operator into two items, namely "illegal user" and "invalid user", and the upstream rule is 5710; the rule can therefore occupy 2 entries in the rule list, corresponding respectively to the nodes to be generated for "illegal user" and "invalid user".
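Splitting a regular expression at its outermost OR operator can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `split_top_level_or` is hypothetical, and the scanner only tracks backslash escapes, parentheses, and simple character classes, so it does not cover every regular expression syntax.

```python
def split_top_level_or(pattern):
    """Split a regex at '|' characters that are not nested in (...) or [...]."""
    parts, current = [], []
    depth, in_class, i = 0, False, 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Keep an escaped character together with its backslash.
            current.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            in_class = ch != "]"
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            # Outermost alternation: close the current sub-expression.
            parts.append("".join(current))
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts
```

For the text of rule (a), `split_top_level_or("illegal user|invalid user")` yields the two items named above, each of which can then become its own simple rule.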
Step S503: constructing a rule tree according to the dependency relationships of the rules:
In a specific example, a rule tree object may be created, the rule list traversed, the rule information extracted, and a node established for each rule. The invention defines 4 types of nodes; the class diagram and inheritance relationships are shown in fig. 3. The rule of (a) in fig. 6 is instantiated as two BaseNodes; the rule of (b) in fig. 6 is instantiated as a FrequencyNode; the rule of (c) in fig. 6 is instantiated as a KeyNode; the rule of (d) in fig. 6 is instantiated as a KeyFrequencyNode. Although the rule of (d) in fig. 6 does not define the position of a key, the position may be inherited from the upstream node 5716. The regularized text of each rule is pre-compiled into a pattern, which improves matching efficiency.
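The four node types can be sketched as a small class hierarchy. The class names follow the text, but the inheritance layout is only an assumption, since fig. 3 is not reproduced here; the constructor parameters and the sample patterns are likewise hypothetical. The frequency of 3 and the time window of 120 seconds are taken from the worked example later in the description.

```python
import re


class BaseNode:
    def __init__(self, rule_id, pattern, parent=None):
        self.rule_id = rule_id
        self.regex = re.compile(pattern)  # pre-compiled once when the tree is built
        self.parent = parent
        self.children = []
        self.cache = []  # indexes of matching logs
        if parent is not None:
            parent.children.append(self)


class FrequencyNode(BaseNode):
    def __init__(self, rule_id, pattern, frequency, timeframe, parent=None):
        super().__init__(rule_id, pattern, parent)
        self.frequency = frequency
        self.timeframe = timeframe


class KeyNode(BaseNode):
    def __init__(self, rule_id, pattern, key_position=None, parent=None):
        super().__init__(rule_id, pattern, parent)
        # If the rule defines no key position, inherit it from the upstream node.
        if key_position is None:
            key_position = getattr(parent, "key_position", None)
        self.key_position = key_position
        self.groups = {}  # key value -> list of log indexes


class KeyFrequencyNode(KeyNode):
    def __init__(self, rule_id, pattern, frequency, timeframe,
                 key_position=None, parent=None):
        super().__init__(rule_id, pattern, key_position, parent)
        self.frequency = frequency
        self.timeframe = timeframe


# Illustrative patterns only; the real rule texts are shown in fig. 6.
n5716 = KeyNode(5716, r"^Failed", key_position=3)
n5720 = KeyFrequencyNode(5720, r"", frequency=3, timeframe=120, parent=n5716)
```

Here node 5720 defines no key position of its own and picks up position 3 from its upstream node 5716, mirroring the inheritance described above.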
The nodes in the rule tree are connected into a tree structure as shown in fig. 4a according to the upstream rule information of the rule.
Step S504: equalizing the number of nodes. The process may then further proceed to step S508: serializing the rule tree and storing it in the rule tree file, for subsequent use by step S509.
The 2nd layer of the rule tree in fig. 4a contains a large number of nodes. To further reduce the number of nodes that must be passed through during matching, natural language processing is used to analyze the words of the regularized text, an intermediate layer is extracted, and the nodes are dispersed by rebuilding their dependency relationships, as shown in fig. 4b.
The nodes of the intermediate layer are obtained, for example, as follows:
The regularized text of all rules can be preprocessed to obtain the regularized text of each rule (corresponding node) illustrated in fig. 7. A bag-of-words model is used to represent the regularized text of each node, and non-word characters may be removed from the feature dimensions. Each text may be quantized into a 169-dimensional word frequency vector; if 64 rules exist, the text matrix is 64 x 169. In order to disperse the nodes as much as possible, according to the maximum entropy principle, the information quantity of each dimension is calculated according to formula (1), the words corresponding to the 8 feature dimensions with the largest information quantity are taken as intermediate-layer nodes according to formula (2), and each node is linked to the intermediate-layer nodes related to it.
The words corresponding to the 8 feature dimensions are: "error", "not", "failed", "user", "fast", "Connection", "to", "for". These 8 words are extracted into the intermediate layer to form the first nodes, and a second node is added.
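The selection of the most informative words can be sketched as below. Formulas (1) and (2) are not reproduced in this section, so the sketch substitutes an assumption: the information quantity of a word is taken to be the binary entropy of its presence across the rule texts, which is largest for words that appear in about half of the rules. The function name `top_words`, the presence-based features (rather than word frequencies), and the sample texts are all illustrative.

```python
import math
import re


def top_words(texts, k):
    """Return the k words whose presence across `texts` has the highest entropy."""
    # Bag-of-words with non-word characters removed (simplified tokenizer).
    docs = [set(re.findall(r"[A-Za-z]+", t)) for t in texts]
    vocab = sorted(set().union(*docs))

    def entropy(word):
        p = sum(word in d for d in docs) / len(docs)
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    # Highest entropy first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda w: (-entropy(w), w))[:k]


sample = [
    "Failed password for user",
    "Failed login for root",
    "Connection closed",
    "Connection reset by peer",
]
chosen = top_words(sample, 3)
```

A word that splits the rules roughly in half is a good intermediate node under this assumption, because a log that misses it lets the matcher skip about half of the layer below.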
Step S505: matching the logs against the rules by using the rule tree:
The log data can be read from the distributed log storage system and input to the root node of the rule tree one by one.
As shown in the following table, 4 logs 1-4 are provided as an example:
Index    Log content    Time
1    Invalid user abc from 192.168.2.1    2021-07-02 16:41:55
2    Failed password for invalid user abc from 192.168.2.1 port 26605 ssh2    2021-07-02 16:41:55
3    Failed password for invalid user abc from 192.168.2.1 port 26605 ssh2    2021-07-02 16:41:55
4    Failed password for invalid user abc from 192.168.2.1 port 26605 ssh2    2021-07-02 16:41:56
Step S506: obtaining the matching results and caching them in the nodes of the rule tree:
Assume that the lower-layer nodes of the node corresponding to rule 5700 include 5701-5709. After log 1 passes node 5700, it is matched against the child nodes of 5700 in sequence; child nodes 5701-5709 do not match, so the log is not passed on to their next layer. When node 5710 is reached, log 1 hits its regularized text, so the index of log 1 is recorded in the cache of node 5710. Assume that logs 2-4 match at node 5716 and are therefore all recorded in the cache of node 5716. Since node 5716 is defined as a key type, the 3rd value, i.e., the IP value, is taken from the groups captured by the regularized text match and used as the key; the indexes of these 3 logs are stored in the hash table under the key "192.168.2.1", and these 3 logs are then also recorded under the key "192.168.2.1" in the hash table of node 5720.
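The key-based grouping for logs 2-4 can be sketched as follows. The log texts are those of the table above; the regular expression is hypothetical (the real text of rule 5716 is in fig. 6) and was chosen only so that its 3rd capture group is the IP address. The variable names are illustrative.

```python
import re
from collections import defaultdict

logs = {
    2: "Failed password for invalid user abc from 192.168.2.1 port 26605 ssh2",
    3: "Failed password for invalid user abc from 192.168.2.1 port 26605 ssh2",
    4: "Failed password for invalid user abc from 192.168.2.1 port 26605 ssh2",
}

# Hypothetical pattern: group 1 = credential type, group 2 = user, group 3 = IP.
pattern = re.compile(r"^Failed (\S+) for (?:invalid user )?(\S+) from (\S+)")
key_position = 3

# Hash table of node 5716: key value -> indexes of the logs in that group.
groups_5716 = defaultdict(list)
for index, text in logs.items():
    m = pattern.search(text)
    if m:
        groups_5716[m.group(key_position)].append(index)

# The same logs are then recorded under the same key at downstream node 5720.
groups_5720 = {key: list(indexes) for key, indexes in groups_5716.items()}
```

Only the indexes 2, 3, and 4 are stored under the key, so the frequency statistics at node 5720 can be computed per IP address without copying log content.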
Step S507: organizing the content and format of the output data according to specific service requirements.
The result of each node of the rule tree is read. For node 5710, one event is output; for node 5716, 3 events are output; for node 5720, the statistics show that 3 matches occurred within 120 seconds, so one event is output.
The output events may be stored, for example, to an output event storage database.
Fig. 8 is a schematic block diagram of an event acquisition device based on a log according to an embodiment of the present application. The event acquisition device can be implemented by referring to the event acquisition method in the previous embodiment, and therefore technical features are not repeated in this embodiment.
The event acquiring apparatus 800 includes:
a rule tree construction module 801, configured to obtain a rule tree file, and construct a rule tree based on the rule tree file; wherein each node of the rule tree is constructed corresponding to a rule; the node relation among all the nodes corresponds to the dependency relation among the rules;
a rule tree matching module 802, configured to input log data into a root node of the rule tree and perform matching of corresponding rules at each node along a matching path;
and an event obtaining module 803, configured to obtain an event according to the matching result of the node.
In some embodiments, the method for generating a rule tree file includes: traversing each rule, analyzing the content of the rule to obtain rule information, and determining a node relation based on the dependency relation among the rules; the rule information includes: rule attributes and methods, the rule attributes including regularized text for matching; constructing a first rule tree according to the rule information and the dependency relationship of each rule; wherein, each node correspondingly stores the rule information of a rule; equalizing the dependency relationship of each node in the first rule tree to obtain a second rule tree; and forming a rule tree file according to the second rule tree.
In some embodiments, before determining the corresponding node relationship in the rule tree based on the dependency relationship between the rules, further comprising rule preprocessing; the rule preprocessing comprises at least one of: 1) in response to a downstream rule relying on a first number of upstream rules, creating a downstream rule having the same content as the downstream rule such that the number of downstream rules reaches the first number; 2) splitting a regular text contained in a rule into a plurality of sub regular texts, and forming a rule according to each sub regular text; 3) splitting a regular text contained in a downstream rule into a plurality of sub regular texts, and forming a rule according to each sub regular text to form a rule set corresponding to the downstream rule; in response to the downstream rule relying on a first number of upstream rules, a rule set is created that has the same content as the rule set such that the rule set reaches the first number.
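Item 3) of the rule preprocessing, which combines splitting with duplication, can be sketched as follows. The dictionary layout and the function name `preprocess` are assumptions, and the naive `split("|")` is a simplification that is only valid when the regularized text has no nested alternation.

```python
def preprocess(rules):
    """Expand each rule into one entry per (sub-regex, upstream rule) pair,
    so that every resulting rule has a simple pattern and a single parent.

    rules: list of dicts with 'id', 'regex', and 'upstream' (list of rule ids).
    """
    expanded = []
    for rule in rules:
        # Simplification: assumes every '|' in the text is an outermost OR.
        for sub_regex in rule["regex"].split("|"):
            # A rule without upstream rules keeps a single entry with no parent.
            for upstream in rule["upstream"] or [None]:
                expanded.append(
                    {"source_id": rule["id"], "regex": sub_regex, "upstream": upstream}
                )
    return expanded


rules = [
    {"id": 10, "regex": "a|b", "upstream": [1, 2]},
    {"id": 11, "regex": "c", "upstream": []},
]
expanded = preprocess(rules)
```

A rule with two sub-expressions and two upstream rules therefore becomes four tree nodes, each of which fits the one-parent structure of a tree.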
In some embodiments, the equalizing the dependency of the nodes in the first rule tree includes: generating a corresponding text feature matrix based on the regularized text of each node; determining a preset number of target feature dimensions with the largest information amount based on the text feature matrix; determining each text segment corresponding to the regularized text according to the preset number of target feature dimensions; constructing an intermediate layer between the current layer and the previous layer according to each text segment; wherein the intermediate layer comprises: a preset number of first nodes, each first node being constructed corresponding to one text segment; and a second node; determining a dependency relationship between each current node and the first nodes or the second node based on the matching relationship between the regularized text of each current node in the current layer and each text segment; wherein a current node that matches a first node forms a dependency relationship with that first node, and a current node that matches no first node forms a dependency relationship with the second node.
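The construction of the intermediate layer can be sketched as follows, assuming the text segments have already been chosen. The function name, the `"<other>"` label for the second node, the substring test used as the matching relationship, and the choice to link a rule to every first node it matches are all assumptions made for illustration.

```python
def build_intermediate_layer(rule_texts, segments):
    """Map each intermediate node to the ids of the current-layer rules under it.

    rule_texts: {rule_id: regularized text}; segments: chosen text segments.
    """
    layer = {segment: [] for segment in segments}  # the first nodes
    layer["<other>"] = []                          # the second node
    for rule_id, text in rule_texts.items():
        matched = [segment for segment in segments if segment in text]
        if matched:
            for segment in matched:
                layer[segment].append(rule_id)
        else:
            # Rules containing none of the segments hang under the second node.
            layer["<other>"].append(rule_id)
    return layer


layer = build_intermediate_layer(
    {1: "Failed password for user", 2: "Connection closed", 3: "session opened"},
    ["Failed", "user", "Connection"],
)
```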
In some embodiments, the text segment is a word; the generating of the corresponding text feature matrix based on the regularized text of each node includes: mapping the regularized text corresponding to each node into a text feature vector through a bag-of-words model, and stacking the vectors to form the text feature matrix.
In some embodiments, each node has a node type that is associated with the requirements of the corresponding rule.
In some embodiments, the node types include: a specific type and a general type; the specific types include: a frequency related type, a keyword group related type, and a frequency and keyword group related type.
In some embodiments, the node further has a rule method corresponding to the node, and the rule method is used for being called to acquire an event according to a matching result.
In some embodiments, each of the nodes caches the respective generated match results.
It should be noted that all or part of the functional blocks in the embodiment of fig. 8 may be implemented by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a program instruction product. The program instruction product includes one or more program instructions. When the program instructions are loaded and executed on a computer, the processes or functions according to the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The program instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
Moreover, the apparatus disclosed in the embodiment of fig. 8 can be implemented with other module divisions. The above-described apparatus embodiments are merely illustrative; for example, the division into modules is merely a logical division, and other divisions are possible in actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical or in another form.
In addition, the functional modules and sub-modules in the embodiment of fig. 8 may be integrated in one processing unit, each module may exist alone physically, or two or more modules may be integrated in one unit. The integrated component can be implemented in the form of hardware or in the form of a software functional module. If implemented in the form of a software functional module and sold or used as a stand-alone product, the integrated component may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
It should be noted that the flow or method representations represented by the flow diagrams of the above-described embodiments of the present application may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process. And the scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
For example, the order of the steps in the embodiments of fig. 1, fig. 2, fig. 5, etc. may be changed in a specific scenario, and is not limited to the above representation.
Fig. 9 is a schematic circuit diagram of a computer device according to an embodiment of the present application.
In some embodiments, the computer device 900 may be implemented in a server, a group of servers, or the like.
The computer device 900 includes a bus 901, a processor 902, a memory 903, and a communicator 904. The processor 902 and the memory 903 may communicate with each other via a bus 901. The memory 903 may have stored therein program instructions (e.g., system or application software). The processor 902 implements the steps of the log-based event acquisition method in the embodiments of the present application by executing the program instructions in the memory 903.
The bus 901 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in fig. 9, but this does not mean that there is only one bus or only one type of bus.
In some embodiments, the processor 902 may be implemented as a Central Processing Unit (CPU), a Micro Controller Unit (MCU), a System on Chip (SoC), or a Field Programmable Gate Array (FPGA). The memory 903 may include a volatile memory for temporary storage of data while the program is executed, such as a Random Access Memory (RAM).
The Memory 903 may also include a non-volatile Memory (non-volatile Memory) for data storage, such as a Read-Only Memory (ROM), a flash Memory, a Hard Disk Drive (HDD) or a Solid-State Disk (SSD).
The communicator 904 is used for communicating with the outside. In particular examples, the communicator 904 can include one or more wired and/or wireless communication circuit modules. For example, the wired communication circuit module may include one or more of a wired network card, a USB module, a serial interface module, and the like. As another example, the wireless communication protocols followed by the wireless communication circuit module include one or more of Near Field Communication (NFC) technology, Infrared (IR) technology, Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time Division-Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), Bluetooth (BT), Global Navigation Satellite System (GNSS), and the like.
A computer-readable storage medium may also be provided in embodiments of the present application, storing program instructions that, when executed, perform the previous embodiments.
That is, the method steps in the above-described embodiments may be implemented as software or computer code that can be stored in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code that is originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the method described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA.
In summary, the embodiments of the present application provide a method, an apparatus, a device, and a medium for log-based event acquisition, where a rule tree file is acquired, and a rule tree is constructed based on the rule tree file; each node of the rule tree is constructed corresponding to one rule; the node relation among all nodes corresponds to the dependency relation among the rules; inputting the log data into a root node of the rule tree and matching corresponding rules at each node along a matching path; and acquiring an event according to the matching result of the nodes. By establishing rule trees related to the rules, log data can pass through each node along a matching path, and the node relation corresponds to the dependency relation among the rules, so that the efficiency is effectively improved compared with the prior mode of traversing a rule set. In addition, when the rule tree is constructed, the rule tree structure can be optimized by forming multiple nodes through node balance and regular text splitting, and the matching efficiency is improved; node types can also be set according to rule requirements to call corresponding methods to meet the rule requirements when matching.
The technical effects that can be achieved by the scheme in the embodiments of the application are analyzed in detail below:
1) the upper-layer and lower-layer relations of the nodes of the rule tree correspond to the upstream and downstream relations of the rules, and regular-expression matching follows these relations, which reduces unnecessary node matching and avoids traversing all the rules;
2) the rule tree is compiled once, can be stored in a file for repeated use, and can be read to be uniformly compiled;
3) regular-expression matching requirements with complex functions, such as key-based grouping statistics, are supported;
4) complex regularized text in one rule is split into several simple regularized texts belonging to different rules, which improves the efficiency of regular-expression matching;
5) the node types can be extended, and the rule information stored by a node can be flexibly increased or decreased, so that different demand scenarios can be conveniently supported;
6) node equalization can be realized based on natural language processing, which further reduces the number of nodes on the matching path.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (12)

1. A log-based event acquisition method is characterized by comprising the following steps:
acquiring a rule tree file, and constructing a rule tree based on the rule tree file; wherein each node of the rule tree is constructed corresponding to a rule; the node relation among all the nodes corresponds to the dependency relation among the rules;
inputting log data into a root node of the rule tree and matching corresponding rules at each node along a matching path;
and acquiring an event according to the matching result of the node.
2. The log-based event acquisition method as claimed in claim 1, wherein the generation method of the rule tree file comprises:
traversing each rule, analyzing the content of the rule to obtain rule information, and determining a node relation based on the dependency relation among the rules; the rule information includes: rule attributes and methods, the rule attributes including regularized text for matching;
constructing a first rule tree according to the rule information and the dependency relationship of each rule; wherein, each node correspondingly stores the rule information of a rule;
equalizing the dependency relationship of each node in the first rule tree to obtain a second rule tree;
and forming a rule tree file according to the second rule tree.
3. The log-based event acquisition method of claim 2, further comprising rule preprocessing before determining the corresponding node relationship in the rule tree based on the dependency relationship between the rules; the rule preprocessing comprises at least one of:
1) in response to a downstream rule relying on a first number of upstream rules, creating a downstream rule having the same content as the downstream rule such that the number of downstream rules reaches the first number;
2) splitting a regular text contained in a rule into a plurality of sub regular texts, and forming a rule according to each sub regular text;
3) splitting a regular text contained in a downstream rule into a plurality of sub regular texts, and forming a rule according to each sub regular text to form a rule set corresponding to the downstream rule;
in response to the downstream rule relying on a first number of upstream rules, a rule set is created that has the same content as the rule set such that the rule set reaches the first number.
4. The log-based event retrieval method of claim 2, wherein the equalizing the dependency relationship of each node in the first rule tree comprises:
generating a corresponding text characteristic matrix based on the regularization text of each node;
determining a preset number of target feature dimensions with the largest information amount based on the text feature matrix; determining each text segment corresponding to the regularized text according to the preset number of target feature dimensions;
constructing a middle layer between the current layer and the previous layer according to each text segment; wherein the intermediate layer comprises: presetting a number of first nodes, wherein each first node is constructed corresponding to one text fragment; and a second node;
determining a dependency relationship between each current node and the first nodes or the second node based on the matching relationship between the regularized text of each current node in the current layer and each text segment; wherein a current node that matches a first node forms a dependency relationship with that first node, and a current node that matches no first node forms a dependency relationship with the second node.
5. The log-based event retrieval method of claim 4, wherein the text segment is a vocabulary; the generating of the corresponding text feature matrix based on the regularized text of each node includes:
and mapping the regularized text corresponding to each node into text characteristic vectors through a bag-of-words model, and stacking to form a text characteristic matrix.
6. The log-based event retrieval method of claim 1, wherein each node has a node type, the node type being associated with requirements of a corresponding rule.
7. The log-based event retrieval method of claim 6, wherein the node types include: a specific type and a general type; the specific types include at least one of: a frequency-related type, a keyword group-related type, and a frequency and keyword group-related type.
8. The log-based event retrieval method of claim 6 or 7, wherein the node further has a rule method corresponding thereto, the rule method being used to be called to retrieve an event according to the matching result.
9. The log-based event retrieval method of claim 1, wherein each of the nodes caches respective generated matching results.
10. A log-based event acquisition apparatus, comprising:
the rule tree building module is used for obtaining a rule tree file and building a rule tree based on the rule tree file; wherein each node of the rule tree is constructed corresponding to a rule; the node relation among all the nodes corresponds to the dependency relation among the rules;
the rule tree matching module is used for inputting the log data into a root node of the rule tree and matching corresponding rules at each node along a matching path;
and the event acquisition module is used for acquiring an event according to the matching result of the node.
11. A computer device, comprising: a communicator, a memory, and a processor; the communicator is used for communicating with the outside; the memory is to store program instructions; the processor is configured to execute the program instructions to perform the log-based event retrieval method according to any one of claims 1 to 9.
12. A computer-readable storage medium, in which program instructions are stored, which program instructions, when executed, perform the log-based event acquisition method of any of claims 1 to 9.
CN202111658789.6A 2021-12-31 2021-12-31 Event obtaining method, device, equipment and medium based on log Pending CN114297046A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111658789.6A CN114297046A (en) 2021-12-31 2021-12-31 Event obtaining method, device, equipment and medium based on log

Publications (1)

Publication Number Publication Date
CN114297046A true CN114297046A (en) 2022-04-08

Family

ID=80974143

Country Status (1)

Country Link
CN (1) CN114297046A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118093725A (en) * 2024-04-22 2024-05-28 极限数据(北京)科技有限公司 Ultra-large-scale distributed cluster architecture and data processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination