CN116010499A - Method and device for determining analysis rule and electronic equipment - Google Patents

Info

Publication number
CN116010499A
Authority
CN
China
Prior art keywords
log
accessed
target
analysis
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211714250.2A
Other languages
Chinese (zh)
Inventor
郭斌
黄�俊
孙杨杰
刘睿
郑维
刘华兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nsfocus Technologies Inc
Nsfocus Technologies Group Co Ltd
Original Assignee
Nsfocus Technologies Inc
Nsfocus Technologies Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nsfocus Technologies Inc and Nsfocus Technologies Group Co Ltd
Priority to CN202211714250.2A
Publication of CN116010499A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The application relates to the technical field of computers, and in particular to a method and a device for determining a parsing rule and to electronic equipment, which address the low efficiency and inaccuracy caused by determining parsing rules manually in the related art. In the method, the number of target variables of a log to be accessed is first determined and taken as the target length of the log to be accessed. N parsing rules corresponding to the log to be accessed are then determined according to the target length and the access type of the log to be accessed. Next, the log to be accessed is parsed with each of the N parsing rules to obtain N parsing results. Finally, among the N parsing results, the parsing rule corresponding to the result that contains the most parsed fields is taken as the target parsing rule corresponding to the log to be accessed. With this method the parsing rule can be determined automatically, and both the efficiency of the determination and the accuracy of the finally determined parsing rule are improved.

Description

Method and device for determining analysis rule and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for determining a parsing rule, and an electronic device.
Background
When a log is transmitted to a log access component or a third-party system, it has to be parsed with a corresponding parsing rule. At present that rule is determined manually: a person decides, based on personal experience, which parsing rule corresponds to the log and then establishes the association between the log and that rule.
However, in actual application scenarios, determining the parsing rule corresponding to a log to be accessed places high demands on the person's professional ability, which increases learning and labor costs. In addition, because personal experience differs, manually determined parsing rules may be inaccurate.
Disclosure of Invention
The application provides a method, a device and electronic equipment for determining an analysis rule, which are used for reducing manual dependence and improving the efficiency and accuracy of determining the analysis rule.
In a first aspect, the present application provides a method for determining parsing rules, the method comprising:
determining the number of target variables of a log to be accessed, and taking that number as the target length of the log to be accessed, wherein the target variables characterize the field types in the log to be accessed;
determining, according to the target length and the access type of the log to be accessed, N parsing rules corresponding to the log to be accessed, wherein N is a positive integer;
parsing the log to be accessed with each of the N parsing rules to obtain N parsing results, wherein each parsing result comprises a plurality of parsed fields; and
taking, among the N parsing results, the parsing rule corresponding to the parsing result that contains the most parsed fields as the target parsing rule corresponding to the log to be accessed.
As a possible implementation manner, the determining N parsing rules corresponding to the log to be accessed according to the target length and the access type of the log to be accessed includes: determining a first log set corresponding to the log to be accessed according to the target length and the access type of the log to be accessed, wherein the first log set comprises at least one sample log; calculating the similarity between the target variables and the reference variables of each sample log in the first log set; selecting all sample logs whose similarity is greater than a preset threshold as target sample logs; and taking the N parsing rules corresponding to all the target sample logs as the N parsing rules corresponding to the log to be accessed.
As a possible implementation manner, the determining, according to the target length and the access type of the log to be accessed, the first log set corresponding to the log to be accessed includes: determining a second set of logs based on the target length; wherein a reference length of a sample log in the second log set is the same as the target length; and in the second log set, taking the sample log with the same access type as the log to be accessed as a first log set.
As a possible implementation manner, the calculating the similarity between the target variable and the reference variable of each sample log in the first log set includes: determining a target variable sequence of the log to be accessed based on the arrangement order of the target variables in the log to be accessed; determining, in the first log set, a sequence of reference variables for each sample log based on an order in which the reference variables for each sample log are arranged in each sample log; and taking the similarity between the target variable sequence and the reference variable sequence of each sample log as the similarity between the target variable and the reference variable of each sample log.
As a possible implementation manner, the step of using the similarity between the target variable sequence and the reference variable sequence of each sample log as the similarity between the target variable and the reference variable of each sample log includes: respectively calculating the longest common subsequence between the target variable sequence and the reference variable sequence of each sample log; and taking the length of the longest common subsequence obtained by calculation as the similarity between the target variable and the reference variable of each sample log.
In a second aspect, the present application provides an apparatus for determining parsing rules, the apparatus comprising:
a target length determining module, configured to determine the number of target variables of a log to be accessed and take that number as the target length of the log to be accessed, wherein the target variables characterize the field types in the log to be accessed;
a first parsing rule determining module, configured to determine, according to the target length and the access type of the log to be accessed, N parsing rules corresponding to the log to be accessed, wherein N is a positive integer;
a parsing result module, configured to parse the log to be accessed with each of the N parsing rules to obtain N parsing results, wherein each parsing result comprises a plurality of parsed fields; and
a second parsing rule determining module, configured to take, among the N parsing results, the parsing rule corresponding to the parsing result that contains the most parsed fields as the target parsing rule corresponding to the log to be accessed.
As a possible implementation manner, when determining the N parsing rules corresponding to the log to be accessed according to the target length and the access type of the log to be accessed, the first parsing rule determining module is specifically configured to: determine a first log set corresponding to the log to be accessed according to the target length and the access type of the log to be accessed, wherein the first log set comprises at least one sample log; calculate the similarity between the target variables and the reference variables of each sample log in the first log set; select all sample logs whose similarity is greater than a preset threshold as target sample logs; and take the N parsing rules corresponding to all the target sample logs as the N parsing rules corresponding to the log to be accessed.
As a possible implementation manner, when determining the first log set corresponding to the log to be accessed according to the target length and the access type of the log to be accessed, the first parsing rule determining module is specifically configured to: determine a second log set based on the target length, wherein the reference length of each sample log in the second log set is the same as the target length; and take, from the second log set, the sample logs whose access type is the same as that of the log to be accessed as the first log set.
As a possible implementation manner, when calculating the similarity between the target variables and the reference variables of each sample log in the first log set, the first parsing rule determining module is specifically configured to: determine a target variable sequence of the log to be accessed based on the order in which the target variables are arranged in the log to be accessed; determine, for each sample log in the first log set, a reference variable sequence based on the order in which its reference variables are arranged in that sample log; and take the similarity between the target variable sequence and the reference variable sequence of each sample log as the similarity between the target variables and the reference variables of that sample log.
As a possible implementation manner, when taking the similarity between the target variable sequence and the reference variable sequence of each sample log as the similarity between the target variables and the reference variables of that sample log, the first parsing rule determining module is specifically configured to: calculate the longest common subsequence between the target variable sequence and the reference variable sequence of each sample log; and take the length of the calculated longest common subsequence as the similarity between the target variables and the reference variables of that sample log.
In a third aspect, the present application provides an electronic device, including:
a memory for storing a computer program;
and a processor, configured to implement the steps of the above method for determining a parsing rule when executing the computer program stored in the memory.
In a fourth aspect, the present application provides a computer readable storage medium having a computer program stored therein, which, when executed by a processor, implements the steps of the method for determining a parsing rule described above.
In the embodiment of the application, the number of target variables of the log to be accessed, that is, its target length, is determined first. N parsing rules corresponding to the log to be accessed are then determined based on the target length and the access type of the log, and finally the rule with the best parsing effect among the N parsing rules is selected as the target parsing rule. On the one hand, the target length and the access type make it possible to narrow a large number of parsing rules down to the N rules corresponding to the log to be accessed. This avoids the manual matching of the log against a large number of parsing rules that the related art requires, so the target parsing rule is determined automatically and the determination becomes more efficient. On the other hand, the log to be accessed is parsed with each of the N parsing rules to obtain N parsing results, and the rule whose result contains the most parsed fields is taken as the target parsing rule. The log is thus automatically matched with the parsing rule that parses it most completely, which improves both the parsing completeness and the accuracy of the determined target parsing rule.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
FIG. 1 is a schematic diagram of one possible implementation environment provided herein;
FIG. 2 is a flow chart of a method of determining parsing rules provided herein;
FIG. 3 is a schematic diagram of a preset threshold setting mechanism provided in the present application;
FIG. 4 is a schematic diagram of an apparatus for determining parsing rules provided herein;
FIG. 5 is a schematic diagram of a structure of an electronic device provided in the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
In the embodiment of the application, the collection, storage, use, processing, transmission, provision and disclosure of users' personal information comply with the relevant laws and regulations and do not violate public order and good morals.
First, some terms of the embodiments of the present application will be explained for easy understanding by those skilled in the art.
Log: record information designed for a service according to system requirements, used to record the running content or business content of the system, such as the IP (Internet Protocol) address, port, protocol, time and user name. For example, in the embodiment of the present application, a log may be understood as the operation record information or business record information of a third-party security device or network device.
Log parsing: converting an unstructured raw log into structured data. Illustratively, in the embodiment of the present application, log parsing may be understood as performing normalization operations such as parsing, enriching and converting on the original log according to the log format.
Log access: reporting the log, processing the log, and storing the final processing result. For example, in the embodiment of the present application, log access may be understood as a full process of performing a normalization operation on a log through a corresponding parsing rule and then performing a persistence saving on the log.
Parsing rules: the parsing rules are used to identify text patterns for particular concepts, i.e., key information in the log can be identified by parsing rules. Illustratively, in embodiments of the present application, a parsing rule may be understood as a collection of parsing operations that are needed to be performed throughout the log parsing process.
It should be noted that, the scheme can be applied to determination of target analysis rules of logs to be accessed in multiple application scenarios such as security systems, security platforms, information security, log access and the like. The scheme can be further applied to tasks which need to meet the requirement of improving the accuracy of the target analysis rule aiming at the log to be accessed.
The scheme may be executed by a computing device such as a computing terminal or a remote server. Deployed on such a device, the scheme improves the accuracy of the target parsing rule corresponding to the log to be accessed based on the number of target variables and the access type of that log. Of course, the execution subject named here is merely an example and is not particularly limited.
The following is a brief description of the design concept of the method for determining the analysis rule according to the embodiment of the present application.
In the log access scenario, the related art uses a log access component or a third-party system. This requires the user to fully understand both the original text format of the log and the parsing rules of the log access component or third-party system; that is, the user must manually map each parsing rule to the corresponding type of original log text.
As will be readily appreciated, the log access component or third-party system therefore still relies on manual participation. Specifically, when a log is accessed and parsed, the user has to understand or predict the original log format, know the parsing rules of the component or system thoroughly, and then, based on personal experience, associate a parsing rule with the corresponding type of original log text to obtain the final target parsing rule. For example, based on personal experience, the user matches the source identifiers of the log to be accessed (such as manufacturer, device type and version) to obtain the corresponding target parsing rule. In practice, however, this places high demands on the user's professional ability, which increases learning and labor costs. In addition, differences in personal experience can make the manually determined parsing rule inaccurate. The related art therefore does not adequately solve the problem that associating original log text with parsing rules requires manual participation and extensive operator knowledge.
As can be seen, in the related art, the target parsing rule determined for the log to be accessed may be inaccurate.
In order to improve the accuracy of the determined target parsing rule, the application provides a method for determining a parsing rule. In the method, the number of target variables of a log to be accessed is first determined and taken as the target length of the log to be accessed. N parsing rules corresponding to the log to be accessed are then determined according to the target length and the access type of the log to be accessed. Next, the log to be accessed is parsed with each of the N parsing rules to obtain N parsing results. Finally, among the N parsing results, the parsing rule corresponding to the result that contains the most parsed fields is taken as the target parsing rule corresponding to the log to be accessed.
The target variable characterizes the field type in the log to be accessed; n is a positive integer greater than or equal to 1; one parsing result includes a plurality of parsing fields.
In the embodiment of the application, the number of target variables of the log to be accessed, that is, its target length, is determined first. N parsing rules corresponding to the log to be accessed are then determined based on the target length and the access type of the log, and finally the rule with the best parsing effect among the N parsing rules is selected as the target parsing rule. On the one hand, the target length and the access type make it possible to narrow a large number of parsing rules down to the N rules corresponding to the log to be accessed. This avoids the manual matching of the log against a large number of parsing rules that the related art requires, so the target parsing rule is determined automatically and the determination becomes more efficient. On the other hand, the log to be accessed is parsed with each of the N parsing rules to obtain N parsing results, and the rule whose result contains the most parsed fields is taken as the target parsing rule. The log is thus automatically matched with the parsing rule that parses it most completely, which improves both the parsing completeness and the accuracy of the determined target parsing rule.
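The final selection described above (parse the log with each of the N candidate rules and keep the rule whose result contains the most parsed fields) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a parsing rule is a regular expression with named groups and that each named group that matches counts as one parsed field; the application does not specify the rule format, and the names `select_target_rule`, `rule_a` and `rule_b` are hypothetical.

```python
import re


def select_target_rule(log, rules):
    """Return (rule name, parsed fields) for the rule that yields the most fields.

    `rules` maps a hypothetical rule name to a regex with named groups.
    This rule representation is an assumption made for the sketch.
    """
    best_name, best_fields = None, {}
    for name, pattern in rules.items():
        match = re.search(pattern, log)
        fields = {}
        if match:
            # Count only the named groups that actually captured something.
            fields = {k: v for k, v in match.groupdict().items() if v is not None}
        # Strict comparison: on a tie the earlier rule is kept.
        if best_name is None or len(fields) > len(best_fields):
            best_name, best_fields = name, fields
    return best_name, best_fields


rules = {
    "rule_a": r"user=(?P<user>\w+)",
    "rule_b": r"user=(?P<user>\w+) ip=(?P<ip>\S+) port=(?P<port>\d+)",
}
name, fields = select_target_rule("user=alice ip=10.0.0.5 port=22", rules)
print(name, len(fields))
```

How ties between rules with the same number of parsed fields are resolved is not stated in the text; the sketch simply keeps the first such rule.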
The method for determining the analysis rule provided in the embodiment of the present application may be applied to an implementation environment shown in fig. 1, where the implementation environment may at least include an access node, an operation node, a management node, a computing node, and a storage node.
The access node is used for acquiring logs to be accessed.
The operation node is used for interacting with a user so that the user can perform the deployment, configuration and management of the task of determining the target analysis rule.
The management node is configured to obtain a log to be accessed from the access node, for example, referring to fig. 1, the access node uploads the log to be accessed to the cloud end, and the management node obtains the log to be accessed from the cloud end. The management node is also used for managing the computing node and the storage node in combination with determining the target parsing rule task. In the management process, the management node forwards the log to be accessed to the computing node.
The computing node is used for completing, according to the received log to be accessed, the computing tasks involved in determining the target parsing rule, and for accelerating those computations.
The storage node is used for storing, under the management of the management node, the log to be accessed, the preset parsing rules, and the mapping relations generated while determining the target parsing rule (such as the N parsing rules corresponding to the log to be accessed and the target parsing rule corresponding to the log to be accessed), so that they can be traced later.
It should be noted that, the access node, the management node, the computing node, and the storage node are different devices, or any two or three of the management node, the computing node, and the storage node may be integrated in the same device. The above-mentioned operation node is not necessarily required, and the present embodiment is not particularly limited.
The method for determining the analysis rule provided in the embodiment of the present application is specifically described below. Referring to fig. 2, the method includes steps 201-204, as follows.
Step 201: determining the number of target variables of the log to be accessed, and taking the number of the target variables as the target length of the log to be accessed;
The target variables characterize the field types in the log to be accessed. To make their meaning clearer, the characteristics of a log are described below.
A log may include textual format description statements and system variables. The format description statements of the same kind of log are generally fixed, while the system variables change as the log parameters change. For example, a log may look as follows:
“<46>Nov 08 01:05:33 linux messages[linux.host]:dev_ip=****;os=SUSE Linux Enterprise Server 11SP4;name=messages;object=lsyslog;msg=Nov 8 01:05:32 linux --MARK--”;
wherein "46", "Nov 08 01:05:33", "linux.host", "11SP4", and "Nov 8 01:05:32" identify content, i.e., system variables, and the remainder identifies the format, i.e., the textual format description statements.
As an optional implementation manner, before the number of target variables is determined, the log to be accessed may be normalized and its feature vector extracted. The key to this processing is to keep the format description statements of the log and to ignore the concrete values of the system variables. As the sample log above shows, the system variables mainly include IP addresses, ports and protocols, numeric information, and time variables. In the embodiment of the present application, suitable entries may be selected from a preset feature vector extraction library to normalize the log to be accessed and extract its feature vector. Common entries of such a library are, for example:
1) msg = re.sub(r'(\d+\.)+\d+', 'IP', msg)
replaces data in IP format;
2) msg = re.sub(r'Mar|Apr|Dec|Jan|Feb|Nov|Oct|May|Jun|Jul|Aug|Sep', 'M', msg)
replaces the month in a time parameter;
3) msg = re.sub(r'\d{2}:\d{2}:\d{2}', 'H', msg)
replaces the hours, minutes and seconds in a time parameter;
4) msg = re.sub(r'\d{2,4}[\/-]\d{2}[\/-]\d{2,4}', 'Y', msg)
replaces the year, month and day in a time parameter, e.g. 2020-12-11 or 2020/12/11;
5) msg = re.sub(r'(=[1-9][0-9]*)', '=p', msg)
replaces a run of digits that follows an equals sign;
6) msg = re.sub(r'\b(0[xX])?[A-Fa-f0-9]+\b', 'X', msg)
replaces hexadecimal memory addresses.
Of course, the six entries above are only examples. The embodiment of the application does not limit the feature vector extraction library, which may be extended according to actual application requirements so as to support more types of feature vector data formats.
Further, after the log to be accessed has been normalized and its feature vector extracted, the log can be divided into a constant part and a variable part. The processed textual format description statements of the log are collectively called the constant part, and the processed system variables of the log are collectively called the variable part. The number of target variables of the log to be accessed can therefore be determined from the variable part and taken as the target length of the log to be accessed.
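The normalization and counting described above can be sketched in Python as follows. This is a minimal illustration, not the application's implementation: it reuses five of the six example patterns, applies them in an order chosen for the sketch (dates and times before the generic numeric rules), and writes the placeholders in angle brackets (`<IP>`, `<M>`, and so on) so that they can be located afterwards without colliding with ordinary log text. The angle-bracket form, the omission of the hexadecimal pattern, the word boundaries around month names, and the function names are all assumptions.

```python
import re

# (pattern, placeholder) pairs adapted from the example extraction library.
# Angle-bracketed placeholders are an assumption of this sketch.
PATTERNS = [
    (r"\d{2,4}[/-]\d{2}[/-]\d{2,4}", "<Y>"),   # year, month, day
    (r"\d{2}:\d{2}:\d{2}", "<H>"),             # hours, minutes, seconds
    (r"(\d+\.)+\d+", "<IP>"),                  # IP-format data
    (r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\b", "<M>"),  # month
    (r"=[1-9][0-9]*", "=<P>"),                 # digits after an equals sign
]


def normalize(msg):
    """Replace system variables with placeholders, keeping the constant part."""
    for pattern, placeholder in PATTERNS:
        msg = re.sub(pattern, placeholder, msg)
    return msg


def target_variable_sequence(msg):
    """Return the placeholders of the variable part in order of appearance."""
    return re.findall(r"<(Y|H|IP|M|P)>", normalize(msg))


def target_length(msg):
    """The target length is the number of target variables."""
    return len(target_variable_sequence(msg))


log = "Nov 08 01:05:33 host sshd: login from 10.0.0.5 port=22"
print(normalize(log))
print(target_variable_sequence(log), target_length(log))
```

The sample log line here is invented for the sketch. Note that the day of month ("08") is left untouched, because none of the five patterns used covers it.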
Generally, the target variables are the result of normalizing the system variables of the log to be accessed and extracting its feature vector, and the number of target variables is the number of variable parts in that result. Optionally, the target variables may also be filtered or screened according to the actual application situation, which is not described further here.
In view of this, to minimize matching errors caused by differing log sample formats, step 201 of the embodiment of the present application focuses on the target length of the log to be accessed, that is, the number of target variables, which helps to improve the accuracy of the finally determined target parsing rule.
Step 202: according to the target length and the access type of the log to be accessed, determining N analysis rules corresponding to the log to be accessed;
wherein N is a positive integer greater than or equal to 1.
In the embodiment of the present application, according to the target length of the log to be accessed and the access type of the log to be accessed, the first log set corresponding to the log to be accessed, that is, the set containing at least one sample log, may be determined. Then, calculating the similarity between the target variable and the reference variable of each sample log in the first log set, and selecting all sample logs corresponding to the similarity larger than a preset threshold value as target sample logs. And finally, using N analysis rules corresponding to all the target sample logs as N analysis rules corresponding to the logs to be accessed.
The determination of the first log set can be understood as follows: from a plurality of log sets, the one that matches both the access type and the target length of the log to be accessed is selected. That is, each log set corresponds to an access type and to a length, so determining the first log set from the target length and the access type narrows a large range down to a medium range, namely one log set level among many. In other words, the correspondence between log sets and lengths can be understood as hierarchical clustering by length; on this basis, the access type is used to refine the selection and determine the first log set corresponding to the log to be accessed. This improves both the accuracy and the performance of the subsequent matching of parsing rules.
In more detail, as a possible implementation manner, according to the target length and the access type of the log to be accessed, determining the first log set corresponding to the log to be accessed may include: determining a second log set based on the target length, wherein the reference length of the sample logs in the second log set is the same as the target length; then, in the second log set, the sample log with the same access type as the log to be accessed is used as the first log set.
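The two-stage narrowing described above (first by length, then by access type) can be sketched as below. The `SampleLog` structure, its field names, and the example access type strings are hypothetical; the application only states that each sample log has a reference length, an access type, and a corresponding parsing rule.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SampleLog:
    """Hypothetical sample log record used only for this sketch."""
    rule_id: str                          # parsing rule associated with the sample
    access_type: str                      # e.g. "syslog" (illustrative value)
    reference_variables: Tuple[str, ...]  # placeholders in order of appearance

    @property
    def reference_length(self) -> int:
        return len(self.reference_variables)


def first_log_set(samples: List[SampleLog], target_length: int,
                  access_type: str) -> List[SampleLog]:
    # Second log set: samples whose reference length equals the target length.
    second = [s for s in samples if s.reference_length == target_length]
    # First log set: samples of the second set with the same access type.
    return [s for s in second if s.access_type == access_type]


samples = [
    SampleLog("rule_a", "syslog", ("M", "H", "IP", "P")),
    SampleLog("rule_b", "syslog", ("M", "H")),
    SampleLog("rule_c", "kafka", ("M", "IP", "H", "P")),
]
print([s.rule_id for s in first_log_set(samples, 4, "syslog")])
```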
After the medium range of the first log set is determined, the range needs to be narrowed further. In this embodiment of the present application, the target sample logs are determined by calculating the similarity between the target variable and the reference variable of each sample log in the first log set, and the N parsing rules corresponding to the target sample logs are then used as the N parsing rules corresponding to the log to be accessed.
In detail, as one possible implementation manner, calculating the similarity between the target variable and the reference variable of each sample log in the first log set may include: determining a target variable sequence of a log to be accessed based on the arrangement order of the target variable in the log to be accessed, determining a reference variable sequence of each sample log based on the arrangement order of the reference variable of each sample log in the first log set, and taking the similarity between the target variable sequence and the reference variable sequence of each sample log as the similarity between the target variable and the reference variable of each sample log.
In more detail, as one possible implementation manner, taking the similarity between the target variable sequence and the reference variable sequence of each sample log as the similarity between the target variable and the reference variable of each sample log may include: calculating, for each sample log, the longest common subsequence between the target variable sequence and the reference variable sequence of that sample log, and then taking the length of that longest common subsequence as the similarity between the target variable and the reference variable of that sample log.
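As an illustration of the similarity measure, the length of the longest common subsequence can be computed with the standard dynamic-programming recurrence. The function name and the example variable sequences below are assumptions made for the sketch; the application does not prescribe a particular implementation.

```python
def lcs_length(target_seq, reference_seq):
    # prev[j] holds the LCS length of the target items processed so far
    # and the first j items of the reference sequence.
    prev = [0] * (len(reference_seq) + 1)
    for x in target_seq:
        curr = [0]
        for j, y in enumerate(reference_seq, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical variable sequences (field types in order of appearance).
target = ["time", "ip", "port", "ip", "string"]
reference = ["time", "ip", "ip", "port", "string"]
print(lcs_length(target, reference))  # similarity used for threshold comparison
```

Because a subsequence preserves order but not adjacency, two logs whose field types appear in mostly the same order score high even when a few fields are inserted or missing.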
Further, based on the similarity between the target variable and the reference variable of each sample log, all sample logs whose similarity is greater than the preset threshold are selected as target sample logs, and the analysis rules corresponding to the target sample logs are used as the analysis rules corresponding to the log to be accessed.
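The threshold comparison can be sketched as below. The dictionaries, the sample and rule names, and the de-duplication of rules shared by several target sample logs are assumptions for illustration only; the comparison is strict ("greater than"), as stated in the text.

```python
def select_rules(similarities, rule_of, threshold):
    # similarities: sample log name -> similarity to the log to be accessed
    # rule_of:      sample log name -> analysis rule of that sample log
    rules = []
    for sample, similarity in similarities.items():
        if similarity > threshold:          # strictly greater than the preset threshold
            rule = rule_of[sample]
            if rule not in rules:           # assumption: list a shared rule only once
                rules.append(rule)
    return rules


similarities = {"sample_1": 4, "sample_2": 2, "sample_3": 5, "sample_4": 4}
rule_of = {"sample_1": "rule_1", "sample_2": "rule_2",
           "sample_3": "rule_3", "sample_4": "rule_1"}
print(select_rules(similarities, rule_of, threshold=3))
```

The resulting list plays the role of the N analysis rules that are tried against the log to be accessed in the subsequent steps.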
In view of this, the first log set is preliminarily determined through length matching in a clustering-like manner; the similarity between the target variable and the reference variable of each sample log is then obtained by calculating the longest common subsequence; the similarity is then compared with the preset threshold, and the analysis rules corresponding to the best-matching target sample logs are obtained according to the comparison result. In this way, the association between the analysis rules and the log to be accessed is established automatically, the mapping between sample logs and log analysis rules is associated intelligently, and the problem of low efficiency caused by manual participation in the related art is solved.
In addition, the calculated similarity is affected by many factors. For example, logs from different devices differ in log format, log content and log length, so their matching similarities also differ. For another example, logs from the same device may also differ in matching similarity because of different log types and log lengths. Therefore, the setting of the preset threshold for the similarity directly affects the accuracy of the finally matched parsing rule. To improve the accuracy, a mechanism for training the preset threshold is proposed herein; the mechanism for setting the preset threshold is described in the following supplementary explanation.
As a supplementary illustration, the preset threshold may be set in either of the following two modes or in a combination of them. In the first mode, each parsing rule has its own preset threshold: one parsing rule can correspond to a plurality of sample logs, and the preset threshold of that parsing rule can be obtained from the similarities calculated for those sample logs. In the second mode, a single preset threshold is set as a whole: for example, where each analysis rule corresponds to only one sample log, a common overall threshold can be trained for the analysis rules and used as the preset threshold. In other words, where one parsing rule (or one kind of parsing rule) corresponds to a plurality of sample logs, a threshold may be specified for each sample log, or one threshold may be specified uniformly for all sample logs corresponding to the parsing rule.
For example, fig. 3 is a schematic diagram of the combination of the first and second modes, in which analysis rule 1 corresponds to sample log 1, sample log 2 and sample log 3, and threshold 1 of analysis rule 1 can be obtained from sample log 1, sample log 2 and sample log 3; threshold 2 of analysis rule 2 and threshold 3 of analysis rule 3 can be obtained similarly, and the description is not repeated. Furthermore, a common overall threshold may be trained for them.
It should be noted that the preset threshold may be obtained through training. The trained threshold is related to the rule base built from the logs or provided by the system, and is calculated by evaluating all rules in the rule base in multiple dimensions (for example, counting the number of fields that each parsing rule can parse and calculating the average value).
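The two threshold modes can be pictured with the following sketch. The application does not specify the training procedure, so using an arithmetic mean both for the per-rule threshold and for the common overall threshold is purely an assumption made for illustration, as are the rule names and similarity values.

```python
from statistics import mean


def per_rule_thresholds(similarities_by_rule):
    # Mode one (assumed form): each rule gets its own threshold, derived here
    # as the mean similarity observed over the sample logs of that rule.
    return {rule: mean(values) for rule, values in similarities_by_rule.items()}


def overall_threshold(similarities_by_rule):
    # Mode two (assumed form): one common threshold for all rules, derived
    # here as the mean of the per-rule thresholds.
    return mean(per_rule_thresholds(similarities_by_rule).values())


# Hypothetical similarities observed for the sample logs of each rule.
observed = {
    "rule_1": [4, 5, 6],   # sample logs 1, 2 and 3
    "rule_2": [3, 5],
    "rule_3": [6],
}
print(per_rule_thresholds(observed))
print(overall_threshold(observed))
```

A combined scheme could use the per-rule threshold where a rule has enough sample logs and fall back to the overall threshold otherwise; that policy is likewise an assumption, not something stated in the application.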
Optionally, in some embodiments, the first log set corresponding to the log to be accessed cannot be determined, that is, no log set has a target length that matches the target length of the log to be accessed. In this case, the log to be accessed may be assigned to a special partition for misses, and a corresponding prompt is fed back to the user. Further, the special miss partition may itself correspond to a parsing rule, in which case that parsing rule may be used directly as the parsing rule corresponding to the log to be accessed.
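A minimal sketch of this fallback is given below. The partition key, the `MISS_PARTITION` marker, the returned dictionary and the prompt wording are all hypothetical names introduced for the example.

```python
MISS_PARTITION = "miss"  # hypothetical marker for the special miss partition


def resolve_partition(log_sets, target_length, access_type, miss_rule=None):
    # log_sets maps (target length, access type) to the sample logs of that log set.
    key = (target_length, access_type)
    if key in log_sets:
        return {"partition": key, "rule": None, "prompt": None}
    # No log set matches: assign the miss partition, prepare a prompt for the
    # user, and optionally hand back the rule bound to the miss partition.
    prompt = f"no log set matches length={target_length}, access type={access_type}"
    return {"partition": MISS_PARTITION, "rule": miss_rule, "prompt": prompt}


log_sets = {(4, "syslog"): ["sample_1", "sample_2"]}
print(resolve_partition(log_sets, 4, "syslog"))
print(resolve_partition(log_sets, 7, "syslog", miss_rule="default_rule"))
```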
Step 203: respectively analyzing the log to be accessed by adopting N analysis rules to obtain N analysis results;
wherein one analysis result comprises a plurality of analysis fields; the more analysis fields an analysis result contains, the higher the parsing accuracy of the analysis rule corresponding to that analysis result is considered to be.
Step 204: among the N analysis results, taking the analysis rule corresponding to the analysis result containing the most analysis fields as the target analysis rule corresponding to the log to be accessed.
Further, the log to be accessed is automatically associated with the target analysis rule, so that the technical effect of intelligently associating the target analysis rule is achieved.
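Steps 203 and 204 can be sketched as follows. Representing an analysis rule as a regular expression with named groups is an assumption made only for this example, as are the rule names and the sample log line; the application does not state how rules are encoded.

```python
import re


def parse(rule_pattern, log):
    # Apply one rule; the analysis result is the dict of fields it extracted.
    match = re.search(rule_pattern, log)
    if match is None:
        return {}
    return {k: v for k, v in match.groupdict().items() if v is not None}


def pick_target_rule(rules, log):
    # Step 203: parse the log with each of the N rules.
    results = {rule_id: parse(pattern, log) for rule_id, pattern in rules.items()}
    # Step 204: keep the rule whose result contains the most fields
    # (on a tie, max() keeps the first rule encountered).
    best = max(results, key=lambda rule_id: len(results[rule_id]))
    return best, results[best]


rules = {
    "rule_a": r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<src_ip>\S+)",
    "rule_b": r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<src_ip>\S+) (?P<port>\d+) (?P<action>\w+)",
    "rule_c": r"(?P<user>user=\w+)",
}
log = "2022-12-29 10.0.0.1 443 deny"
print(pick_target_rule(rules, log))
```

Here `rule_b` extracts four fields, `rule_a` two and `rule_c` none, so `rule_b` would be associated with the log as its target analysis rule.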
Based on the same inventive concept, the present application further provides an apparatus for determining a parsing rule. Referring to fig. 4, the apparatus includes:
a target length determining module 401, configured to determine the number of target variables of the log to be accessed, and take the number of the target variables as the target length of the log to be accessed; wherein the target variable characterizes the field type in the log to be accessed;
a first analysis rule determining module 402, configured to determine N analysis rules corresponding to the log to be accessed according to the target length and the access type of the log to be accessed; wherein N is a positive integer greater than or equal to 1;
an analysis result obtaining module 403, configured to analyze the log to be accessed by using the N analysis rules respectively, so as to obtain N analysis results; wherein one analysis result comprises a plurality of analysis fields;
and a second analysis rule determining module 404, configured to use, among the N analysis results, an analysis rule corresponding to the analysis result including the most analysis fields as the target analysis rule corresponding to the log to be accessed.
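To illustrate what the target length determining module 401 computes, the sketch below derives a target variable sequence and its length from a raw log line. The whitespace tokenization, the field-type names and the patterns are hypothetical choices for the example; the application only states that a target variable characterizes a field type and that the target length is the number of target variables.

```python
import re

# Hypothetical field-type patterns, checked in order; unmatched tokens are "string".
_FIELD_TYPES = [
    ("ip", re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("number", re.compile(r"^\d+$")),
]


def target_variables(log):
    variables = []
    for token in log.split():
        for name, pattern in _FIELD_TYPES:
            if pattern.match(token):
                variables.append(name)
                break
        else:
            variables.append("string")
    return variables


log_line = "2022-12-29 10.0.0.1 443 deny"
sequence = target_variables(log_line)
target_length = len(sequence)  # number of target variables
print(sequence, target_length)
```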
As a possible implementation manner, when determining the N parsing rules corresponding to the log to be accessed according to the target length and the access type of the log to be accessed, the first analysis rule determining module 402 is specifically configured to: determine a first log set corresponding to the log to be accessed according to the target length and the access type of the log to be accessed, wherein the first log set comprises at least one sample log; calculate the similarity between the target variable and the reference variable of each sample log in the first log set; select all sample logs whose similarity is greater than a preset threshold as target sample logs; and take the N analysis rules corresponding to all the target sample logs as the N analysis rules corresponding to the log to be accessed.
As a possible implementation manner, when determining the first log set corresponding to the log to be accessed according to the target length and the access type of the log to be accessed, the first analysis rule determining module 402 is specifically configured to: determine a second log set based on the target length, wherein the reference length of the sample logs in the second log set is the same as the target length; and, in the second log set, take the sample logs with the same access type as the log to be accessed as the first log set.
As a possible implementation manner, when calculating the similarity between the target variable and the reference variable of each sample log in the first log set, the first analysis rule determining module 402 is specifically configured to: determine a target variable sequence of the log to be accessed based on the arrangement order of the target variables in the log to be accessed; determine, in the first log set, a reference variable sequence of each sample log based on the order in which the reference variables of that sample log are arranged in it; and take the similarity between the target variable sequence and the reference variable sequence of each sample log as the similarity between the target variable and the reference variable of each sample log.
As a possible implementation manner, when taking the similarity between the target variable sequence and the reference variable sequence of each sample log as the similarity between the target variable and the reference variable of each sample log, the first analysis rule determining module 402 is specifically configured to: calculate, for each sample log, the longest common subsequence between the target variable sequence and the reference variable sequence of that sample log; and take the length of the calculated longest common subsequence as the similarity between the target variable and the reference variable of that sample log.
Based on the same inventive concept, the embodiment of the present application further provides an electronic device, which may implement the functions of the foregoing apparatus for determining a parsing rule. Referring to fig. 5, the electronic device includes at least one processor 501 and a memory 502.
The embodiment of the present application does not limit the specific connection medium between the processor 501 and the memory 502; the connection between the processor 501 and the memory 502 through the bus 500 is taken as an example in fig. 5. The bus 500 is shown as a bold line in fig. 5, and the manner of connection between the other components is merely illustrative and not limiting. The bus 500 may be divided into an address bus, a data bus, a control bus, and so on; for ease of illustration it is represented by only one thick line in fig. 5, but this does not mean that there is only one bus or one type of bus. Alternatively, the processor 501 may also be referred to as a controller; the name is not limited.
In the embodiment of the present application, the memory 502 stores instructions executable by the at least one processor 501, and the at least one processor 501 may execute the method for determining parsing rules as discussed above by executing the instructions stored in the memory 502. The processor 501 may implement the functions of the various modules in the apparatus/system shown in fig. 4.
The processor 501 is the control center of the device/system. It may use various interfaces and lines to connect the various parts of the whole device and, by running or executing the instructions stored in the memory 502 and invoking the data stored in the memory 502, perform the various functions of the device/system and process data, thereby monitoring the device/system as a whole.
In one possible design, processor 501 may include one or more processing units, and processor 501 may integrate an application processor and a modem processor, where the application processor primarily processes operating systems, user interfaces, application programs, and the like, and the modem processor primarily processes wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 501. In some embodiments, processor 501 and memory 502 may be implemented on the same chip, or they may be implemented separately on separate chips in some embodiments.
The processor 501 may be a general purpose processor such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, and may implement or perform the methods, steps and logic blocks disclosed in embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method for determining parsing rules disclosed in connection with the embodiments of the present application may be directly embodied as a hardware processor executing, or may be executed by a combination of hardware and software modules in the processor.
The memory 502, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 502 may include at least one type of storage medium, for example, flash memory, hard disk, multimedia card, card memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, magnetic disk, optical disk, and the like. The memory 502 may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 502 in the present embodiment may also be circuitry or any other device/system capable of implementing a storage function, for storing program instructions and/or data.
By programming the processor 501, the code corresponding to the method for determining the parsing rules described in the foregoing embodiment may be burned into the chip, so that the chip can execute the steps of the method for determining the parsing rules of the embodiment shown in fig. 2 at runtime. How to design and program the processor 501 is a technique well known to those skilled in the art, and will not be described in detail herein.
Based on the same inventive concept, the embodiments of the present application also provide a storage medium storing computer instructions that, when executed on a computer, cause the computer to perform the above-described method of determining parsing rules.
In some possible embodiments, aspects of the method for determining parsing rules provided herein may also be implemented in the form of a program product comprising program code which, when the program product is run on a device, causes the control apparatus to carry out the steps of the method for determining parsing rules according to the various exemplary embodiments of the present application described above.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus/system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. A method of determining parsing rules, the method comprising:
determining the number of target variables of a log to be accessed, and taking the number of the target variables as the target length of the log to be accessed; the target variable characterizes the field type in the log to be accessed;
according to the target length and the access type of the log to be accessed, N analysis rules corresponding to the log to be accessed are determined; wherein N is a positive integer greater than or equal to 1;
Respectively analyzing the log to be accessed by adopting the N analysis rules to obtain N analysis results; wherein, one analysis result comprises a plurality of analysis fields;
and in the N analysis results, taking the analysis rule corresponding to the analysis result containing the most analysis fields as the target analysis rule corresponding to the log to be accessed.
2. The method of claim 1, wherein the determining N parsing rules corresponding to the log to be accessed according to the target length and the access type of the log to be accessed includes:
determining a first log set corresponding to the log to be accessed according to the target length and the access type of the log to be accessed; wherein the first log set comprises at least one sample log;
calculating the similarity between the target variable and the reference variable of each sample log in the first log set;
selecting all sample logs corresponding to the similarity larger than a preset threshold as target sample logs;
and taking N analysis rules corresponding to all the target sample logs as N analysis rules corresponding to the logs to be accessed.
3. The method of claim 2, wherein the determining the first set of logs corresponding to the log to be accessed according to the target length and the access type of the log to be accessed comprises:
determining a second log set based on the target length; wherein a reference length of a sample log in the second log set is the same as the target length;
and in the second log set, taking the sample log with the same access type as the log to be accessed as a first log set.
4. The method of claim 2, wherein the calculating the similarity between the target variable and the reference variable for each sample log in the first log set comprises:
determining a target variable sequence of the log to be accessed based on the arrangement order of the target variables in the log to be accessed;
determining, in the first log set, a sequence of reference variables for each sample log based on an order in which the reference variables for each sample log are arranged in each sample log;
and taking the similarity between the target variable sequence and the reference variable sequence of each sample log as the similarity between the target variable and the reference variable of each sample log.
5. The method of claim 4, wherein said taking the similarity between the target variable sequence and the reference variable sequence of the respective sample logs as the similarity between the target variable and the reference variable of the respective sample logs comprises:
respectively calculating the longest common subsequence between the target variable sequence and the reference variable sequence of each sample log;
and taking the length of the longest common subsequence obtained by calculation as the similarity between the target variable and the reference variable of each sample log.
6. An apparatus for determining parsing rules, the apparatus comprising:
a target length determining module, configured to determine the number of target variables of a log to be accessed, and take the number of the target variables as the target length of the log to be accessed; wherein the target variable characterizes the field type in the log to be accessed;
a first analysis rule determining module, configured to determine N analysis rules corresponding to the log to be accessed according to the target length and the access type of the log to be accessed; wherein N is a positive integer greater than or equal to 1;
an analysis result obtaining module, configured to analyze the log to be accessed by using the N analysis rules respectively, so as to obtain N analysis results; wherein one analysis result comprises a plurality of analysis fields;
and the second analysis rule determining module is used for taking the analysis rule corresponding to the analysis result containing the most analysis fields in the N analysis results as the target analysis rule corresponding to the log to be accessed.
7. The apparatus of claim 6, wherein the first analysis rule determining module is specifically configured to: determine a first log set corresponding to the log to be accessed according to the target length and the access type of the log to be accessed, wherein the first log set comprises at least one sample log; calculate the similarity between the target variable and the reference variable of each sample log in the first log set; select all sample logs whose similarity is greater than a preset threshold as target sample logs; and take the N analysis rules corresponding to all the target sample logs as the N analysis rules corresponding to the log to be accessed.
8. The apparatus of claim 6, wherein the first analysis rule determining module is specifically configured to: determine a target variable sequence of the log to be accessed based on the arrangement order of the target variables in the log to be accessed; determine, in the first log set, a reference variable sequence of each sample log based on the order in which the reference variables of that sample log are arranged in it; and take the similarity between the target variable sequence and the reference variable sequence of each sample log as the similarity between the target variable and the reference variable of each sample log.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-5 when executing a computer program stored on said memory.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-5.
CN202211714250.2A 2022-12-29 2022-12-29 Method and device for determining analysis rule and electronic equipment Pending CN116010499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211714250.2A CN116010499A (en) 2022-12-29 2022-12-29 Method and device for determining analysis rule and electronic equipment

Publications (1)

Publication Number Publication Date
CN116010499A true CN116010499A (en) 2023-04-25

Family

ID=86029401

Country Status (1)

Country Link
CN (1) CN116010499A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628451A (en) * 2023-05-31 2023-08-22 江苏华存电子科技有限公司 High-speed analysis method for information to be processed
CN116628451B (en) * 2023-05-31 2023-11-14 江苏华存电子科技有限公司 High-speed analysis method for information to be processed

Similar Documents

Publication Publication Date Title
US10296307B2 (en) Method and system for template extraction based on source code similarity
US8626786B2 (en) Dynamic language checking
US8676965B2 (en) Tracking high-level network transactions
CN111240688B (en) excel file analysis method and device, computer equipment and storage medium
US12095630B2 (en) Agreement to service policy translation system
CN112181430A (en) Code change statistical method and device, electronic equipment and storage medium
CN116010499A (en) Method and device for determining analysis rule and electronic equipment
CN113704790A (en) Abnormal log information summarizing method and computer equipment
CN114598597B (en) Multisource log analysis method, multisource log analysis device, computer equipment and medium
CN117435480A (en) Binary file detection method and device, electronic equipment and storage medium
CN108733543A (en) A kind of method, apparatus of log analysis, electronic equipment and readable storage medium storing program for executing
CN112350890B (en) Message processing method, device, server and storage medium
CN111046393B (en) Vulnerability information uploading method and device, terminal equipment and storage medium
CN112883088B (en) Data processing method, device, equipment and storage medium
CN112130862A (en) Package file generation method, device, equipment and computer readable storage medium
CN116866241A (en) Internet of things terminal detection method, system and storage medium based on DPI
CN111178421A (en) Method, device, medium and electronic equipment for detecting user state
CN115809466A (en) Security requirement generation method and device based on STRIDE model, electronic equipment and medium
CN115328734A (en) Cross-service log processing method and device and server
CN111144086B (en) Log formatting method and device, electronic equipment and storage medium
US20240202824A1 (en) Smart contract security auditing
CN111913700B (en) Cloud-end interaction protocol analysis method, device, equipment and storage medium
US11934840B2 (en) Classification of hardware components
CN114003317B (en) Inline implementation method and device, electronic equipment, storage medium and program product
CN112434242B (en) Statistical method, device, server and storage medium of application program downloading channel

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination