CN110321410A - Method, apparatus, storage medium, and electronic device for log extraction - Google Patents

Method, apparatus, storage medium, and electronic device for log extraction

Info

Publication number
CN110321410A
Authority
CN
China
Prior art keywords
log
event
template
target log
extracted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910544248.7A
Other languages
Chinese (zh)
Other versions
CN110321410B (en)
Inventor
李琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201910544248.7A
Publication of CN110321410A
Application granted
Publication of CN110321410B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present disclosure relates to a method, apparatus, storage medium, and electronic device for log extraction, in the technical field of data processing. The method comprises: determining a sample log from a log to be extracted, the sample log including multiple log events; extracting target log events from the multiple log events; determining, in a preset log template set that includes at least one log template, a target log template that matches the target log events; and performing content extraction on the log to be extracted according to the target log template. No dedicated extraction rules need to be designed for the complex structures of different logs, which reduces developer workload. Content can be extracted automatically from logs of various structures, which reduces extraction complexity and improves extraction efficiency and the scope of application.

Description

Method, apparatus, storage medium, and electronic device for log extraction
Technical field
The present disclosure relates to the technical field of data processing, and in particular to a method, apparatus, storage medium, and electronic device for log extraction.
Background
With the continuous development of information technology, more and more services are implemented by information-based means. While running and being maintained on multiple platforms (or systems), these services generate logs containing different log events. As business systems keep growing, the scale and format of the corresponding logs have changed considerably, log structures have become increasingly complex, and scenarios in which several kinds of logs are mixed have appeared.
In the prior art, for logs whose lines have a consistent text structure, dedicated extraction rules must be preset to extract the log, which is inflexible and narrowly applicable. For logs whose lines have inconsistent text structures, dedicated judgment logic and processing logic (for example, regular expressions, rules, or shell scripts) must be written to extract the log; this is complex to implement, requires developers, and yields low efficiency and accuracy. Logs whose text structure is overly complex or unknown cannot be extracted effectively.
Summary of the invention
The purpose of the present disclosure is to provide a method, apparatus, storage medium, and electronic device for log extraction, to solve the prior-art problem that logs with complex text structures are difficult to extract.
To achieve the above purpose, according to a first aspect of the embodiments of the present disclosure, a method for log extraction is provided, the method comprising:
determining a sample log from a log to be extracted, the sample log including multiple log events;
extracting target log events from the multiple log events;
determining, in a preset log template set, a target log template that matches the target log events, the log template set including at least one log template; and
performing content extraction on the log to be extracted according to the target log template.
Optionally, the extracting target log events from the multiple log events comprises:
for each log event, determining the degree of difference between that log event and each other log event among the multiple log events, and determining a difference feature value of that log event according to those degrees of difference;
determining an event extraction parameter corresponding to the sample log according to the difference feature value of each log event; and
extracting the target log events from the multiple log events according to the event extraction parameter.
Optionally, the degree of difference includes at least one of a content difference degree, a length difference degree, and a format difference degree.
Optionally, the extracting the target log events from the multiple log events according to the event extraction parameter comprises:
using the event extraction parameter as the random coefficient of a random selection algorithm, and extracting the target log events from the multiple log events by means of the random selection algorithm.
Optionally, the determining, in a preset log template set, a target log template that matches the target log events comprises:
for each target log event, determining the matching degree between that target log event and each log template in the log template set; and
taking the log template with the highest matching degree as the target log template that matches that target log event.
Optionally, there are multiple target log templates, and the performing content extraction on the log to be extracted according to the target log template comprises:
for each log event included in the log to be extracted, determining the matching degree between that log event and each of the multiple target log templates, and performing content extraction on that log event according to the target log template with the highest matching degree.
Optionally, there are multiple target log templates, and the performing content extraction on the log to be extracted according to the target log template comprises:
for each log event included in the log to be extracted, determining the matching degree between that log event and each of the multiple target log templates, and, if the maximum matching degree is greater than or equal to a matching degree threshold, performing content extraction on that log event according to the target log template with the highest matching degree.
According to a second aspect of the embodiments of the present disclosure, an apparatus for log extraction is provided, the apparatus comprising:
a sample determining module, configured to determine a sample log from a log to be extracted, the sample log including multiple log events;
an event extraction module, configured to extract target log events from the multiple log events;
a template determining module, configured to determine, in a preset log template set, a target log template that matches the target log events, the log template set including at least one log template; and
a content extraction module, configured to perform content extraction on the log to be extracted according to the target log template.
Optionally, the event extraction module comprises:
a determining submodule, configured to, for each log event, determine the degree of difference between that log event and each other log event among the multiple log events, and determine a difference feature value of that log event according to those degrees of difference;
the determining submodule being further configured to determine an event extraction parameter corresponding to the sample log according to the difference feature value of each log event; and
an event extraction submodule, configured to extract the target log events from the multiple log events according to the event extraction parameter.
Optionally, the degree of difference includes at least one of a content difference degree, a length difference degree, and a format difference degree.
Optionally, the event extraction submodule is configured to:
use the event extraction parameter as the random coefficient of a random selection algorithm, and extract the target log events from the multiple log events by means of the random selection algorithm.
Optionally, the template determining module is configured to:
for each target log event, determine the matching degree between that target log event and each log template in the log template set; and
take the log template with the highest matching degree as the target log template that matches that target log event.
Optionally, there are multiple target log templates, and the content extraction module is configured to:
for each log event included in the log to be extracted, determine the matching degree between that log event and each of the multiple target log templates, and perform content extraction on that log event according to the target log template with the highest matching degree.
Optionally, there are multiple target log templates, and the content extraction module is configured to:
for each log event included in the log to be extracted, determine the matching degree between that log event and each of the multiple target log templates, and, if the maximum matching degree is greater than or equal to a matching degree threshold, perform content extraction on that log event according to the target log template with the highest matching degree.
According to a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the steps of the log extraction method provided in the first aspect.
According to a fourth aspect of the embodiments of the present disclosure, an electronic device is provided, comprising:
a memory on which a computer program is stored; and
a processor, configured to execute the computer program in the memory to implement the steps of the log extraction method provided in the first aspect.
Through the above technical solutions, the present disclosure first determines, from the log to be extracted, a sample log including multiple log events; then extracts target log events from the multiple log events; then determines, in a preset log template set including at least one log template, a target log template that matches the target log events; and finally performs content extraction on the log to be extracted according to the target log template. No dedicated extraction rules need to be designed for the complex structures of different logs, and content can be extracted automatically from logs of various structures, which reduces extraction complexity and improves extraction efficiency and the scope of application.
Other features and advantages of the present disclosure are described in detail in the detailed description that follows.
Brief description of the drawings
The accompanying drawings provide a further understanding of the present disclosure and constitute a part of the specification. Together with the following detailed description, they serve to explain the present disclosure but do not limit it. In the drawings:
Fig. 1 is a flowchart of a method for log extraction according to an exemplary embodiment;
Fig. 2 is a flowchart of another method for log extraction according to an exemplary embodiment;
Fig. 3 is a block diagram of an apparatus for log extraction according to an exemplary embodiment;
Fig. 4 is a block diagram of another apparatus for log extraction according to an exemplary embodiment;
Fig. 5 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
Before introducing the method, apparatus, storage medium, and electronic device for log extraction provided by the present disclosure, the application scenario involved in the embodiments is introduced first. The application scenario is content extraction from a log. The log may come from many sources; for example, it may be generated by multiple kinds of platforms while executing multiple services and output through a unified port. The log contains multiple log events of various types. A log event can be understood as one record in the log, that is, an individual entry that independently describes one operation or one execution result, and it is the basic unit of content extraction. Log events may include, for example, firewall log events, switch log events, system execution log events, business log events, user operation log events, and database log events. For the various types of log events, a log template set may be provided in advance; it contains a log template corresponding to each type of log event, used to extract the content of log events of that type. As business systems grow, the number of log event types becomes very large (several hundred), so the log template set also contains a large number of log templates (for example, JSON templates, XML templates, date templates, KV templates, and CSV templates). If every log event in the log to be extracted were matched in turn against every log template in the log template set, the amount of computation and the complexity would both be very high, and content extraction would be too inefficient for practical use.
Fig. 1 is a flowchart of a method for log extraction according to an exemplary embodiment. As shown in Fig. 1, the method includes:
Step 101: determine a sample log from the log to be extracted, the sample log including multiple log events.
For example, the log to be extracted contains a large number of log events, and it is difficult to determine in advance which types of log events they include; accordingly, a log template suitable for the log to be extracted cannot be selected directly from the many log templates in the log template set. Therefore, a sample log including multiple log events may first be determined from the log to be extracted. The sample log contains far fewer log events than the log to be extracted, which effectively reduces the amount of computation and the complexity of subsequent processing.
In this embodiment, the sample log may be determined according to a preset rule, for example by selecting according to a time range or by random selection. Taking a log to be extracted that spans one month as an example, the log events within a 24-hour period may be selected to form the sample log, or a preset percentage (for example, 10%) of the log events in the log to be extracted may be randomly selected to form the sample log.
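The two sampling rules described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the `(timestamp, text)` event representation, and the fixed seed are all assumptions made for the example.

```python
import random
from datetime import datetime, timedelta

def sample_by_time(events, start, hours=24):
    """Keep the events whose timestamp lies in [start, start + hours)."""
    end = start + timedelta(hours=hours)
    return [text for ts, text in events if start <= ts < end]

def sample_by_percent(events, percent=10, seed=0):
    """Randomly keep a preset percentage of the events (at least one)."""
    k = max(1, len(events) * percent // 100)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [text for _, text in rng.sample(events, k)]

# Hypothetical log: one event per hour over 30 days.
t0 = datetime(2019, 6, 1)
events = [(t0 + timedelta(hours=i), f"event {i}") for i in range(30 * 24)]

day_sample = sample_by_time(events, t0)      # events from the first 24 hours
random_sample = sample_by_percent(events)    # 10% of all events
```

Either sample is much smaller than the full log, which is the property the later steps rely on.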
Step 102: extract target log events from the multiple log events.
For example, because the sample log still contains a fairly large number of log events, target log events that are representative of the log to be extracted may be extracted from the sample log, further reducing the number of log events to be processed and the computation and complexity of subsequent processing. First, the difference feature value of each log event in the sample log is analyzed. The difference feature value can be understood as the uniqueness of a log event within the sample log; that is, it reflects how much that log event differs from the other log events in the sample log. Next, the event extraction parameter of the sample log is determined according to the difference feature values of the log events. The event extraction parameter can be understood as the extent to which the sample log is able to reflect the log to be extracted, or as an indication of which types of log events the sample log contains. Finally, target log events are extracted from the multiple log events according to the event extraction parameter of the sample log; with high probability, the target log events are representative of the log events in the log to be extracted. There may be one or more target log events, and their number is much smaller than the number of log events in the sample log.
Step 103: determine, in a preset log template set, a target log template that matches the target log events, the log template set including at least one log template.
After the target log events are determined, a target log template that matches each target log event is selected from the at least one log template in the preset log template set. A target log template can correctly extract the content of its target log event. Each target log event corresponds to one target log template, and several target log events may correspond to the same log template, so the number of target log templates is less than or equal to the number of target log events. Because the target log events are, with high probability, representative of the log events in the log to be extracted, the corresponding target log templates are, with high probability, suitable for the log to be extracted.
Step 104: perform content extraction on the log to be extracted according to the target log template.
Finally, content extraction is performed on each log event in the log to be extracted in turn, according to the target log template determined in step 103. If there is only one target log template (in which case the log to be extracted can be understood as a log whose lines have a consistent text structure and whose log events are all of the same type), every log event in the log to be extracted is extracted with that target log template. If there are multiple target log templates, then for each log event in the log to be extracted, the matching degree between that log event and each of the target log templates is determined in turn, and content extraction is performed on that log event according to the target log template with the highest matching degree.
In summary, the present disclosure first determines, from the log to be extracted, a sample log including multiple log events; then extracts target log events from the multiple log events; then determines, in a preset log template set including at least one log template, a target log template that matches the target log events; and finally performs content extraction on the log to be extracted according to the target log template. No dedicated extraction rules need to be designed for the complex structures of different logs, which reduces developer workload. Content can be extracted automatically from logs of various structures, which reduces extraction complexity and improves extraction efficiency and the scope of application.
Fig. 2 is a flowchart of another method for log extraction according to an exemplary embodiment. As shown in Fig. 2, step 102 may be implemented by the following steps:
Step 1021: for each log event, determine the degree of difference between that log event and each other log event among the multiple log events, and determine the difference feature value of that log event according to those degrees of difference.
For example, the difference feature value, which reflects the uniqueness of each log event in the sample log, may be determined first. The difference feature value of a log event may be determined from the degrees of difference between that log event and each other log event among the multiple log events; for example, those degrees of difference may be summed and the sum used as the difference feature value of the log event. The degree of difference between two log events may include at least one of a content difference degree C_S, a length difference degree C_L, and a format difference degree C_M. The degree of difference between two log events may therefore be the sum of several difference degrees, for example C_S + C_L + C_M.
C_S reflects how much two log events differ in text content. It may be obtained by comparing the characters of the two log events in turn using a preset string matching algorithm; that is, the text contents of the two log events are matched in a preset order (for example, from left to right) to determine a matching value for their text contents, which characterizes how similar the two log events are. The larger the matching value, the lower the corresponding C_S; the smaller the matching value, the higher the corresponding C_S. The string matching algorithm may be, for example, the KMP (Knuth-Morris-Pratt) algorithm, the BF (Brute Force) algorithm, or the Horspool algorithm.
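As a rough illustration of a content difference degree, the sketch below compares two events position by position from left to right and counts equal characters as the matching value. This is a simple brute-force stand-in, not KMP or Horspool, and the formula that maps the matching value to a difference in the range 0 to 1 is an assumption; the source only states that a larger matching value gives a lower difference.

```python
def content_difference(a: str, b: str) -> float:
    """Assumed content difference: 1 minus the share of matching positions."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty events are treated as identical
    # Matching value: number of aligned positions holding the same character.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - matches / longest

same = content_difference("abc", "abc")      # identical text
half = content_difference("abcd", "abxx")    # two of four positions match
none = content_difference("abc", "xyz")      # no position matches
```

Positions beyond the end of the shorter event count as mismatches because the divisor is the longer length.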
C_L reflects how much two log events differ in text length. In this embodiment, C_L may consist of two parts. The first part, C_L1, is the difference in the number of characters contained in the text of the two log events. For example, C_L1 may be obtained as the absolute value of the difference between the character counts of the two log events: if the two log events contain 25 and 30 characters respectively, 30 - 25 = 5 may be used as their C_L1. The second part, C_L2, is the difference in the number of characters per string after the two log events are divided into multiple strings according to a special character set (for example, space, ",", "-", "_", "/", and tab). For example, C_L2 may be obtained as follows: the two log events are divided according to the special character set, yielding N strings and M strings; the first of the N strings is compared with the first of the M strings, and the comparison result is recorded as 0 if they contain the same number of characters and as 1 if they do not; the remaining strings are compared in turn in the same way. The sum of the comparison results is used as C_L2. It should be noted that if N and M are unequal, each surplus string may be compared with a string of length 0, and the comparison result is 1.
For example, log event A is "account: 11642205800/ trade date: 2017-06-19" and log event B is "IP:209.160.24.63/ date: 2018-05-21". (The character counts below are those given for the original-language strings and do not match the translated strings character for character.) The two log events are first divided at "/". Log event A is divided into two strings: "account: 11642205800", containing 14 characters, and "trade date: 2017-06-19", containing 15 characters. Log event B is also divided into two strings: "IP:209.160.24.63", containing 16 characters, and "date: 2018-05-21", containing 13 characters. The strings of log event A and log event B are then compared in turn: "account: 11642205800" and "IP:209.160.24.63" contain different numbers of characters, so the comparison result is 1; "trade date: 2017-06-19" and "date: 2018-05-21" contain different numbers of characters, so the comparison result is 1. The C_L2 of log event A and log event B is the sum of the two comparison results, namely 2.
After C_L1 and C_L2 are obtained in this way, their sum is used as C_L.
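The length difference described above can be sketched as follows. The function name, the default separator pattern, and the sample strings are assumptions chosen for illustration; only the structure (absolute length difference plus per-string comparison, with each surplus string counting as 1) follows the text.

```python
import re

# Assumed separator set: space, ",", "-", "_", "/", and tab.
DEFAULT_SEPARATORS = r"[ ,\-_/\t]"

def length_difference(a, b, separators=DEFAULT_SEPARATORS):
    """Return C_L1 + C_L2 for two log events."""
    # C_L1: absolute difference of the total character counts.
    c_l1 = abs(len(a) - len(b))
    parts_a = re.split(separators, a)
    parts_b = re.split(separators, b)
    # C_L2: 1 for every aligned pair of strings whose lengths differ ...
    c_l2 = sum(1 for x, y in zip(parts_a, parts_b) if len(x) != len(y))
    # ... plus 1 for every surplus string when N and M are unequal.
    c_l2 += abs(len(parts_a) - len(parts_b))
    return c_l1 + c_l2

# Hypothetical events split only at "/":
# lengths 6 vs 7 give C_L1 = 1; "bbb" vs "dd" differ and "e" is surplus, so C_L2 = 2.
example = length_difference("aa/bbb", "cc/dd/e", separators="/")
```

Passing a narrower `separators` pattern, as in the example call, mirrors the worked example in the text, which divides the events only at "/".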
C_M reflects how much two log events differ in text format. For example, each of the two log events may first be matched against preset formats (a time format, "[]" data format, "()" data format, JSON object, JSON array, XML format, and so on) to obtain a matching value that characterizes the matching degree, and the C_M of the two log events is then determined from the matching values of the two log events with the preset format. For example, if the matching values of the two log events with the JSON object format are 80% and 60% respectively, the absolute value of the difference between the two matching values, namely 20%, may be used as C_M.
Further, when the degree of difference between two log events is calculated, the values of C_S, C_L, and C_M may be calculated separately. Because the different difference degrees are calculated in different ways, the value ranges obtained for them may also differ. Therefore, after C_S, C_L, and C_M are determined, each of them may first be weighted and normalized, and the three difference degrees are then summed to obtain the degree of difference between the two log events.
It should be noted that the difference feature value is a relative concept: it describes how unique a log event is compared with the other log events in the sample log. Take a sample log containing 20 log events as an example. If log event a is of type A and the remaining 19 log events are of type B, the degree of difference between log event a and each of the 19 log events is large, so the difference feature value of log event a is also high. Log event b, one of the 19 log events, is very similar to the other 18 and differs substantially only from log event a, so the difference feature value of log event b is small.
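The 20-event example can be reproduced with a short sketch. The pairwise difference used here (0 for identical events, 1 otherwise) is a deliberately crude placeholder for the weighted sum of C_S, C_L, and C_M, and the function names are assumptions.

```python
def difference_feature_values(events, diff):
    """For each event, sum its difference to every other event in the sample."""
    values = []
    for i, event in enumerate(events):
        others = events[:i] + events[i + 1:]
        values.append(sum(diff(event, other) for other in others))
    return values

def toy_diff(a, b):
    """Placeholder pairwise difference: 0 if the texts are equal, else 1."""
    return 0 if a == b else 1

# One type-A event followed by 19 identical type-B events.
events = ["type A event"] + ["type B event"] * 19
values = difference_feature_values(events, toy_diff)
```

The single type-A event differs from all 19 others and gets the largest value, while each type-B event differs only from the type-A event.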
Step 1022: determine the event extraction parameter corresponding to the sample log according to the difference feature value of each log event.
Step 1023: extract the target log events from the multiple log events according to the event extraction parameter.
For example, the difference feature values of the log events may be summed and the sum normalized, and the normalized sum used as the event extraction parameter corresponding to the sample log. The larger the event extraction parameter, the more types of log events the sample log contains, and correspondingly the more target log events need to be selected.
In step 1023, the target log events may be extracted as follows: the event extraction parameter is used as the random coefficient (which can be understood as the operator) of a random selection algorithm, and the target log events are extracted from the multiple log events by the random selection algorithm. For example, if the sample log includes 100 log events and the event extraction parameter is 7, one log event out of every 7 in the sample log may be extracted as a target log event; alternatively, 7 may be used as the operator of a pseudo-random number algorithm to generate a pseudo-random sequence, and the target log events are extracted from the sample log according to that sequence.
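Both variants of step 1023 can be sketched briefly. Treating the operator as the seed of Python's `random.Random`, and passing the number of events to draw as a separate `count` argument, are assumptions made for this illustration; the source does not specify either.

```python
import random

def extract_every_nth(sample, step):
    """Take one log event out of every `step` events, starting with the first."""
    return sample[::step]

def extract_pseudo_random(sample, seed, count):
    """Draw `count` distinct events using a generator seeded with the parameter."""
    rng = random.Random(seed)  # assumption: the "operator" acts as the seed
    return rng.sample(sample, count)

# Hypothetical sample log of 100 events with event extraction parameter 7.
sample = [f"event {i}" for i in range(100)]
stepped = extract_every_nth(sample, 7)
drawn = extract_pseudo_random(sample, seed=7, count=10)
```

Because the generator is seeded with the parameter, the same sample log and parameter always yield the same target log events.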
Optionally, step 103 may be implemented as follows:
Step a): for each target log event, determine the matching degree between that target log event and each log template in the log template set.
Step b): take the log template with the highest matching degree as the target log template that matches that target log event.
For example, for each target log event, the matching degree between that target log event and each log template in the log template set may be calculated in turn. The log template with the highest matching degree is then taken as the target log template that matches that target log event; it can be understood as the log template best suited to extracting the content of that target log event.
To calculate the matching degree between a target log event and a log template, content extraction may be performed on the target log event according to that log template, and the matching degree determined from the number of characters successfully extracted. The matching degree is proportional to the number of characters successfully extracted: the more characters are extracted, the higher the matching degree, and the fewer characters are extracted, the lower the matching degree.
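A minimal sketch of steps a) and b) follows, assuming that each log template can be approximated by a regular expression and that "characters successfully extracted" means the total length of the regex matches. The two templates and their names are hypothetical and much simpler than real KV or date templates.

```python
import re

# Hypothetical template set: template name -> compiled pattern.
TEMPLATES = {
    "kv": re.compile(r"\w+=\w+"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}

def matching_degree(event, pattern):
    """Count the characters of the event that the template extracts."""
    return sum(len(m.group(0)) for m in pattern.finditer(event))

def best_template(event, templates):
    """Return the name of the template with the highest matching degree."""
    return max(templates, key=lambda name: matching_degree(event, templates[name]))

kv_choice = best_template("user=alice action=login", TEMPLATES)
date_choice = best_template("2019-06-21 ok", TEMPLATES)
```

When two templates tie, `max` returns the first one encountered; a real implementation would need an explicit tie-breaking rule.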
It should be noted that the corresponding log template of each target journaling event can be determined according to matching degree, and target The quantity of log event should be greater than or equal to target journaling template quantity.If each target journaling event corresponding one different Target journaling template, then the quantity of target journaling event is equal to the quantity of target journaling template, if depositing in target journaling event The same target journaling template is corresponded at least two target journaling events, then the quantity of target journaling event is greater than target day The quantity of will template.
In one implementation scenario, there may be multiple target log templates. Correspondingly, content extraction from the log to be extracted in step 104 may be implemented in two ways:
First implementation: for each log event included in the log to be extracted, determine the matching degree between the log event and each of the multiple target log templates, and extract content from the log event according to the target log template with the highest matching degree.
Second implementation: for each log event included in the log to be extracted, determine the matching degree between the log event and each of the multiple target log templates; if the maximum matching degree is greater than or equal to a matching degree threshold, extract content from the log event according to the target log template with the highest matching degree.
Take as an example the case in which 10 target log templates are determined in step 103 and the first log event is any log event in the log to be extracted. The matching degrees between the first log event and the 10 target log templates are determined in turn, giving 10 matching degrees. To calculate the matching degree between the first log event and a target log template, content may be extracted from the first log event according to that target log template, and the matching degree determined from the number of characters successfully extracted: the more characters are successfully extracted, the higher the matching degree between the target log template and the first log event, and the fewer characters are successfully extracted, the lower that matching degree.
In the first implementation, content is extracted from the first log event directly according to the target log template with the highest matching degree. In the second implementation, the maximum matching degree is first compared with the matching degree threshold. If the maximum matching degree is less than the threshold, none of the 10 target log templates matches the first log event well, and extracting content from the first log event with a target log template would be inaccurate. In that case, the matching degree between the first log event and each log template in the log template set may be determined again, and content extracted from the first log event according to the log template in the log template set with the highest matching degree, which further ensures extraction accuracy. If the maximum matching degree is greater than or equal to the threshold, the first log event has found its best-matching target log template among the 10 target log templates, and content is extracted from the log event according to that template.
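The two implementations of step 104, including the fallback to the full log template set, can be sketched as follows. The sketch assumes a scoring function that returns the matching degree for a (template, event) pair; `Scorer`, `best_match`, `choose_first` and `choose_second` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical scorer: maps (template name, event) to a matching degree.
Scorer = Callable[[str, str], float]


def best_match(event: str, templates: Iterable[str],
               score: Scorer) -> Tuple[Optional[str], float]:
    """Return the template with the highest matching degree and that degree."""
    best_name, best_score = None, float("-inf")
    for name in templates:
        degree = score(name, event)
        if degree > best_score:
            best_name, best_score = name, degree
    return best_name, best_score


def choose_first(event: str, targets: Iterable[str], score: Scorer) -> Optional[str]:
    """First implementation: always use the best target template."""
    return best_match(event, targets, score)[0]


def choose_second(event: str, targets: Iterable[str], full_set: Iterable[str],
                  score: Scorer, threshold: float) -> Optional[str]:
    """Second implementation: fall back to the whole template set
    when the best target template scores below the threshold."""
    name, degree = best_match(event, targets, score)
    if name is not None and degree >= threshold:
        return name
    return best_match(event, full_set, score)[0]
```

The fallback costs an extra pass over the full template set, but only for events that none of the target templates fits well.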
The matching degree threshold may be set empirically or may be determined from the target log templates. For example, suppose 10 target log templates were determined from 15 target log events in step 103, and the matching degree between each of the 15 target log events and each of the 10 target log templates was also recorded. In step 104, the matching degrees between the first log event and the 10 target log templates are determined first, giving 10 matching degrees, and the target log template with the highest matching degree is taken as the first target log template. The minimum of the matching degrees between the 15 target log events and the first target log template is then used as the matching degree threshold. In other words, the matching degree threshold can be understood as the minimum matching degree between the multiple target log events and the first target log template.
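Deriving the threshold from recorded matching degrees can be sketched as follows, assuming the degrees recorded in step 103 are kept in a nested mapping from target log event to template to degree. The function name `match_threshold` and the data layout are illustrative assumptions.

```python
from typing import Mapping


def match_threshold(first_template: str,
                    recorded: Mapping[str, Mapping[str, float]]) -> float:
    """Minimum recorded matching degree between any target log event
    and the given (first) target log template."""
    return min(per_template[first_template] for per_template in recorded.values())
```

With this rule, a log event is accepted by a target template only if it matches at least as well as the worst-matching target log event that was recorded for that template.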
For example, content is extracted from the first log event in the log to be extracted. The first log event is, for instance: [195.160.24.63 - - 05/Jan/2015:18:22:16-0800] "GET /product.screen?productId=WC-SH-A02&JSESSIONID=SD0SL6FF7ADFF4953 HTTP 1.1" 200 3878 "http://www.google.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5" 349, which contains multiple fields. Correspondingly, the target log templates include template X (agent field, auth field, ident field, referrer field, bytes field, response field, clientip field, rawrequest field, timestamp field) and template Y (timestamp field, @version field, clientip field, user ID field, serial ID field). The matching degrees of template X and template Y with the first log event are determined separately; for example, the formats of the fields included in template X and template Y may be compared with the first log event in turn. If the matching degree of template X is determined to be the highest, content is extracted from the first log event according to template X, giving the following:
agent: "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5"
auth: -
ident: -
referrer: "http://www.google.com"
bytes: 3878
response: 200
clientip: 195.160.24.63
rawrequest: "GET /product.screen?productId=WC-SH-A02&JSESSIONID=SD0SL6FF7ADFF4953 HTTP 1.1"
timestamp: 05/Jan/2015:18:22:16-0800
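The extraction with template X can be reproduced with a regular expression. The sketch below is one possible encoding, not the template format of the disclosure: it assumes template X is a pattern with one named group per field, and the sample line is written with whitespace restored, which is an assumption about the original spacing.

```python
import re

# Sample log event; the spacing is an assumed reconstruction.
LINE = (
    '[195.160.24.63 - - 05/Jan/2015:18:22:16-0800] '
    '"GET /product.screen?productId=WC-SH-A02&JSESSIONID=SD0SL6FF7ADFF4953 HTTP 1.1" '
    '200 3878 "http://www.google.com" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) '
    'Chrome/19.0.1084.46 Safari/536.5" 349'
)

# Hypothetical encoding of template X: one named group per field.
TEMPLATE_X = re.compile(
    r'^\[(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) (?P<timestamp>[^\]]+)\] '
    r'"(?P<rawrequest>[^"]*)" (?P<response>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def extract(line: str) -> dict:
    """Return the fields captured by template X, or an empty dict if no match."""
    m = TEMPLATE_X.match(line)
    return m.groupdict() if m else {}


fields = extract(LINE)
for name, value in fields.items():
    print(f"{name}: {value}")
```

The trailing value 349 is not part of template X, so it is left unextracted, as in the field list above.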
In conclusion disclosure determination first from log to be extracted includes the sample log of multiple log events, it Afterwards from multiple log events extract target journaling event, then it is preset include the log mould of at least one log template Plate is concentrated, the determining target journaling template with target journaling event matches, finally according to target journaling template to log to be extracted Carry out contents extraction.Without the decimation rule of the labyrinth design specialized for different logs, reduce the work of developer Measure, can the log to various structures carry out contents extraction automatically, reduce extraction complexity, improve extraction efficiency and suitable Use range.
Fig. 3 is a block diagram of a log extraction device according to an exemplary embodiment. As shown in Fig. 3, the device 200 includes:
A sample determining module 201, configured to determine a sample log from the log to be extracted, the sample log comprising multiple log events.
An event extraction module 202, configured to extract target log events from the multiple log events.
A template determining module 203, configured to determine, in a preset log template set, the target log template matching the target log events, the log template set comprising at least one log template.
A content extraction module 204, configured to extract content from the log to be extracted according to the target log template.
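The division of device 200 into four modules can be pictured as a small pipeline. The sketch below is only a structural illustration: the class name `LogExtractionDevice`, its attribute names and the callable signatures are assumptions, and each module is supplied by the caller as a plain function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LogExtractionDevice:
    # Each attribute stands in for one module of device 200 (names are hypothetical).
    determine_sample: Callable[[List[str]], List[str]]        # module 201
    extract_targets: Callable[[List[str]], List[str]]         # module 202
    determine_templates: Callable[[List[str]], List[str]]     # module 203
    extract_content: Callable[[List[str], List[str]], List[Dict[str, str]]]  # module 204

    def run(self, log: List[str]) -> List[Dict[str, str]]:
        """Chain the four modules in the order described for device 200."""
        sample = self.determine_sample(log)
        targets = self.extract_targets(sample)
        templates = self.determine_templates(targets)
        return self.extract_content(log, templates)
```

Passing the modules in as callables keeps the sketch independent of how sampling, matching and extraction are actually implemented.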
Fig. 4 is a block diagram of another log extraction device according to an exemplary embodiment, in which the event extraction module 202 includes:
A determining submodule 2021, configured to determine, for each log event, the difference degree between the log event and every other log event among the multiple log events, and to determine the difference feature value of the log event from those difference degrees.
The difference degree includes at least one of a content difference degree, a length difference degree and a format difference degree.
The determining submodule 2021 is further configured to determine the event extraction parameter corresponding to the sample log from the difference feature value of each log event.
An extraction submodule 2022, configured to extract the target log events from the multiple log events according to the event extraction parameter.
Optionally, the extraction submodule 2022 is configured to:
use the event extraction parameter as the random coefficient of a random selection algorithm, and extract the target log events from the multiple log events by means of the random selection algorithm.
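The work of submodules 2021 and 2022 can be sketched as follows. The passage above names the quantities but gives no formulas, so every formula here is an assumption made for illustration: the difference degree is taken as the average of a content difference (from `difflib.SequenceMatcher`) and a relative length difference, the difference feature value as the mean difference to all other events, and the event extraction parameter as the mean feature value, used as a sampling fraction.

```python
import math
import random
from difflib import SequenceMatcher
from typing import List, Optional


def difference(a: str, b: str) -> float:
    """Assumed difference degree in [0, 1]: mean of content and length difference."""
    content = 1.0 - SequenceMatcher(None, a, b).ratio()
    longest = max(len(a), len(b)) or 1
    length = abs(len(a) - len(b)) / longest
    return (content + length) / 2


def feature_values(events: List[str]) -> List[float]:
    """Assumed difference feature value: mean difference to every other event."""
    n = len(events)
    if n < 2:
        return [0.0] * n
    return [
        sum(difference(e, o) for j, o in enumerate(events) if j != i) / (n - 1)
        for i, e in enumerate(events)
    ]


def extraction_parameter(values: List[float]) -> float:
    """Assumed rule: the mean feature value serves as the sampling fraction."""
    return sum(values) / len(values) if values else 0.0


def extract_targets(events: List[str], seed: Optional[int] = None) -> List[str]:
    """Randomly select target events; more diverse samples yield more targets."""
    if not events:
        return []
    param = extraction_parameter(feature_values(events))
    k = min(len(events), max(1, math.ceil(param * len(events))))
    return random.Random(seed).sample(events, k)
```

Under these assumptions, a sample of near-identical events needs only one representative, while a sample of very different events keeps most of them.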
In another embodiment, the template determining module 203 is configured to perform the following steps:
Step a): for each target log event, determine the matching degree between the target log event and each log template in the log template set.
Step b): take the log template with the highest matching degree as the target log template matching the target log event.
For the scenario in which there are multiple target log templates, the content extraction module 204 is configured to perform the following step:
For each log event included in the log to be extracted, determine the matching degree between the log event and each of the multiple target log templates, and extract content from the log event according to the target log template with the highest matching degree.
Alternatively, the content extraction module 204 is configured to perform the following step:
For each log event included in the log to be extracted, determine the matching degree between the log event and each of the multiple target log templates; if the maximum matching degree is greater than or equal to the matching degree threshold, extract content from the log event according to the target log template with the highest matching degree.
With regard to the device in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.
In conclusion disclosure determination first from log to be extracted includes the sample log of multiple log events, it Afterwards from multiple log events extract target journaling event, then it is preset include the log mould of at least one log template Plate is concentrated, the determining target journaling template with target journaling event matches, finally according to target journaling template to log to be extracted Carry out contents extraction.Without the decimation rule of the labyrinth design specialized for different logs, reduce the work of developer Measure, can the log to various structures carry out contents extraction automatically, reduce extraction complexity, improve extraction efficiency and suitable Use range.
Fig. 5 is a block diagram of an electronic device 300 according to an exemplary embodiment. As shown in Fig. 5, the electronic device 300 may include a processor 301 and a memory 302. The electronic device 300 may further include one or more of a multimedia component 303, an input/output (I/O) interface 304 and a communication component 305.
The processor 301 controls the overall operation of the electronic device 300 so as to complete all or some of the steps of the above log extraction method. The memory 302 stores various types of data to support operation of the electronic device 300. Such data may include, for example, instructions of any application or method operated on the electronic device 300, and application-related data such as contact data, sent and received messages, pictures, audio and video. The memory 302 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (Static Random Access Memory, SRAM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), programmable read-only memory (Programmable Read-Only Memory, PROM), read-only memory (Read-Only Memory, ROM), magnetic memory, flash memory, a magnetic disk or an optical disc. The multimedia component 303 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. A received audio signal may be further stored in the memory 302 or sent through the communication component 305. The audio component further includes at least one loudspeaker for outputting audio signals. The I/O interface 304 provides an interface between the processor 301 and other interface modules, which may be a keyboard, a mouse, buttons and the like. The buttons may be virtual buttons or physical buttons. The communication component 305 is used for wired or wireless communication between the electronic device 300 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (Near Field Communication, NFC), 2G, 3G or 4G, or a combination of one or more of them; accordingly, the communication component 305 may include a Wi-Fi module, a Bluetooth module and an NFC module.
In an exemplary embodiment, the electronic device 300 may be implemented by one or more application specific integrated circuits (Application Specific Integrated Circuit, ASIC), digital signal processors (Digital Signal Processor, DSP), digital signal processing devices (Digital Signal Processing Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field programmable gate arrays (Field Programmable Gate Array, FPGA), controllers, microcontrollers, microprocessors or other electronic components, for executing the above log extraction method.
In another exemplary embodiment, a computer-readable storage medium including program instructions is further provided; the program instructions, when executed by a processor, implement the steps of the above log extraction method. For example, the computer-readable storage medium may be the above memory 302 including program instructions, and the program instructions may be executed by the processor 301 of the electronic device 300 to complete the above log extraction method.
In conclusion disclosure determination first from log to be extracted includes the sample log of multiple log events, it Afterwards from multiple log events extract target journaling event, then it is preset include the log mould of at least one log template Plate is concentrated, the determining target journaling template with target journaling event matches, finally according to target journaling template to log to be extracted Carry out contents extraction.Without the decimation rule of the labyrinth design specialized for different logs, reduce the work of developer Measure, can the log to various structures carry out contents extraction automatically, reduce extraction complexity, improve extraction efficiency and suitable Use range.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present disclosure, various simple variations may be made to its technical solutions, and these simple variations all fall within the protection scope of the present disclosure.
It should further be noted that the specific technical features described in the above embodiments may be combined in any suitable manner provided there is no contradiction. To avoid unnecessary repetition, the present disclosure does not further describe the various possible combinations.
In addition, the various embodiments of the present disclosure may also be combined arbitrarily; as long as a combination does not depart from the idea of the present disclosure, it should likewise be regarded as content disclosed by the present disclosure.

Claims (10)

1. A log extraction method, characterized in that the method comprises:
determining a sample log from a log to be extracted, the sample log comprising multiple log events;
extracting target log events from the multiple log events;
determining, in a preset log template set, a target log template matching the target log events, the log template set comprising at least one log template;
extracting content from the log to be extracted according to the target log template.
2. The method according to claim 1, characterized in that extracting the target log events from the multiple log events comprises:
for each log event, determining the difference degree between the log event and every other log event among the multiple log events, and determining the difference feature value of the log event from the difference degrees between the log event and every other log event among the multiple log events;
determining the event extraction parameter corresponding to the sample log from the difference feature value of each log event;
extracting the target log events from the multiple log events according to the event extraction parameter.
3. The method according to claim 2, characterized in that the difference degree comprises at least one of a content difference degree, a length difference degree and a format difference degree.
4. The method according to claim 2, characterized in that extracting the target log events from the multiple log events according to the event extraction parameter comprises:
using the event extraction parameter as the random coefficient of a random selection algorithm, and extracting the target log events from the multiple log events by means of the random selection algorithm.
5. The method according to claim 1, characterized in that determining, in the preset log template set, the target log template matching the target log events comprises:
for each target log event, determining the matching degree between the target log event and each log template in the log template set;
taking the log template with the highest matching degree as the target log template matching the target log event.
6. The method according to claim 1, characterized in that there are multiple target log templates, and extracting content from the log to be extracted according to the target log template comprises:
for each log event included in the log to be extracted, determining the matching degree between the log event and each of the multiple target log templates, and extracting content from the log event according to the target log template with the highest matching degree.
7. The method according to claim 1, characterized in that there are multiple target log templates, and extracting content from the log to be extracted according to the target log template comprises:
for each log event included in the log to be extracted, determining the matching degree between the log event and each of the multiple target log templates, and, if the maximum matching degree is greater than or equal to a matching degree threshold, extracting content from the log event according to the target log template with the highest matching degree.
8. A log extraction device, characterized in that the device comprises:
a sample determining module, configured to determine a sample log from a log to be extracted, the sample log comprising multiple log events;
an event extraction module, configured to extract target log events from the multiple log events;
a template determining module, configured to determine, in a preset log template set, a target log template matching the target log events, the log template set comprising at least one log template;
a content extraction module, configured to extract content from the log to be extracted according to the target log template.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.
10. An electronic device, characterized by comprising:
a memory on which a computer program is stored;
a processor, configured to execute the computer program in the memory so as to implement the steps of the method according to any one of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910544248.7A CN110321410B (en) 2019-06-21 2019-06-21 Log extraction method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110321410A true CN110321410A (en) 2019-10-11
CN110321410B CN110321410B (en) 2021-08-06

Family

ID=68120028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910544248.7A Active CN110321410B (en) 2019-06-21 2019-06-21 Log extraction method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110321410B (en)

Also Published As

Publication number Publication date
CN110321410B (en) 2021-08-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant