CN109144964A - Log analysis method and device based on machine learning - Google Patents


Info

Publication number
CN109144964A
Authority
CN
China
Legal status
Pending
Application number
CN201810957288.XA
Other languages
Chinese (zh)
Inventor
王吉伟
范渊
Current Assignee
Hangzhou Dbappsecurity Technology Co Ltd
Original Assignee
Hangzhou Dbappsecurity Technology Co Ltd

Landscapes

  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a log analysis method and device based on machine learning. The method comprises: obtaining original log information; grouping the original log information by dimension to obtain multiple groups of log information, wherein each group of log information includes multiple log messages, each log message includes N character strings, and N is greater than or equal to 2; forming M item pairs from the N character strings, where M is greater than or equal to 1; clustering the multiple groups of log information and the item pairs into log event classification groups according to a clustering algorithm; selecting the most frequent log event of each group from the log event classification groups; and generating a log template based on the selected log events. This improves the efficiency and accuracy of log analysis.

Description

Log analysis method and device based on machine learning
Technical field
The present invention relates to the field of computer technology, and in particular to a log analysis method and device based on machine learning.
Background art
Logs are an almost indispensable part of every system for guaranteeing the security of system information. Logs mainly record information generated while a system runs, such as system exceptions, routine operations, and the attributes and details of user behavior events. This information is essential for understanding the running state of a system and the behavior habits of its users, so logs are commonly used for system anomaly monitoring, user behavior analysis, and similar tasks.
As the user base and complexity of a system grow, the volume of log data grows with them. System developers and operations staff rely on this rich log information to monitor the running state of the system and the behavior of its users, trace the sources of system anomalies, and predict how users will use the system. Conventional log parsing techniques usually extract patterns with regular expressions and then perform simple classification based on the extracted patterns. The main drawbacks of such techniques are low accuracy on diverse log formats and comparatively poor performance.
Summary of the invention
In view of this, the purpose of the present invention is to provide a log analysis method and device based on machine learning that improve the efficiency and accuracy of log analysis.
In a first aspect, an embodiment of the present invention provides a log analysis method based on machine learning, the method comprising:
Obtaining original log information;
Grouping the original log information by dimension to obtain multiple groups of log information, wherein each group of log information includes multiple log messages, each log message includes N character strings, and N is greater than or equal to 2;
Forming M item pairs from the N character strings, where M is greater than or equal to 1;
Clustering the multiple groups of log information and the item pairs into log event classification groups according to a clustering algorithm;
Selecting the most frequent log event of each group from the log event classification groups;
Generating a log template based on the selected log events.
Further, clustering the multiple groups of log information and the item pairs into log event classification groups according to the clustering algorithm includes repeating the following iterative processing until every log message has been traversed:
Calculating, based on the item pairs, a first potential function value of the log message in its current group, and marking the current group;
Calculating a second potential function value of the log message in an unmarked group;
Comparing the first potential function value with the second potential function value;
If the second potential function value is greater than the first potential function value, moving the log message from the current group to the unmarked group and updating the grouping information;
If the second potential function value is equal to the first potential function value, taking the current grouping as the log event classification groups.
Further, calculating, based on the item pairs, the first potential function value of the log message in the current group comprises:
Calculating the first potential function value according to the following formula:
Wherein, ω(B) is the first potential function value; r ∈ R(B) is an item pair in the set B of log messages; N(r, B) is the number of logs in B that contain item pair r; and p(r, B) = N(r, B)/|B| is the proportion of logs in B that contain item pair r. The second potential function value is calculated with the same formula.
Further, selecting the most frequent log event of each group from the log event classification groups comprises:
Counting, within the log event classification groups, how often each item pair occurs in each group of log information;
Taking the item pairs whose occurrence count in a group of log information reaches a preset number as candidate item pairs;
Combining the candidate item pairs selected in each group of log information into log event candidates;
Selecting, for each group, the log event with the highest occurrence frequency from the log event candidates.
Further, forming M item pairs from the N character strings comprises:
Calculating the number of item pairs according to the following formula:
$$M = \frac{N(N-1)}{2} \qquad (2)$$
Wherein, M is the number of item pairs, and N is the number of character strings.
In a second aspect, an embodiment of the present invention provides a log analysis device based on machine learning, the device comprising:
An acquiring unit, configured to obtain original log information;
A grouping unit, configured to group the original log information by dimension to obtain multiple groups of log information, wherein each group of log information includes multiple log messages, each log message includes N character strings, and N is greater than or equal to 2;
A composing unit, configured to form M item pairs from the N character strings, where M is greater than or equal to 1;
A clustering unit, configured to cluster the multiple groups of log information and the item pairs into log event classification groups according to a clustering algorithm;
A selecting unit, configured to select the most frequent log event of each group from the log event classification groups;
A generating unit, configured to generate a log template based on the selected log events.
Further, the clustering unit is configured to repeat the following iterative processing until every log message has been traversed:
Calculating, based on the item pairs, a first potential function value of the log message in its current group, and marking the current group;
Calculating a second potential function value of the log message in an unmarked group;
Comparing the first potential function value with the second potential function value;
If the second potential function value is greater than the first potential function value, moving the log message from the current group to the unmarked group and updating the grouping information;
If the second potential function value is equal to the first potential function value, taking the current grouping as the log event classification groups.
Further, the clustering unit is configured to:
Calculate the first potential function value according to the following formula:
Wherein, ω(B) is the first potential function value; r ∈ R(B) is an item pair in the set B of log messages; N(r, B) is the number of logs in B that contain item pair r; and p(r, B) = N(r, B)/|B| is the proportion of logs in B that contain item pair r. The second potential function value is calculated with the same formula.
Further, the selecting unit is configured to:
Count, within the log event classification groups, how often each item pair occurs in each group of log information;
Take the item pairs whose occurrence count in a group of log information reaches a preset number as candidate item pairs;
Combine the candidate item pairs selected in each group of log information into log event candidates;
Select, for each group, the log event with the highest occurrence frequency from the log event candidates.
Further, the composing unit is configured to:
Calculate the number of item pairs according to the following formula:
$$M = \frac{N(N-1)}{2} \qquad (2)$$
Wherein, M is the number of item pairs, and N is the number of character strings.
The embodiments of the present invention provide a log analysis method and device based on machine learning, comprising: obtaining original log information; grouping the original log information by dimension to obtain multiple groups of log information, wherein each group of log information includes multiple log messages, each log message includes N character strings, and N is greater than or equal to 2; forming M item pairs from the N character strings, where M is greater than or equal to 1; clustering the multiple groups of log information and the item pairs into log event classification groups according to a clustering algorithm; selecting the most frequent log event of each group from the log event classification groups; and generating a log template based on the selected log events, thereby improving the efficiency and accuracy of log analysis.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention are realized and attained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.
To make the above objectives, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To describe the technical solutions in the specific embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the log analysis method based on machine learning provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of step S104 in the log analysis method based on machine learning provided by Embodiment 1 of the present invention;
Fig. 3 is a flowchart of step S105 in the log analysis method based on machine learning provided by Embodiment 1 of the present invention;
Fig. 4 is a schematic diagram of the log analysis device based on machine learning provided by Embodiment 2 of the present invention.
Reference signs:
10 - acquiring unit; 20 - grouping unit; 30 - composing unit; 40 - clustering unit; 50 - selecting unit; 60 - generating unit.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The most common approach to log parsing is rule-based matching: patterns are extracted from the logs with regular expressions and then simply classified. This is more efficient than parsing the raw text directly, but traversing the logs significantly affects the timeliness of parsing, and if the logs contain a large number of irrelevant entries, classification accuracy suffers and parsing accuracy drops noticeably.
In this application, log information consists of a variable part and an invariant part, and most log information is unstructured text. Log parsing separates the invariant text in the original logs from the variable text and converts it into a structured log event. During log parsing, clustering is performed with a clustering algorithm. Clustering is a static classification method that divides similar objects into different groups or subsets, so that the members of the same subset share similar attributes.
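As a minimal illustration of the split between invariant and variable text, the Python sketch below matches a raw log line against a template and returns the variable parts. The `<*>` wildcard notation, the function names, and the sample SSH-style log line are assumptions made for illustration; they are not taken from this patent.

```python
import re


def template_to_regex(template):
    """Compile a template whose variable slots are written as "<*>" (assumed notation)."""
    constant_parts = template.split("<*>")
    # Invariant text is matched literally; each slot captures the shortest variable text
    return re.compile("^" + "(.+?)".join(re.escape(p) for p in constant_parts) + "$")


def extract_variables(template, line):
    """Return the variable parts of `line`, or None if it does not fit the template."""
    match = template_to_regex(template).match(line)
    return list(match.groups()) if match else None


template = "Accepted password for <*> from <*> port <*>"
print(extract_variables(template, "Accepted password for root from 10.0.0.5 port 22"))
# ['root', '10.0.0.5', '22']
```

The parsing method described here works in the opposite direction: it has to discover such templates from the logs themselves rather than receive them as input.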
To make this embodiment easier to understand, the embodiments of the present invention are described in detail below.
Embodiment one:
Fig. 1 is a flowchart of the log analysis method based on machine learning provided by Embodiment 1 of the present invention.
Referring to Fig. 1, the method includes the following steps:
Step S101: obtain original log information;
Here, the original log information is log information from which outliers have been removed. Because the original log information contains entries with irrelevant items, those outlier entries must be removed before the log information is parsed, which improves parsing accuracy.
The original log information usually contains some invariant items, of two kinds. In the first case, the item occupies a fixed position in the log, such as the timestamp recording when each log in the data set was generated; its content changes but its attribute does not. Such items do not help classify logs and only increase processing cost, so they need to be removed. In the second case, the item's position in the log varies, such as the IP addresses and ports in the data set; these can be removed with regular expressions.
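The two preprocessing cases above could be sketched as follows. This is a hypothetical example: it assumes each raw line begins with a two-token `date time` timestamp, and the regular expressions for IPv4 addresses and `port <number>` fields are illustrative choices, not patterns specified in the patent.

```python
import re

# Hypothetical patterns: IPv4 addresses (optionally with ":port") and "port <n>" fields
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d{1,5})?\b")
PORT = re.compile(r"\b(port)\s+\d{1,5}\b", re.IGNORECASE)


def preprocess(line, fixed_prefix_tokens=2):
    """Drop fixed-position fields, then strip variable-position IPs and port numbers."""
    # Case 1: fixed position, e.g. a leading "date time" timestamp
    text = " ".join(line.split()[fixed_prefix_tokens:])
    # Case 2: variable position, removed with regular expressions
    text = IPV4.sub(" ", text)
    text = PORT.sub(r"\1", text)  # keep the keyword "port", drop the number
    return text.split()


print(preprocess("2020-01-01 00:00:00 Accepted password for root from 10.0.0.5 port 22"))
# ['Accepted', 'password', 'for', 'root', 'from', 'port']
```

Keeping the keyword `port` while dropping its value is a design choice of this sketch: the keyword belongs to the invariant part of the message and can still contribute to item pairs.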
Log parsing uses a specific potential function value as the measure for evaluating similar events and improves classification accuracy through continuous iteration. The main steps are generating item pairs, clustering the log information, and generating log classification templates.
Step S102: group the original log information by dimension to obtain multiple groups of log information, wherein each group of log information includes multiple log messages, each log message includes N character strings, and N is greater than or equal to 2;
Step S103: form M item pairs from the N character strings, where M is greater than or equal to 1;
Here, each log message includes N character strings and each character string is one item of the log, so every two items can form an item pair; the relation between M and N is given by formula (2). For example, if the N character strings are "12", "34", and "AB", then "12" and "34" form one item pair, "AB" and "12" form one item pair, and "34" and "AB" form one item pair; that is, 3 character strings yield 3 item pairs.
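The pairing rule in this example can be checked with a short Python sketch. It uses `itertools.combinations`, which yields every unordered pair while keeping the original token order; the helper name `item_pairs` is hypothetical.

```python
from itertools import combinations


def item_pairs(tokens):
    """All unordered pairs of a message's tokens, each pair in original token order."""
    return list(combinations(tokens, 2))


pairs = item_pairs(["12", "34", "AB"])
print(pairs)  # [('12', '34'), ('12', 'AB'), ('34', 'AB')]
n = 3
assert len(pairs) == n * (n - 1) // 2  # M = N(N - 1)/2 pairs for N tokens
```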
Step S104: cluster the multiple groups of log information and the item pairs into log event classification groups according to a clustering algorithm;
Here, based on the item pairs, the potential function value of each log message is calculated both for its current group and for another group. Comparing the two values determines whether the log message should move: if the value increases, the log message is moved to the other group and the log grouping information is updated. Through continuous iteration the larger potential function value is always chosen, until in the last iteration no log message's potential function value increases; the current grouping is then taken as the log event classification groups.
Step S105: select the most frequent log event of each group from the log event classification groups;
Step S106: generate a log template based on the selected log events.
Further, referring to Fig. 2, step S104 includes the following steps, which are repeated until every log message has been traversed:
Step S201: based on the item pairs, calculate a first potential function value of the log message in its current group, and mark the current group;
Step S202: calculate a second potential function value of the log message in an unmarked group;
Step S203: compare the first potential function value with the second potential function value;
Step S204: if the second potential function value is greater than the first potential function value, move the log message from the current group to the unmarked group and update the grouping information;
Step S205: if the second potential function value is equal to the first potential function value, take the current grouping as the log event classification groups.
Specifically, based on the item pairs, the first potential function value of a log message in its current group can be calculated with formula (1); the first potential function value is a sum over all item pairs of the log messages. After the first value has been calculated, the current group is marked to distinguish it from the other groups, which determines the unmarked groups the log message could move to. The second potential function value is then calculated iteratively with formula (1), and the two values are compared to decide whether the log message moves. If the value increases, the log message is moved from the marked current group to another, unmarked group, and the log grouping information is updated. Through continuous iteration the larger potential function value is always chosen, until in the last iteration no log message's potential function value increases; the current grouping is then taken as the log event classification groups.
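A minimal Python sketch of this move-while-the-potential-increases loop follows. Because formula (1) itself is not reproduced in this text, the sketch assumes ω(B) = Σ over r ∈ R(B) of N(r, B)·p(r, B)², a form consistent with the definitions of N(r, B) and p(r, B) but not confirmed by the source. It also assumes a round-robin initial assignment to k groups, scores each candidate move by the change in the combined potential of the two groups involved, and stops after a full pass with no move; all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairs_of(tokens):
    """Item pairs of one message: unordered token pairs, original order kept."""
    return set(combinations(tokens, 2))


def potential(group):
    """ASSUMED form of formula (1): sum over r of N(r, B) * p(r, B) ** 2."""
    if not group:
        return 0.0
    # N(r, B): number of messages in B that contain item pair r
    counts = Counter(r for msg in group for r in pairs_of(msg))
    size = len(group)
    return sum(n * (n / size) ** 2 for n in counts.values())


def cluster(messages, k, max_passes=20):
    """Move messages between k groups while the combined potential increases."""
    assign = [i % k for i in range(len(messages))]  # hypothetical round-robin start

    def members(g, exclude=None):
        return [m for j, m in enumerate(messages) if assign[j] == g and j != exclude]

    for _ in range(max_passes):
        moved = False
        for i, msg in enumerate(messages):
            cur = assign[i]
            before_cur = potential(members(cur))
            after_cur = potential(members(cur, exclude=i))
            best_gain, best_g = 0.0, cur
            # Skipping the current group plays the role of "marking" it
            for g in range(k):
                if g == cur:
                    continue
                other = members(g)
                gain = (after_cur + potential(other + [msg])) - (before_cur + potential(other))
                if gain > best_gain + 1e-12:
                    best_gain, best_g = gain, g
            if best_g != cur:
                assign[i] = best_g
                moved = True
        if not moved:  # no move raised the potential: the grouping is final
            break
    return [members(g) for g in range(k)]


msgs = [
    "Accepted password for root".split(),
    "Accepted password for admin".split(),
    "Disk full on sda".split(),
    "Disk full on sdb".split(),
]
for group in cluster(msgs, k=2):
    print([" ".join(m) for m in group])  # the two "Disk" and the two "Accepted" lines end up together
```

Recomputing group members on every step keeps the sketch short but costs quadratic time; a practical implementation would maintain per-group item-pair counters incrementally.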
Further, step S201 includes:
Calculating the first potential function value according to formula (1):
Wherein, ω(B) is the first potential function value; r ∈ R(B) is an item pair in the set B of log messages; N(r, B) is the number of logs in B that contain item pair r; and p(r, B) = N(r, B)/|B| is the proportion of logs in B that contain item pair r. The second potential function value is calculated with the same formula.
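Since the body of formula (1) is missing here, the worked example below assumes ω(B) = Σ N(r, B)·p(r, B)², which fits the definitions above but is an assumption rather than the patent's stated formula. It compares a group of two similar messages with a group of two unrelated ones; each message has four tokens and therefore six item pairs.

```latex
% Assumed form of formula (1)
\omega(B) = \sum_{r \in R(B)} N(r,B)\, p(r,B)^2, \qquad p(r,B) = \frac{N(r,B)}{|B|}

% B_1 = {"Accepted password for root", "Accepted password for admin"}, |B_1| = 2
% 3 shared item pairs (N = 2, p = 1) and 6 unshared item pairs (N = 1, p = 1/2)
\omega(B_1) = 3 \cdot 2 \cdot 1^2 + 6 \cdot 1 \cdot \left(\tfrac{1}{2}\right)^2 = 6 + 1.5 = 7.5

% B_2 = {"Accepted password for root", "Disk full on sda"}, |B_2| = 2
% 12 unshared item pairs (N = 1, p = 1/2)
\omega(B_2) = 12 \cdot 1 \cdot \left(\tfrac{1}{2}\right)^2 = 3
```

Under this assumption, a group of similar messages has the higher potential, so moving a message toward the group it resembles increases the potential function value, matching the direction of the iteration in steps S201 to S205.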
Further, referring to Fig. 3, step S105 includes the following steps:
Step S301: count how often each item pair occurs in each group of log information in the log event classification groups;
Step S302: take the item pairs whose occurrence count in each group of log information reaches a preset number as candidate item pairs;
Here, an item pair reaches the preset number when its occurrence count exceeds half of the group.
Step S303: combine the candidate item pairs selected in each group of log information into log event candidates;
Step S304: select, for each group, the log event with the highest occurrence frequency from the log event candidates.
Specifically, each log message in a group has an item sequence with a high matching score. Template generation first builds log message labels: for the log event classification groups, it records how often each item pair occurs in the log messages, and it selects as candidate item pairs, i.e., message labels, the item pairs that occur in more than half of each group. The candidate item pairs contained in each group's log information are then combined into log event candidates, and the log event candidate with the highest occurrence frequency in each group is output as that group's final log template.
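Steps S301 to S304 can be sketched in Python as below. The sketch assumes the messages are already tokenised and reads "more than half" as "contained in more than half of the group's messages". Rendering the chosen log event as text, by masking with `<*>` the tokens of a representative message that no selected item pair covers, is a hypothetical presentation choice not described in the patent.

```python
from collections import Counter
from itertools import combinations


def pairs_of(tokens):
    """Item pairs of one message: unordered token pairs, original order kept."""
    return set(combinations(tokens, 2))


def group_template(group):
    """Sketch of S301-S304 for one non-empty group of tokenised messages."""
    size = len(group)
    # S301: in how many messages of the group does each item pair occur?
    counts = Counter(r for msg in group for r in pairs_of(msg))
    # S302: candidate item pairs occur in more than half of the messages
    candidates = {r for r, n in counts.items() if n > size / 2}
    # S303: each message's candidate pairs form one log event candidate
    events = Counter(frozenset(pairs_of(msg) & candidates) for msg in group)
    # S304: keep the most frequent log event candidate
    best_event, _ = events.most_common(1)[0]
    # Hypothetical rendering: mask tokens not covered by the chosen pairs
    kept = {token for pair in best_event for token in pair}
    sample = next(m for m in group if frozenset(pairs_of(m) & candidates) == best_event)
    return " ".join(t if t in kept else "<*>" for t in sample)


group = [
    "Accepted password for root".split(),
    "Accepted password for admin".split(),
    "Accepted password for guest".split(),
]
print(group_template(group))  # Accepted password for <*>
```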
Further, step S103 includes:
Calculating the number of item pairs according to formula (2):
$$M = \frac{N(N-1)}{2} \qquad (2)$$
Wherein, M is the number of item pairs, and N is the number of character strings.
The embodiment of the present invention provides a log analysis method based on machine learning, comprising: obtaining original log information; grouping the original log information by dimension to obtain multiple groups of log information, wherein each group of log information includes multiple log messages, each log message includes N character strings, and N is greater than or equal to 2; forming M item pairs from the N character strings, where M is greater than or equal to 1; clustering the multiple groups of log information and the item pairs into log event classification groups according to a clustering algorithm; selecting the most frequent log event of each group from the log event classification groups; and generating a log template based on the selected log events, thereby improving the efficiency and accuracy of log analysis.
Embodiment two:
Fig. 4 is a schematic diagram of the log analysis device based on machine learning provided by Embodiment 2 of the present invention.
Referring to Fig. 4, the device includes an acquiring unit 10, a grouping unit 20, a composing unit 30, a clustering unit 40, a selecting unit 50, and a generating unit 60.
The acquiring unit 10 is configured to obtain original log information;
The grouping unit 20 is configured to group the original log information by dimension to obtain multiple groups of log information, wherein each group of log information includes multiple log messages, each log message includes N character strings, and N is greater than or equal to 2;
The composing unit 30 is configured to form M item pairs from the N character strings, where M is greater than or equal to 1;
The clustering unit 40 is configured to cluster the multiple groups of log information and the item pairs into log event classification groups according to a clustering algorithm;
The selecting unit 50 is configured to select the most frequent log event of each group from the log event classification groups;
The generating unit 60 is configured to generate a log template based on the selected log events.
Further, the clustering unit 40 is configured to repeat the following iterative processing until every log message has been traversed:
Calculating, based on the item pairs, a first potential function value of the log message in its current group, and marking the current group;
Calculating a second potential function value of the log message in an unmarked group;
Comparing the first potential function value with the second potential function value;
If the second potential function value is greater than the first potential function value, moving the log message from the current group to the unmarked group and updating the grouping information;
If the second potential function value is equal to the first potential function value, taking the current grouping as the log event classification groups.
Further, the clustering unit 40 is configured to:
Calculate the first potential function value according to formula (1):
Wherein, ω(B) is the first potential function value; r ∈ R(B) is an item pair in the set B of log messages; N(r, B) is the number of logs in B that contain item pair r; and p(r, B) = N(r, B)/|B| is the proportion of logs in B that contain item pair r. The second potential function value is calculated with the same formula.
Further, the selecting unit 50 is configured to:
Count, within the log event classification groups, how often each item pair occurs in each group of log information;
Take the item pairs whose occurrence count in a group of log information reaches a preset number as candidate item pairs;
Combine the candidate item pairs selected in each group of log information into log event candidates;
Select, for each group, the log event with the highest occurrence frequency from the log event candidates.
Further, the composing unit 30 is configured to:
Calculate the number of item pairs according to formula (2):
$$M = \frac{N(N-1)}{2} \qquad (2)$$
Wherein, M is the number of item pairs, and N is the number of character strings.
The embodiment of the present invention provides a log analysis device based on machine learning, which obtains original log information; groups the original log information by dimension to obtain multiple groups of log information, wherein each group of log information includes multiple log messages, each log message includes N character strings, and N is greater than or equal to 2; forms M item pairs from the N character strings, where M is greater than or equal to 1; clusters the multiple groups of log information and the item pairs into log event classification groups according to a clustering algorithm; selects the most frequent log event of each group from the log event classification groups; and generates a log template based on the selected log events, thereby improving the efficiency and accuracy of log analysis.
An embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, it implements the steps of the log analysis method based on machine learning provided by the above embodiments.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program; when the computer program is run by a processor, it executes the steps of the log analysis method based on machine learning of the above embodiments.
The computer program product provided by the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the methods described in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, which are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems and devices described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In addition, in the description of the embodiments of the present invention, unless otherwise expressly specified and limited, the terms "mounted", "joined", and "connected" should be understood broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediary; or an internal connection between two elements. A person of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In the description of the present invention, it should be noted that the orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" are based on the orientations or positional relationships shown in the drawings. They are used only for convenience and simplicity of description and do not indicate or imply that the referenced device or element must have a particular orientation or be constructed and operated in a particular orientation; therefore they should not be construed as limiting the present invention. In addition, the terms "first", "second", and "third" are used for descriptive purposes only and should not be understood as indicating or implying relative importance.
Finally, it should be noted that the embodiments described above are merely specific implementations of the present invention, intended to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some technical features; such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A log analysis method based on machine learning, characterized in that the method comprises:
obtaining original log information;
grouping the original log information by dimension to obtain multiple groups of log information, wherein each group of log information comprises multiple pieces of log text information, each piece of log text information comprises N character strings, and N is greater than or equal to 2;
forming M item pairs from the N character strings, where M is greater than or equal to 1;
clustering the multiple groups of log information and the item pairs into log event classification groups according to a clustering algorithm;
selecting, from the log event classification groups, the log event with the highest frequency in each group; and
generating a log template based on the selected log events.
2. The log analysis method based on machine learning according to claim 1, characterized in that clustering the multiple groups of log information and the item pairs into log event classification groups according to the clustering algorithm comprises repeating the following iterative process until every piece of log text information has been traversed:
calculating, based on the item pairs, a first potential function value of the log text information in the current group, and marking the current group;
calculating a second potential function value of the log text information in an unmarked group;
comparing the first potential function value with the second potential function value;
if the second potential function value is greater than the first potential function value, moving the log text information from the current group to the unmarked group; and
if the second potential function value is equal to the first potential function value, taking the current group as the log event classification group.
3. The log analysis method based on machine learning according to claim 2, characterized in that calculating, based on the item pairs, the first potential function value of the log text information in the current group comprises:
calculating the first potential function value according to the following formula:
where ω(B) is the first potential function value, r ∈ R(B) is an item pair in the log text information B, N(r, B) is the number of logs in B that contain the item pair r, and p(r, B) = N(r, B)/|B| is the proportion of logs in B that contain the item pair r, |B| being the number of logs in B; the second potential function value is calculated by the same formula.
4. The log analysis method based on machine learning according to claim 1, characterized in that selecting, from the log event classification groups, the log event with the highest frequency in each group comprises:
counting, in the log event classification groups, the frequency with which each item pair occurs in each group of log information;
taking the item pairs whose frequency of occurrence in each group of log information reaches a preset count as candidate item pairs;
combining the candidate item pairs selected in each group of log information into log event candidates; and
selecting, from the log event candidates, the log event with the highest frequency of occurrence in each group.
5. The log analysis method based on machine learning according to claim 1, characterized in that forming M item pairs from the N character strings comprises:
calculating the item pairs according to the following formula:
where M is the number of item pairs and N is the number of character strings.
6. A log analysis device based on machine learning, characterized in that the device comprises:
an acquiring unit, configured to obtain original log information;
a grouping unit, configured to group the original log information by dimension to obtain multiple groups of log information, wherein each group of log information comprises multiple pieces of log text information, each piece of log text information comprises N character strings, and N is greater than or equal to 2;
a composing unit, configured to form M item pairs from the N character strings, where M is greater than or equal to 1;
a clustering unit, configured to cluster the multiple groups of log information and the item pairs into log event classification groups according to a clustering algorithm;
a selecting unit, configured to select, from the log event classification groups, the log event with the highest frequency in each group; and
a generating unit, configured to generate a log template based on the selected log events.
7. The log analysis device based on machine learning according to claim 6, characterized in that the clustering unit is configured to repeat the following iterative process until every piece of log text information has been traversed:
calculating, based on the item pairs, a first potential function value of the log text information in the current group, and marking the current group;
calculating a second potential function value of the log text information in an unmarked group;
comparing the first potential function value with the second potential function value;
if the second potential function value is greater than the first potential function value, moving the log text information from the current group to the unmarked group; and
if the second potential function value is equal to the first potential function value, taking the current group as the log event classification group.
8. The log analysis device based on machine learning according to claim 7, characterized in that the clustering unit is configured to:
calculate the first potential function value according to the following formula:
where ω(B) is the first potential function value, r ∈ R(B) is an item pair in the log text information B, N(r, B) is the number of logs in B that contain the item pair r, and p(r, B) = N(r, B)/|B| is the proportion of logs in B that contain the item pair r, |B| being the number of logs in B; the second potential function value is calculated by the same formula.
9. The log analysis device based on machine learning according to claim 6, characterized in that the selecting unit is configured to:
count, in the log event classification groups, the frequency with which each item pair occurs in each group of log information;
take the item pairs whose frequency of occurrence in each group of log information reaches a preset count as candidate item pairs;
combine the candidate item pairs selected in each group of log information into log event candidates; and
select, from the log event candidates, the log event with the highest frequency of occurrence in each group.
10. The log analysis device based on machine learning according to claim 6, characterized in that the composing unit is configured to:
calculate the item pairs according to the following formula:
where M is the number of item pairs and N is the number of character strings.
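The formulas of claims 3, 5, 8 and 10 are not reproduced in this text. The Python sketch below shows one possible reading of the clustering in claims 2 and 3, under loudly stated assumptions.

- **Item pairs:** log text is split on whitespace, and an item pair is taken to be every ordered token pair (tokens[i], tokens[j]) with i < j. This gives M = N(N-1)/2 for N distinct strings, which is consistent with N ≥ 2 implying M ≥ 1.
- **Potential function:** ω(B) is assumed to take the form used by the LogSig log-signature algorithm, ω(B) = Σ_{r∈R(B)} N(r, B)·p(r, B)², with p(r, B) = N(r, B)/|B|.
- **First and second potential values:** these are read as the change in ω when the log is removed from its current group, or added to an unmarked group.

The function names `item_pairs`, `potential` and `cluster`, the `EPS` tolerance and the `max_rounds` guard are all hypothetical.

```python
from collections import Counter
from itertools import combinations

EPS = 1e-9  # hypothetical tolerance so floating-point noise is not read as "greater"


def item_pairs(log_text):
    """ASSUMED pairing: whitespace tokens, every (tokens[i], tokens[j]) with i < j,
    i.e. M = N * (N - 1) / 2 pairs for N distinct tokens."""
    return set(combinations(log_text.split(), 2))


def potential(group, pairs_of):
    """ASSUMED LogSig-style potential: sum over r in R(B) of N(r, B) * p(r, B)**2,
    where p(r, B) = N(r, B) / |B|. The patent's own formula is not reproduced."""
    if not group:
        return 0.0
    counts = Counter()
    for log_id in group:
        counts.update(pairs_of[log_id])  # N(r, B): number of logs in B containing r
    size = len(group)
    return sum(n * (n / size) ** 2 for n in counts.values())


def cluster(logs, initial_groups, max_rounds=100):
    """Move logs between groups while a strictly larger potential value is found.
    Every log index must appear in exactly one of initial_groups."""
    pairs_of = {i: item_pairs(text) for i, text in enumerate(logs)}
    groups = [set(g) for g in initial_groups]
    for _ in range(max_rounds):
        moved = False
        for log_id in pairs_of:
            cur = next(k for k, g in enumerate(groups) if log_id in g)
            # First value: what the log contributes to its current (marked) group.
            first = potential(groups[cur], pairs_of) - potential(groups[cur] - {log_id}, pairs_of)
            target, best = None, first
            for k, g in enumerate(groups):
                if k == cur:
                    continue  # the marked current group is skipped
                # Second value: what the log would contribute to an unmarked group.
                second = potential(g | {log_id}, pairs_of) - potential(g, pairs_of)
                if second > best + EPS:
                    target, best = k, second
            if target is not None:
                groups[cur].remove(log_id)
                groups[target].add(log_id)
                moved = True
        if not moved:  # no log found a strictly better group: groups are final
            break
    return [g for g in groups if g]
```

For example, take the logs "user alice login ok", "user bob login ok", "disk sda full" and "disk sdb full", starting from the mixed groups {0, 2} and {1, 3}. The loop settles on {0, 1} and {2, 3}. A log stays where it is unless another group gives a strictly larger value, which mirrors the claims' rule that equality keeps the current group.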
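Claims 4 and 9 select the highest-frequency log event in each group, and claim 1 then generates a log template from it. Neither claim fixes exactly how a "log event candidate" is formed or how the template is written. The Python sketch below assumes the following:

- A log's event candidate is the set of its own item pairs whose frequency in the group reaches the preset count (`min_count`).
- The most common such set in the group is the selected log event.
- The template keeps the tokens that appear in the selected pairs and masks every other token with the wildcard `<*>`.

The whitespace tokenizer, the pairing rule and the names `item_pairs`, `select_event` and `make_template` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def item_pairs(log_text):
    """ASSUMED pairing: whitespace tokens, every (tokens[i], tokens[j]) with i < j."""
    return set(combinations(log_text.split(), 2))


def select_event(group_logs, min_count):
    """Pick the most frequent log event in one classification group (a sketch)."""
    pair_freq = Counter()
    for text in group_logs:
        pair_freq.update(item_pairs(text))  # frequency of each item pair in the group
    # Candidate item pairs: those whose frequency reaches the preset count.
    candidates = {r for r, n in pair_freq.items() if n >= min_count}
    # Each log's event candidate: its own item pairs that are candidates.
    events = Counter(frozenset(item_pairs(t) & candidates) for t in group_logs)
    events.pop(frozenset(), None)  # logs with no candidate pair form no event
    return events.most_common(1)[0][0] if events else frozenset()


def make_template(log_text, event, wildcard="<*>"):
    """Keep tokens that occur in the selected event's pairs; mask the rest."""
    keep = {token for pair in event for token in pair}
    return " ".join(t if t in keep else wildcard for t in log_text.split())
```

Take a group holding three "user ... login ok" lines and one "disk sda full" line, with `min_count=2`. Only the pairs built from "user", "login" and "ok" become candidates, so the template for "user alice login ok" comes out as "user <*> login ok".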

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810957288.XA CN109144964A (en) 2018-08-21 2018-08-21 log analysis method and device based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810957288.XA CN109144964A (en) 2018-08-21 2018-08-21 log analysis method and device based on machine learning

Publications (1)

Publication Number Publication Date
CN109144964A (en) 2019-01-04

Family

ID=64790971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810957288.XA Pending CN109144964A (en) 2018-08-21 2018-08-21 log analysis method and device based on machine learning

Country Status (1)

Country Link
CN (1) CN109144964A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321457A (en) * 2019-04-19 2019-10-11 杭州玳数科技有限公司 Access log resolution rules generation method and device, log analytic method and system
CN111160021A (en) * 2019-10-12 2020-05-15 华为技术有限公司 Log template extraction method and device
CN111258975A (en) * 2020-04-26 2020-06-09 中国人民解放军总医院 Method, apparatus, device and medium for locating abnormality in image archiving communication system
CN111462826A (en) * 2020-04-09 2020-07-28 合肥本源量子计算科技有限责任公司 Method for prompting quantum chemical simulation calculation progress, electronic equipment and storage medium
WO2021088385A1 (en) * 2019-11-06 2021-05-14 国网上海市电力公司 Online log analysis method, system, and electronic terminal device thereof
CN114745452A (en) * 2022-03-29 2022-07-12 烽台科技(北京)有限公司 Equipment management method and device and electronic equipment

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321457A (en) * 2019-04-19 2019-10-11 杭州玳数科技有限公司 Access log resolution rules generation method and device, log analytic method and system
CN111160021A (en) * 2019-10-12 2020-05-15 华为技术有限公司 Log template extraction method and device
WO2021088385A1 (en) * 2019-11-06 2021-05-14 国网上海市电力公司 Online log analysis method, system, and electronic terminal device thereof
CN111462826A (en) * 2020-04-09 2020-07-28 合肥本源量子计算科技有限责任公司 Method for prompting quantum chemical simulation calculation progress, electronic equipment and storage medium
CN111462826B (en) * 2020-04-09 2023-04-28 合肥本源量子计算科技有限责任公司 Method for prompting quantum chemistry simulation calculation progress, electronic equipment and storage medium
CN111258975A (en) * 2020-04-26 2020-06-09 中国人民解放军总医院 Method, apparatus, device and medium for locating abnormality in image archiving communication system
CN114745452A (en) * 2022-03-29 2022-07-12 烽台科技(北京)有限公司 Equipment management method and device and electronic equipment
CN114745452B (en) * 2022-03-29 2023-05-16 烽台科技(北京)有限公司 Equipment management method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN109144964A (en) log analysis method and device based on machine learning
US10237295B2 (en) Automated event ID field analysis on heterogeneous logs
CN104298679B (en) Applied business recommended method and device
JP6233411B2 (en) Fault analysis apparatus, fault analysis method, and computer program
CN111797210A (en) Information recommendation method, device and equipment based on user portrait and storage medium
CN103678702A (en) Video duplicate removal method and device
CN111160021A (en) Log template extraction method and device
CN110928957A (en) Data clustering method and device
CN110134845A (en) Project public sentiment monitoring method, device, computer equipment and storage medium
CN112860685A (en) Automatic recommendation of analysis of data sets
CN110263121B (en) Table data processing method, apparatus, electronic apparatus and computer readable storage medium
WO2016093839A1 (en) Structuring of semi-structured log messages
CN113806492A (en) Record generation method, device and equipment based on semantic recognition and storage medium
CN117420998A (en) Client UI interaction component generation method, device, terminal and medium
CN113760891A (en) Data table generation method, device, equipment and storage medium
CN113468866B (en) Method and device for analyzing non-standard JSON string
CN114610955A (en) Intelligent retrieval method and device, electronic equipment and storage medium
CN115905630A (en) Graph database query method, device, equipment and storage medium
CN116822491A (en) Log analysis method and device, equipment and storage medium
CN115291931A (en) Version change processing method and device, electronic equipment and storage medium
CN117501275A (en) Method, computer program product and computer system for analyzing data consisting of a large number of individual messages
CN115051863A (en) Abnormal flow detection method and device, electronic equipment and readable storage medium
CN109947891B (en) Document analysis method and device
CN108846103A (en) A kind of data query method and device
CN106469086B (en) Event processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190104