CN108280021A - Log level analysis method based on machine learning - Google Patents


Info

Publication number
CN108280021A
Authority
CN
China
Legal status
Pending
Application number
CN201810075006.3A
Other languages
Chinese (zh)
Inventor
梁盛楠
Current Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date
2018-01-25
Filing date
2018-01-25
Publication date
2018-07-13
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201810075006.3A
Publication of CN108280021A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2264 Multidimensional index structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present application provides a log analysis method based on machine learning algorithms. The method collects a large amount of existing log information to serve as a training set and builds a machine learning model. Training and testing on this data produce a high-accuracy prediction model. The prediction model then runs on the server, automatically analyzes each new log entry that appears afterwards, and predicts the severity level of that entry. If it finds a high-risk log entry, it notifies maintenance personnel immediately, so that problems are responded to and handled promptly.

Description

Log level analysis method based on machine learning
Technical field
The present invention relates to the field of server monitoring, and in particular to a log level analysis method based on machine learning.
Background art
A server normally runs a variety of services and dozens of software programs. Operations and maintenance personnel also maintain and test the server machine periodically, and all of these operations generate a large amount of operational data. For the sake of server security, the operating system and the running services usually log certain key operations, such as remote user logins, database connections, and program failures or crashes. Recording this key information in logs makes it easier to locate problems when something goes wrong in the future.
A server holds many log files, and each log file stores a large number of log records. Analyzing their contents and extracting useful information, such as warning messages and error messages, is therefore an important way to improve the efficiency of maintenance personnel.
The common approach at present works as follows. After a problem occurs, someone manually extracts the relevant logs from the server and organizes them into a table or another fixed-format file. They then find the time at which the problem occurred and analyze the logs from that moment to locate and finally handle the problem. This approach to analysis and localization is inefficient and slow to respond, and it cannot find and locate problems as soon as they arise. A more efficient way of processing and analyzing server log files is therefore urgently needed.
Artificial intelligence is developing rapidly, and the field of machine learning has made great progress. Relatively mature machine learning algorithms can now be applied to the analysis of massive logs. To solve the above problems, the present application therefore provides a log analysis method based on machine learning algorithms. The method collects a large amount of existing log information as a training set and builds a training model. Training and testing on the training set then produce a highly accurate prediction model. This model runs on the server, automatically analyzes each new log entry that appears afterwards, and predicts the severity level of that entry. If it finds a high-risk log entry, it notifies maintenance personnel immediately by e-mail or other means, so that problems are responded to and handled promptly.
Summary of the invention
Specifically, the present application claims a log level analysis method based on machine learning, characterized in that the method comprises:
Obtaining massive log record information from the server;
Extracting log text information and log level information from the massive log record information;
Digitizing the extracted log text information and log level information to generate a two-dimensional data structure;
Randomly shuffling the massive log record information and dividing it into two parts, one part serving as a training set for training a machine learning model and the other serving as a test set for testing the learning accuracy of the machine learning model;
Selecting a model with higher accuracy as the prediction model and running it on the server, so that it automatically analyzes each new log entry that subsequently appears and predicts the severity level of that entry; if a high-risk log entry is found, maintenance personnel are notified immediately.
The log level analysis method based on machine learning as described above, further characterized in that KNN or a logistic regression algorithm can be selected as the machine learning model.
The log level analysis method based on machine learning as described above, further characterized in that, when the massive log record information is divided into two parts, the ratio of the training set to the test set is 9:1.
The log level analysis method based on machine learning as described above, further characterized in that, in the generated two-dimensional data structure, the columns corresponding to all words are set as features and the last column is set as the label.
The log level analysis method based on machine learning as described above, further characterized in that each log entry subsequently appearing in the server is broken down into feature information, its corresponding level is then predicted by the prediction model, and a prompt message is automatically generated according to that level.
Description of the drawings
Fig. 1, log analysis flow chart
Detailed description of the embodiments
A log record in a log file usually contains a lot of information. The prediction model needs to establish the relationship between the actual text content of an entry (in English) and the level of that entry. Other fields, such as time of occurrence, user, location, and process, have little influence on the log level and can be ignored. When analyzing log files, only these two key fields therefore need to be extracted.
Processors work with numbers far more readily than with text, so in practice the text information must be converted into numerical information. The log level is relatively simple and is processed first: each log level can be assigned a different number, for example 'error' corresponds to 0, 'warning' to 1, 'normal' to 2, and so on. Log text content is far more complex, and a single number cannot represent each entry. In the present invention, all log content in the training set is split into words, forming a structure similar to a one-dimensional array. The text of each log entry contains only a small fraction of these words. A word that is present is indicated by 1, and a word that is absent is indicated by 0. For example:
The text of log record log1 is "User admin login" and its level is 'normal' (2). Its data structure is shown in the following table:
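The extraction step can be sketched as follows. This is a minimal illustration only. The patent does not specify a log format, so the hypothetical `LOG_PATTERN` and `extract_text_and_level` assume a placeholder line layout of `<date> <time> <level> <process>: <message>`, with level names error/warning/normal. Adapt both to the actual logs.

```python
import re
from typing import Optional, Tuple

# Hypothetical line layout (an assumption; real formats vary by service):
#   "2018-01-25 10:15:02 normal sshd[812]: User admin login"
LOG_PATTERN = re.compile(
    r"^(?P<date>\S+)\s+(?P<time>\S+)\s+"    # time of occurrence (ignored)
    r"(?P<level>error|warning|normal)\s+"   # log level (kept)
    r"(?P<process>[^:]+):\s*"               # process / location (ignored)
    r"(?P<text>.*)$"                        # free-text message (kept)
)


def extract_text_and_level(line: str) -> Optional[Tuple[str, str]]:
    """Return (text, level) for a line matching LOG_PATTERN, otherwise None."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # Keep only the two fields the model needs; drop time, process, etc.
    return match.group("text"), match.group("level")
```

Lines that do not match the assumed layout return `None`, so they can be skipped or routed to manual review.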
        login  ...  ...  admin  ...  ...  User  ...  level
log1    1      0    0    1      0    0    1     0    2
Table 1: Structure of log words and log level after digitization
Here '...' denotes words that appear in other log entries. Log text varies widely, so the table may have a very large number of columns. The last column, however, is always level, that is, the level of this log entry.
The columns corresponding to all words can be set as features, and the last column as the label. A machine learning model is trained on a large amount of existing feature and level data so that it can predict the level of log entries that occur in the future. Log text content can be highly varied, but it is ultimately drawn from a finite set of words. Training the model on a large amount of data can therefore produce a high-accuracy result.
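The following is a sketch of the digitization described above. It assumes that words are obtained by splitting on whitespace (case-sensitive) and uses the level codes from the example (error 0, warning 1, normal 2). The helper names `build_vocabulary`, `encode`, and `build_table` are hypothetical. Each output row holds one 0/1 column per vocabulary word, followed by the level label, matching the layout of Table 1.

```python
from typing import Dict, List, Sequence, Tuple

# Level-to-number mapping taken from the example in the text.
LEVEL_CODES: Dict[str, int] = {"error": 0, "warning": 1, "normal": 2}


def build_vocabulary(texts: Sequence[str]) -> List[str]:
    """Collect every distinct word in the training texts, in first-seen order."""
    vocab: List[str] = []
    seen = set()
    for text in texts:
        for word in text.split():  # assumption: whitespace tokenization
            if word not in seen:
                seen.add(word)
                vocab.append(word)
    return vocab


def encode(text: str, vocab: Sequence[str]) -> List[int]:
    """1 if the vocabulary word occurs in the text, else 0."""
    words = set(text.split())
    return [1 if w in words else 0 for w in vocab]


def build_table(records: Sequence[Tuple[str, str]]) -> Tuple[List[str], List[List[int]]]:
    """Return (vocab, rows); each row is the word features followed by the level label."""
    vocab = build_vocabulary([text for text, _ in records])
    rows = [encode(text, vocab) + [LEVEL_CODES[level]] for text, level in records]
    return vocab, rows
```

A dense list per row keeps the sketch close to Table 1. With 100,000 logs and a large vocabulary, a sparse representation that stores only the indices of the words present would likely be more practical.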
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to a specific embodiment:
Fig. 1 shows the process of training the machine learning model on massive log data.
First, 100,000 log entries are obtained from the server, and the log text and log level are extracted from them. A program is written to digitize this information, forming a data structure similar to Table 1 that contains 100,000 rows.
Next, the 100,000 rows are randomly shuffled and then split: 90,000 are used as the training set, and the remaining 10,000 serve as the test set for measuring the accuracy of the trained model.
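The shuffle-and-split step might look like the sketch below. The function name `shuffle_and_split` and the fixed `seed` (used for reproducibility) are assumptions. The 9:1 ratio comes from the text. The cut point uses integer arithmetic so that 100,000 rows split into exactly 90,000 and 10,000.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_and_split(
    rows: Sequence[T], train_parts: int = 9, test_parts: int = 1, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly shuffle the rows, then split them train_parts : test_parts."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded generator for repeatable splits
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]
```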
As for the algorithm, this is a supervised classification problem, so algorithms such as KNN or logistic regression can serve as the learning model. The model with the higher accuracy is chosen as the final solution.
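The text names KNN and logistic regression as candidates but does not specify their details. The sketch below fills those details with assumptions:

- KNN uses Hamming distance with `k = 3`.
- Logistic regression takes a multinomial (softmax) form, trained by plain stochastic gradient descent.
- `epochs`, `lr`, and all function names are hypothetical choices.

Rows follow the Table 1 layout: 0/1 word columns, then the level label, which must lie in `0..num_classes-1`. `select_model` keeps whichever candidate scores the higher accuracy on the test set.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Row = List[int]  # 0/1 word features, followed by the level label in the last column
Predictor = Callable[[Row], int]


def split_xy(rows: Sequence[Row]) -> Tuple[List[Row], List[int]]:
    """Separate the feature columns from the label column."""
    return [list(r[:-1]) for r in rows], [r[-1] for r in rows]


def knn_predictor(train: Sequence[Row], k: int = 3) -> Predictor:
    """k-nearest neighbours with Hamming distance over the 0/1 word vectors."""
    xs, ys = split_xy(train)

    def predict(x: Row) -> int:
        # Rank training rows by how many word columns differ from x.
        nearest = sorted(
            (sum(a != b for a, b in zip(x, xi)), yi) for xi, yi in zip(xs, ys)
        )[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    return predict


def logistic_predictor(
    train: Sequence[Row], num_classes: int = 3, epochs: int = 200, lr: float = 0.5
) -> Predictor:
    """Multinomial (softmax) logistic regression trained with plain SGD."""
    xs, ys = split_xy(train)
    dim = len(xs[0])
    weights = [[0.0] * (dim + 1) for _ in range(num_classes)]  # last entry is the bias

    def scores(x: Row) -> List[float]:
        return [sum(w[j] * x[j] for j in range(dim)) + w[dim] for w in weights]

    for _ in range(epochs):
        for x, y in zip(xs, ys):
            s = scores(x)
            top = max(s)
            exps = [math.exp(v - top) for v in s]  # shift for numerical stability
            total = sum(exps)
            for c in range(num_classes):
                # Cross-entropy gradient w.r.t. class c's score: p_c - [c == y].
                grad = exps[c] / total - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
                weights[c][dim] -= lr * grad

    def predict(x: Row) -> int:
        s = scores(x)
        return max(range(num_classes), key=lambda c: s[c])

    return predict


def accuracy(predict: Predictor, test: Sequence[Row]) -> float:
    """Fraction of test rows whose predicted level equals the label."""
    xs, ys = split_xy(test)
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(ys)


def select_model(
    train: Sequence[Row], test: Sequence[Row]
) -> Tuple[str, Predictor, Dict[str, float]]:
    """Train both candidates and keep the one with the higher test accuracy."""
    candidates: Dict[str, Predictor] = {
        "knn": knn_predictor(train),
        "logistic": logistic_predictor(train),
    }
    results = {name: accuracy(p, test) for name, p in candidates.items()}
    best = max(results, key=lambda name: results[name])
    return best, candidates[best], results
```

Pure-Python loops keep the sketch dependency-free. At the 100,000-row scale of the embodiment, library implementations such as scikit-learn's `KNeighborsClassifier` and `LogisticRegression` would be far faster.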
Once the final learning model has been determined, each log entry that appears in the server afterwards is broken down into feature information. The model then predicts the corresponding level, and maintenance personnel are notified automatically according to that level to carry out follow-up handling.
The key technical point of the present invention is the application of machine learning algorithms to text processing. Massive text information is converted into numerical information, and a machine learning model is built and trained on it to obtain a prediction model with high accuracy.
Obviously, what is described above is only a specific embodiment of the present invention. Those of ordinary skill in the art can obtain other technical solutions from the above embodiment without creative effort. Equivalent variations made within the protection scope of the present invention shall all fall within the protection scope of the present invention.
In conclusion, the machine-learning-based log level analysis method of the present invention frees up the time spent on manual log analysis and analyzes and predicts log content fully automatically. It greatly improves the efficiency of log file analysis and saves time in solving problems.
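The online step can be sketched as follows. The sketch assumes that the vocabulary and a trained predictor from the training phase are available. `handle_new_log`, `HIGH_RISK_LEVELS` (here only `error`), and the `notify` callback are hypothetical; `notify` stands in for e-mail or any other alerting channel. Words never seen in training have no column and are simply ignored.

```python
from typing import Callable, Dict, List, Sequence

LEVEL_NAMES: Dict[int, str] = {0: "error", 1: "warning", 2: "normal"}
HIGH_RISK_LEVELS = {0}  # assumption: only 'error' triggers an immediate alert


def handle_new_log(
    text: str,
    vocab: Sequence[str],
    predict: Callable[[List[int]], int],
    notify: Callable[[str], None],
) -> str:
    """Encode a new log text, predict its level, and alert on high-risk levels."""
    words = set(text.split())
    features = [1 if w in words else 0 for w in vocab]  # unseen words are dropped
    level = predict(features)
    name = LEVEL_NAMES[level]
    if level in HIGH_RISK_LEVELS:
        notify(f"[{name}] {text}")  # e.g. send an e-mail to maintenance staff
    return name
```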

Claims (5)

1. A log level analysis method based on machine learning, characterized in that the method comprises:
Obtaining massive log record information from a server;
Extracting log text information and log level information from the massive log record information;
Digitizing the extracted log text information and log level information to generate a two-dimensional data structure;
Randomly shuffling the massive log record information and dividing it into two parts, one part serving as a training set for training a machine learning model and the other serving as a test set for testing the learning accuracy of the machine learning model;
Selecting a model with higher accuracy as the prediction model and running it on the server, so that it automatically analyzes each new log entry that subsequently appears and predicts the severity level of that entry; if a high-risk log entry is found, maintenance personnel are notified immediately.
2. The log level analysis method based on machine learning as claimed in claim 1, further characterized in that KNN or a logistic regression algorithm can be selected as the machine learning model.
3. The log level analysis method based on machine learning as claimed in claim 2, further characterized in that, in the two parts into which the massive log record information is divided, the ratio of the training set to the test set is 9:1.
4. The log level analysis method based on machine learning as claimed in claim 3, further characterized in that, in the generated two-dimensional data structure, the columns corresponding to all words are set as features and the last column is set as the label.
5. The log level analysis method based on machine learning as claimed in claim 4, further characterized in that each log entry subsequently appearing in the server is broken down into feature information, its corresponding level is then predicted by the prediction model, and a prompt message is automatically generated according to that level.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810075006.3A CN108280021A (en) 2018-01-25 2018-01-25 Log level analysis method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810075006.3A CN108280021A (en) 2018-01-25 2018-01-25 Log level analysis method based on machine learning

Publications (1)

Publication Number Publication Date
CN108280021A (en) 2018-07-13

Family

ID=62805334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810075006.3A Pending CN108280021A (en) 2018-01-25 2018-01-25 Log level analysis method based on machine learning

Country Status (1)

Country Link
CN (1) CN108280021A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331716A (en) * 2014-11-20 2015-02-04 武汉图歌信息技术有限责任公司 SVM active learning classification algorithm for large-scale training data
CN104811452A (en) * 2015-04-30 2015-07-29 北京科技大学 Data mining based intrusion detection system with self-learning and classified early warning functions
CN105139714A (en) * 2015-10-10 2015-12-09 国电南瑞科技股份有限公司 Visualized simulation training system and method for electrified railway traction substation
CN105378699A (en) * 2013-11-27 2016-03-02 Ntt都科摩公司 Automatic task classification based upon machine learning
CN105528280A (en) * 2015-11-30 2016-04-27 中电科华云信息技术有限公司 Method and system capable of determining log alarm grades according to relationship between system logs and health monitoring
CN107301118A (en) * 2017-06-15 2017-10-27 中国科学院计算技术研究所 A kind of fault indices automatic marking method and system based on daily record
CN107577588A (en) * 2017-09-26 2018-01-12 北京中安智达科技有限公司 A kind of massive logs data intelligence operational system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
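郭子昂: "Research on real-time anomaly detection for massive log streams based on a GPR prediction model" (基于GPR预判模型的海量日志流实时异常检测研究), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *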

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897674A (en) * 2018-07-12 2018-11-27 郑州云海信息技术有限公司 A kind of log analysis method and device
CN111565256A (en) * 2019-02-13 2020-08-21 精工爱普生株式会社 Information processing device, learning device, and non-transitory recording medium
CN111565256B (en) * 2019-02-13 2022-04-22 精工爱普生株式会社 Information processing device, learning device, and non-transitory recording medium
CN110806962A (en) * 2019-11-06 2020-02-18 星环信息科技(上海)有限公司 Log level prediction method, device and storage medium
CN111045847A (en) * 2019-12-18 2020-04-21 Oppo广东移动通信有限公司 Event auditing method and device, terminal equipment and storage medium
CN111045847B (en) * 2019-12-18 2023-07-21 Oppo广东移动通信有限公司 Event auditing method, device, terminal equipment and storage medium
CN111708681A (en) * 2020-06-15 2020-09-25 北京优特捷信息技术有限公司 Log processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180713