CN108280021A - A log level analysis method based on machine learning - Google Patents
- Publication number
- CN108280021A
- Authority
- CN
- China
- Prior art keywords
- machine learning
- information
- log
- method based
- analysis method
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2264—Multidimensional index structures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present application discloses a log analysis method based on machine learning algorithms. The method collects a large amount of existing log information as a training set and builds a machine learning model; by training and testing on this data, a high-accuracy prediction model is obtained. The prediction model then runs on the server, automatically analyzes each new log entry generated thereafter, and predicts its severity level. If a high-severity log entry is found, maintenance personnel are notified immediately, so that problems are responded to and handled promptly.
Description
Technical field
The present invention relates to the field of server monitoring, and in particular to a log level analysis method based on machine learning.
Background
Under normal circumstances, a server runs many services and dozens of software packages, while operations and maintenance personnel also periodically maintain and test the server machine; this series of operations generates a large amount of operational data. Usually, for the sake of server security, the operating system and the running services log certain key operations, such as remote user logins, database connections, and program failures or crashes. Recording this key information in logs helps locate problems when something goes wrong in the future.
Because a server has many log files and each log file stores a large number of log records, analyzing their content and extracting useful information, such as warnings and error messages, is an important means of improving the working efficiency of maintenance personnel.
The commonly used approach at present is, after a problem occurs, to manually extract the relevant logs from the server, organize them into a table or a file in some other set format, find the logs at the point in time when the problem occurred, analyze them to locate the problem, and finally handle it. This method of analysis and localization is inefficient and slow to respond, and cannot find and locate problems immediately. A more efficient way of processing and analyzing server log files is therefore urgently needed.
With the rapid development of artificial intelligence, the field of machine learning has made great progress, and relatively mature machine learning algorithms can be applied to the analysis of massive logs. To solve the above problems, the present application therefore proposes a log analysis method based on machine learning algorithms. The method collects a large amount of existing log information as a training set and builds a training model; by training and testing on this set, a high-accuracy prediction model is formed. This model then runs on the server, automatically analyzes each new log entry generated thereafter, and predicts its severity level. If a high-severity log entry is found, maintenance personnel are notified immediately by email or other means, so that problems are responded to and handled promptly.
Summary of the invention
Specifically, the present application claims a log level analysis method based on machine learning, characterized in that the method comprises:
obtaining a large volume of log records from a server;
extracting the log text information and the log level information from the log records;
digitizing the extracted log text information and log level information to generate a two-dimensional data structure;
randomly shuffling the log records and dividing them into two parts, one part serving as a training set for training a machine learning model and the other serving as a test set for measuring the accuracy the model has learned;
selecting a model with higher accuracy to run on the server as the prediction model, so that it automatically analyzes every new log entry generated thereafter and predicts its severity level, and immediately notifying maintenance personnel if a high-severity log entry is found.
The log level analysis method based on machine learning as described above is further characterized in that the machine learning model may use a KNN (k-nearest neighbors) or logistic regression algorithm as the learning model.
The log level analysis method based on machine learning as described above is further characterized in that, when the log records are divided into two parts, the ratio of the training set to the test set is 9:1.
The log level analysis method based on machine learning as described above is further characterized in that, in generating the two-dimensional data structure, the columns corresponding to all words are set as features and the last column (the result) is set as the label.
The log level analysis method based on machine learning as described above is further characterized in that every log entry subsequently occurring on the server is decomposed into feature information, its level is then predicted by the prediction model, and a prompt message is automatically generated according to the level.
Description of the drawings
Fig. 1: log analysis flowchart
Detailed description of embodiments
A record in a log file usually contains many fields. Our prediction model only needs to establish the relationship between the actual text content of the record (in English) and the level of the record; other fields such as time of occurrence, user, location, and process have little influence on the level and are ignored. Therefore, when analyzing a log file, only these two key fields need to be extracted.
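The extraction step can be sketched in Python as follows. This is a minimal illustration only: the patent does not specify a log format, so the line layout `<date> <time> <host> <process>: [<level>] <text>` and the names `LOG_PATTERN` and `extract_text_and_level` are hypothetical. Time, host, and process are matched but discarded, keeping only the two fields the method uses.

```python
import re
from typing import Optional, Tuple

# Hypothetical line format (an assumption, not taken from the patent), e.g.
#   "2018-01-25 10:00:00 host1 sshd: [normal] User admin login"
LOG_PATTERN = re.compile(
    r"^\S+ \S+ \S+ \S+: \[(?P<level>\w+)\] (?P<text>.*)$"
)


def extract_text_and_level(line: str) -> Optional[Tuple[str, str]]:
    """Return (log_text, log_level), or None if the line does not match.

    Date, time, host and process are matched but ignored, because the
    method treats them as having little influence on the level.
    """
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group("text"), match.group("level").lower()
```

Lines that do not match the assumed format return `None` and can simply be skipped when building the training data; a real deployment would adapt the pattern to its own log layout.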
In practice, because processors work with numbers rather than text, the text information must be converted into numeric information.
The relatively simple log level is processed first: each log level can be assigned a distinct number, for example 'error' corresponds to 0, 'warning' to 1, 'normal' to 2, and so on. The log text is far more complex and cannot be represented by a single number per entry. In the present invention, all log texts in the training set are split into words, forming a structure similar to a one-dimensional array. The text of each log entry contains only a small fraction of these words: each word the entry contains is marked 1, and each word it does not contain is marked 0. For example:
Log record log1 has the text "User admin login" and the level 'normal' (2); its data structure is shown in the following table:
Log | login | … | … | admin | … | … | User | … | level |
---|---|---|---|---|---|---|---|---|---|
log1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 2 |
Table 1: Log text and log level structure after digitization
Here '…' represents words that appear in other log contents. Because log text varies widely, the table may have a very large number of columns, but the last column must be level, i.e., the level of this log entry.
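A minimal Python sketch of the digitization described above, assuming whitespace tokenization and case-sensitive words (the patent does not say how text is split or normalized). The level codes follow the example in the text; the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Level-to-number mapping from the example in the text.
LEVEL_CODES: Dict[str, int] = {"error": 0, "warning": 1, "normal": 2}


def build_vocabulary(texts: Iterable[str]) -> List[str]:
    """Split all log texts into words; one column per distinct word, in first-seen order."""
    vocab: List[str] = []
    seen = set()
    for text in texts:
        for word in text.split():  # assumption: whitespace tokenization
            if word not in seen:
                seen.add(word)
                vocab.append(word)
    return vocab


def vectorize(text: str, vocab: Sequence[str]) -> List[int]:
    """1 if the log text contains the word, 0 otherwise."""
    words = set(text.split())
    return [1 if word in words else 0 for word in vocab]


def build_table(records: Sequence[Tuple[str, str]]) -> Tuple[List[str], List[List[int]]]:
    """Return (column names, rows): word features first, level label last."""
    vocab = build_vocabulary(text for text, _ in records)
    rows = [vectorize(text, vocab) + [LEVEL_CODES[level]] for text, level in records]
    return vocab + ["level"], rows
```

For two illustrative records, `("User admin login", "normal")` and `("Database connection failed", "error")`, `build_table` returns the columns `User, admin, login, Database, connection, failed, level` and the rows `[1, 1, 1, 0, 0, 0, 2]` and `[0, 0, 0, 1, 1, 1, 0]`, mirroring the layout of Table 1.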
The columns for all words can be set as features and the last column as the label. The machine learning model is trained on a large amount of existing feature and level data so that it can predict the level of log content that appears in the future. Although log text can be very diverse, it is ultimately drawn from a finite vocabulary, so training the model on a large amount of data can yield high accuracy.
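Training and prediction with KNN, one of the two algorithms the method names, can be sketched as below. This is a plain-Python illustration under the assumption that rows come from the binary encoding described above; the function names and the choice of Hamming distance are ours, not the patent's.

```python
from collections import Counter
from typing import Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of word columns in which two binary rows differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_X: Sequence[Sequence[int]], train_y: Sequence[int],
                x: Sequence[int], k: int = 3) -> int:
    """Predict a level code by majority vote among the k nearest training rows."""
    order = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Because every feature is 0 or 1, Hamming distance equals squared Euclidean distance, so it ranks neighbours exactly as Euclidean distance would. KNN has no separate fitting step: "training" amounts to storing the labelled rows, and each prediction scans all of them, which can become slow with tens of thousands of training rows.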
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to a specific embodiment:
Fig. 1 shows the process by which the machine learning model is trained on massive log data.
First, 100,000 log records are obtained from the server, the log text information and log level information are extracted from them, and a program is written to digitize them, forming a data structure similar to Table 1 that contains 100,000 rows of data.
Next, these 100,000 records are randomly shuffled and then split: 90,000 are used as the training set and the remaining 10,000 as the test set, which is used to measure the accuracy of the trained model.
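The shuffle-and-split step and the accuracy measurement can be sketched as follows, assuming `rows` holds the digitized records (features plus level). The fixed seed is an assumption added only to make the split reproducible; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_split(rows: Sequence[T], test_ratio: float = 0.1,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly shuffle rows, then split them into (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed only for reproducibility
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of test rows whose predicted level equals the true level."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

With 100,000 rows and the default `test_ratio=0.1` (the 9:1 ratio), `shuffle_split` returns 90,000 training rows and 10,000 test rows, matching the embodiment.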
As for the algorithm, since this is a supervised classification problem, algorithms such as KNN or logistic regression can be chosen as the learning model, and the model with the higher accuracy is ultimately selected as the final scheme.
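The model-selection step could look like the sketch below: a multinomial (softmax) logistic regression trained by batch gradient descent, a KNN predictor, and a helper that keeps whichever scores the higher test accuracy. The learning rate, epoch count, `k`, and all function names are illustrative assumptions; the patent gives no hyperparameters and does not say which logistic regression variant is used.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Row = Sequence[int]
Predictor = Callable[[Row], int]


def _scores(W: List[List[float]], x: Row) -> List[float]:
    # zip stops at len(x), so the trailing bias weight is added separately.
    return [sum(w * v for w, v in zip(row, x)) + row[-1] for row in W]


def _softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def logistic_regression_model(X: Sequence[Row], y: Sequence[int], n_classes: int,
                              lr: float = 0.5, epochs: int = 200) -> Predictor:
    """Multinomial (softmax) logistic regression fitted by batch gradient descent."""
    n_features = len(X[0])
    # One weight row per level code; the last entry of each row is the bias.
    W = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_features + 1) for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            probs = _softmax(_scores(W, xi))
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == yi else 0.0)
                for j, v in enumerate(xi):
                    grad[c][j] += err * v
                grad[c][-1] += err
        for c in range(n_classes):
            for j in range(n_features + 1):
                W[c][j] -= lr * grad[c][j] / len(X)

    def predict(x: Row) -> int:
        scores = _scores(W, x)
        return max(range(n_classes), key=scores.__getitem__)

    return predict


def knn_model(X: Sequence[Row], y: Sequence[int], k: int = 3) -> Predictor:
    """KNN with Hamming distance on the binary word features."""
    def predict(x: Row) -> int:
        order = sorted(range(len(X)), key=lambda i: sum(a != b for a, b in zip(X[i], x)))
        return Counter(y[i] for i in order[:k]).most_common(1)[0][0]
    return predict


def select_best(candidates: Dict[str, Predictor], X_test: Sequence[Row],
                y_test: Sequence[int]) -> Tuple[str, Predictor, float]:
    """Return (name, predictor, accuracy) of the most accurate candidate on the test set."""
    best = None
    for name, predict in candidates.items():
        acc = sum(predict(x) == t for x, t in zip(X_test, y_test)) / len(y_test)
        if best is None or acc > best[2]:
            best = (name, predict, acc)
    return best
```

A production system would more likely use a library implementation; the from-scratch version is only meant to show that both candidates are trained on the same rows and compared on the same held-out test set.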
Once the final learning model is determined, every log record subsequently occurring on the server is decomposed into feature information, its level is predicted by the model, and maintenance personnel are automatically notified according to the level for follow-up handling.
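Online prediction and alerting can be sketched as below. Which levels count as high-risk, and exactly how personnel are notified (email or other means), are not specified, so `HIGH_RISK_LEVELS = {0}` (only 'error') and the `notify` callback are hypothetical placeholders; `predict` stands for any trained model that maps a feature row to a level code.

```python
from typing import Callable, Dict, Sequence

# Level codes from the example in the text.
LEVEL_NAMES: Dict[int, str] = {0: "error", 1: "warning", 2: "normal"}
# Assumption: only 'error' is treated as high-risk.
HIGH_RISK_LEVELS = {0}


def handle_new_log(text: str,
                   vocab: Sequence[str],
                   predict: Callable[[Sequence[int]], int],
                   notify: Callable[[str], None]) -> int:
    """Encode a new log text, predict its level, and alert on high-risk levels."""
    words = set(text.split())
    # Same binary encoding as the training data; unseen words are dropped.
    features = [1 if w in words else 0 for w in vocab]
    level = predict(features)
    if level in HIGH_RISK_LEVELS:
        notify(f"[{LEVEL_NAMES.get(level, level)}] {text}")
    return level


if __name__ == "__main__":
    vocab = ["User", "admin", "login", "Database", "connection", "failed"]

    def toy_predict(features: Sequence[int]) -> int:
        # Stand-in for a trained KNN or logistic regression model.
        return 0 if features[5] else 2

    handle_new_log("Database connection failed", vocab, toy_predict, print)
```

Words in a new log entry that are absent from the training vocabulary are simply ignored, since they have no feature column; periodic retraining would be one way to pick them up.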
The key technical point of the present invention is the application of machine learning algorithms to natural-language text processing: massive text information is converted into numeric information, a machine learning model is built and trained, and a high-accuracy prediction model is obtained.
Obviously, the above is only a specific embodiment of the present invention. Those of ordinary skill in the art can derive other technical solutions from the above embodiment without creative effort, and equivalent changes made within the scope of the present invention shall all fall within its scope of protection.
In conclusion the logging level analysis method of the present invention based on machine learning, can liberate manual analysis
The time cost of daily record, full automation analysis and prediction log content, substantially increase the efficiency of analysis journal file, save
Time for solving the problems, such as.
Claims (5)
1. A log level analysis method based on machine learning, characterized in that the method comprises:
obtaining a large volume of log records from a server;
extracting the log text information and the log level information from the log records;
digitizing the extracted log text information and log level information to generate a two-dimensional data structure;
randomly shuffling the log records and dividing them into two parts, one part serving as a training set for training a machine learning model and the other serving as a test set for measuring the accuracy the model has learned;
selecting a model with higher accuracy to run on the server as the prediction model, so that it automatically analyzes every new log entry generated thereafter and predicts its severity level, and immediately notifying maintenance personnel if a high-severity log entry is found.
2. The log level analysis method based on machine learning according to claim 1, further characterized in that the machine learning model may use a KNN or logistic regression algorithm as the learning model.
3. The log level analysis method based on machine learning according to claim 2, further characterized in that, when the log records are divided into two parts, the ratio of the training set to the test set is 9:1.
4. The log level analysis method based on machine learning according to claim 3, further characterized in that, in generating the two-dimensional data structure, the columns corresponding to all words are set as features and the last column (the result) is set as the label.
5. The log level analysis method based on machine learning according to claim 4, further characterized in that every log entry subsequently occurring on the server is decomposed into feature information, its level is then predicted by the prediction model, and a prompt message is automatically generated according to the level.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810075006.3A CN108280021A (en) | 2018-01-25 | 2018-01-25 | A kind of logging level analysis method based on machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810075006.3A CN108280021A (en) | 2018-01-25 | 2018-01-25 | A kind of logging level analysis method based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108280021A true CN108280021A (en) | 2018-07-13 |
Family
ID=62805334
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810075006.3A Pending CN108280021A (en) | 2018-01-25 | 2018-01-25 | A kind of logging level analysis method based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108280021A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105378699A (en) * | 2013-11-27 | 2016-03-02 | Ntt都科摩公司 | Automatic task classification based upon machine learning |
CN104331716A (en) * | 2014-11-20 | 2015-02-04 | 武汉图歌信息技术有限责任公司 | SVM active learning classification algorithm for large-scale training data |
CN104811452A (en) * | 2015-04-30 | 2015-07-29 | 北京科技大学 | Data mining based intrusion detection system with self-learning and classified early warning functions |
CN105139714A (en) * | 2015-10-10 | 2015-12-09 | 国电南瑞科技股份有限公司 | Visualized simulation training system and method for electrified railway traction substation |
CN105528280A (en) * | 2015-11-30 | 2016-04-27 | 中电科华云信息技术有限公司 | Method and system capable of determining log alarm grades according to relationship between system logs and health monitoring |
CN107301118A (en) * | 2017-06-15 | 2017-10-27 | 中国科学院计算技术研究所 | A kind of fault indices automatic marking method and system based on daily record |
CN107577588A (en) * | 2017-09-26 | 2018-01-12 | 北京中安智达科技有限公司 | A kind of massive logs data intelligence operational system |
Non-Patent Citations (1)
Title |
---|
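郭子昂 (Guo Ziang): "基于GPR预判模型的海量日志流实时异常检测研究" (Research on real-time anomaly detection in massive log streams based on a GPR prediction model), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) * |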
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108897674A (en) * | 2018-07-12 | 2018-11-27 | 郑州云海信息技术有限公司 | A kind of log analysis method and device |
CN111565256A (en) * | 2019-02-13 | 2020-08-21 | 精工爱普生株式会社 | Information processing device, learning device, and non-transitory recording medium |
CN111565256B (en) * | 2019-02-13 | 2022-04-22 | 精工爱普生株式会社 | Information processing device, learning device, and non-transitory recording medium |
CN110806962A (en) * | 2019-11-06 | 2020-02-18 | 星环信息科技(上海)有限公司 | Log level prediction method, device and storage medium |
CN111045847A (en) * | 2019-12-18 | 2020-04-21 | Oppo广东移动通信有限公司 | Event auditing method and device, terminal equipment and storage medium |
CN111045847B (en) * | 2019-12-18 | 2023-07-21 | Oppo广东移动通信有限公司 | Event auditing method, device, terminal equipment and storage medium |
CN111708681A (en) * | 2020-06-15 | 2020-09-25 | 北京优特捷信息技术有限公司 | Log processing method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108280021A (en) | A kind of logging level analysis method based on machine learning | |
CN105653444B (en) | Software defect fault recognition method and system based on internet daily record data | |
US10565519B2 (en) | Systems and method for performing contextual classification using supervised and unsupervised training | |
CN104067567A (en) | Systems and methods for spam detection using character histograms | |
CN102541999A (en) | Object-sensitive image search | |
CN111949480A (en) | Log anomaly detection method based on component perception | |
CN111078979A (en) | Method and system for identifying network credit website based on OCR and text processing technology | |
KR20190113680A (en) | Method and apparatus for generating test case for web pages | |
CN118296164A (en) | Automatic agricultural product information acquisition and updating method and system based on knowledge graph | |
CN112395513A (en) | Public opinion transmission power analysis method | |
CN115269438A (en) | Automatic testing method and device for image processing algorithm | |
CN116361147A (en) | Method for positioning root cause of test case, device, equipment, medium and product thereof | |
CN117544482A (en) | Operation and maintenance fault determining method, device, equipment and storage medium based on AI | |
CN112861530A (en) | Course setting analysis method based on text mining | |
CN116484109B (en) | Customer portrait analysis system and method based on artificial intelligence | |
CN117411780A (en) | Network log anomaly detection method based on multi-source data characteristics | |
CN111209158A (en) | Mining monitoring method and cluster monitoring system for server cluster | |
Eckstein et al. | Towards extracting customer needs from incident tickets in it services | |
Roelands et al. | Classifying businesses by economic activity using web-based text mining | |
KR20150142459A (en) | Automated system and method of instrument index | |
CN117272198B (en) | Abnormal user generated content identification method based on business travel business data | |
CN117556256B (en) | Private domain service label screening system and method based on big data | |
CN111598159B (en) | Training method, device, equipment and storage medium of machine learning model | |
CN117555501B (en) | Cloud printer operation and data processing method based on edge calculation and related device | |
CN112968941B (en) | Data acquisition and man-machine collaborative annotation method based on edge calculation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180713 |