CN114118295A - Anomaly detection model training method, anomaly detection device and medium - Google Patents


Info

Publication number
CN114118295A
Authority
CN
China
Prior art keywords
data
log data
detected
anomaly detection
training
Legal status
Pending
Application number
CN202111485895.9A
Other languages
Chinese (zh)
Inventor
赵静
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202111485895.9A
Publication of CN114118295A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides an anomaly detection model training method, an anomaly detection device and a medium. The anomaly detection model training method comprises the following steps: obtaining log data for training and extracting unstructured data from the log data, the unstructured data comprising text data and numerical data; converting the text data into word vectors and the numerical data into vectors; and inputting the word vectors and the vectors into a random forest model for training and adjusting the parameters of the random forest model to obtain an anomaly detection model. Because only the unstructured data, which carries the server system's operating information, is extracted from the training log data and used to train the random forest model, the resulting anomaly detection model can quickly distinguish normal data from abnormal data, interference from irrelevant information is reduced, and training efficiency is improved.

Description

Anomaly detection model training method, anomaly detection device and medium
Technical Field
The invention relates to the technical field of network security, in particular to an anomaly detection model training method, an anomaly detection device and a medium.
Background
Anomaly detection is the process of finding the small minority of data that deserves attention because it differs from the majority. Identifying abnormal data helps uncover potential problems in a server system, such as structural defects and equipment faults. Timely anomaly detection helps system developers (or operators) locate problems promptly and resolve them immediately, thereby reducing system downtime.
While a system is running, it usually generates logs that record detailed information about its operation, so logs can serve as the main data source for detecting anomalies in the system.
In the related art, two approaches are mainly used to detect anomalies in system log data. The first is supervised anomaly detection, which relies mainly on support vector machines and logistic regression. A support vector machine (SVM) is a generalized linear classifier that performs binary classification of data by supervised learning. Logistic regression, also called logistic regression analysis, is a generalized linear regression model commonly used in data mining, automated disease diagnosis, economic forecasting and similar fields. However, logistic regression cannot handle data that is not linearly separable; the support vector machine can, but its parameters are difficult to tune, so modeling requires considerable manual effort.
The second is unsupervised anomaly detection, which mainly includes principal component analysis (PCA), invariant mining and various clustering methods. However, these methods are slow and prone to false detections, resulting in low detection accuracy.
Disclosure of Invention
The technical problem to be solved by the present invention is therefore to overcome the low efficiency and low accuracy of prior-art anomaly detection on system logs, by providing an anomaly detection model training method, an anomaly detection device and a medium.
According to a first aspect, the present invention provides an anomaly detection model training method, the method comprising:
acquiring log data for training, and extracting unstructured data from the log data, wherein the unstructured data comprises text data and numerical data;
converting the text data into word vectors, and converting the numerical data into vectors; and
inputting the word vectors and the vectors into a random forest model for training, and adjusting parameters of the random forest model to obtain an anomaly detection model.
In this method, the unstructured data in the training log data, which carries the server system's operating information, is extracted and used to train the random forest model, so that the resulting anomaly detection model can quickly distinguish normal data from abnormal data, interference from irrelevant information is reduced, and training efficiency is improved.
With reference to the first aspect, in a first embodiment of the first aspect, extracting the unstructured data from the log data includes:
performing structural parsing on the log data by means of a Drain algorithm to extract the unstructured data from the log data.
With reference to the first aspect or the first embodiment of the first aspect, in a second embodiment of the first aspect, converting the text data into word vectors includes:
converting the text data into word vectors by means of a Word2vec algorithm.
In this way, natural language processing techniques can be applied to the detection of log data, so that the resulting anomaly detection model can recognize the content of the log data or the rules by which the log data is written, and thereby perform targeted detection.
According to a second aspect, the present invention further provides an anomaly detection method, the method comprising:
acquiring log data to be detected of a server system;
preprocessing the log data to be detected to obtain a word vector to be detected and a vector to be detected; and
inputting the word vector to be detected and the vector to be detected into an anomaly detection model to obtain an anomaly detection result for the log data, wherein the anomaly detection model is obtained by the anomaly detection model training method of the first aspect or any of its optional embodiments.
In this approach, the acquired log data to be detected is checked automatically by the trained anomaly detection model, and no manual monitoring is needed to determine which log fields must be examined, so that when the server system behaves abnormally the problem can be located quickly and resolved in time.
With reference to the second aspect, in a first embodiment of the second aspect, preprocessing the log data to be detected to obtain the word vector to be detected and the vector to be detected includes:
performing structural parsing on the log data to be detected by means of a Drain algorithm and extracting unstructured data from the log data to be detected, wherein the unstructured data comprises text data and numerical data;
converting the text data into the word vector to be detected by means of a Word2vec algorithm; and
converting the numerical data into the vector to be detected.
With reference to the second aspect or the first embodiment of the second aspect, in a second embodiment of the second aspect, if the log data to be detected includes multiple pieces of log data, the method further includes:
if the anomaly detection result indicates that some of the log data contains abnormal data, sending alarm information to a user, wherein the alarm information includes those pieces of the log data to be detected whose anomaly detection result is abnormal.
In this way, the user can identify from the received alarm information which pieces of the log data to be detected were found to be abnormal, so that the fault can be located quickly, the problem can be resolved in time, and server downtime is reduced.
According to a third aspect, the present invention provides an anomaly detection model training apparatus, the apparatus comprising:
an acquisition unit, configured to acquire log data for training and extract unstructured data from the log data, wherein the unstructured data comprises text data and numerical data;
a conversion unit, configured to convert the text data into word vectors and convert the numerical data into vectors; and
a training unit, configured to input the word vectors and the vectors into a random forest model for training, and to adjust the parameters of the random forest model to obtain an anomaly detection model.
With reference to the third aspect, in a first embodiment of the third aspect, the acquisition unit includes:
an extraction unit, configured to perform structural parsing on the log data by means of a Drain algorithm and extract the unstructured data from the log data.
With reference to the third aspect or the first embodiment of the third aspect, in a second embodiment of the third aspect, the conversion unit includes:
a conversion subunit, configured to convert the text data into word vectors by means of a Word2vec algorithm.
According to a fourth aspect, the present invention further provides an anomaly detection apparatus, comprising:
a log acquisition unit, configured to acquire log data to be detected of a server system;
a preprocessing unit, configured to preprocess the log data to be detected to obtain a word vector to be detected and a vector to be detected; and
a detection unit, configured to input the word vector to be detected and the vector to be detected into an anomaly detection model to obtain an anomaly detection result for the log data, wherein the anomaly detection model is obtained by the anomaly detection model training method of the first aspect or any of its optional embodiments.
With reference to the fourth aspect, in a first embodiment of the fourth aspect, the preprocessing unit includes:
a parsing unit, configured to perform structural parsing on the log data to be detected by means of a Drain algorithm and extract unstructured data from the log data to be detected, wherein the unstructured data comprises text data and numerical data;
a first conversion unit, configured to convert the text data into the word vector to be detected by means of a Word2vec algorithm; and
a second conversion unit, configured to convert the numerical data into the vector to be detected.
With reference to the fourth aspect or the first embodiment of the fourth aspect, in a second embodiment of the fourth aspect, if the log data to be detected includes multiple pieces of log data, the apparatus further includes:
an alarm unit, configured to send alarm information to a user if the anomaly detection result indicates that some of the log data contains abnormal data, wherein the alarm information includes those pieces of the log data to be detected whose anomaly detection result is abnormal.
According to a fifth aspect, embodiments of the present invention further provide a computer device, comprising a memory and a processor communicatively connected to each other, wherein the memory stores computer instructions and the processor executes the computer instructions to perform the anomaly detection model training method of the first aspect or any of its optional embodiments, or the anomaly detection method of the second aspect or any of its optional embodiments.
According to a sixth aspect, embodiments of the present invention further provide a computer-readable storage medium storing computer instructions for causing a computer to perform the anomaly detection model training method of the first aspect or any of its optional embodiments, or the anomaly detection method of the second aspect or any of its optional embodiments.
Drawings
To illustrate the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in describing them are briefly introduced below. The drawings show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a proposed anomaly detection model training method according to an exemplary embodiment.
Fig. 2 is a flowchart of a proposed anomaly detection method according to an exemplary embodiment.
Fig. 3 is a flowchart of another proposed anomaly detection method according to an exemplary embodiment.
Fig. 4 is a block diagram of an anomaly detection model training apparatus according to an exemplary embodiment.
Fig. 5 is a structural block diagram of an anomaly detection apparatus according to an exemplary embodiment.
Fig. 6 is a schematic diagram of a hardware structure of a computer device according to an exemplary embodiment.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Anomaly detection is used to discover abnormal system behavior in time and plays an important role in incident management for large-scale systems. Timely anomaly detection helps system developers (or operators) find, locate and resolve problems promptly, thereby reducing server downtime. A log records the specific operating conditions of the server system while it runs, so the operating state of the system can be checked for anomalies using the logs it generates.
An embodiment of the present invention provides an anomaly detection model training method for use in a computer device. The method may be executed by an anomaly detection model training apparatus, which may be implemented in software, hardware, or a combination of both as part or all of the computer device. The computer device may be a terminal, a client or a server; the server may be a single server or a server cluster composed of multiple servers; and the terminal in the embodiments of the present invention may be a smartphone, personal computer, tablet computer, wearable device, intelligent robot or other smart hardware device. In the following method embodiments, a computer device is taken as the executing entity by way of example.
The computer device in this embodiment trains an anomaly detection model so that the trained model can be deployed to a server and detect, from the log data produced by the server system, whether the system behaves abnormally during operation. In the training method provided by the invention, the unstructured data in the training log data is converted into word vectors and vectors and then input into a random forest model for training, so that the resulting anomaly detection model can check acquired log data automatically, avoiding the influence of subjective human judgment and improving detection efficiency.
Fig. 1 is a flowchart of a proposed anomaly detection model training method according to an exemplary embodiment. As shown in Fig. 1, the anomaly detection model training method includes the following steps S101 to S103.
In step S101, log data for training is acquired, and unstructured data is extracted from the log data.
In the embodiment of the present invention, the log data for training may be historical log data generated while the server system was running, and may include both normal data and abnormal data. The abnormal data may be log data captured during periods in which the server system failed. In one example, when the trained anomaly detection model is applied, the anomaly detection result it outputs can specify the fault type corresponding to the abnormal data; that is, the abnormal data can be further classified into specific faults such as a server system hang or the number of remote access connections exceeding its limit.
The log data of the server system contains both structured data and unstructured data, where the unstructured data may include text data and numerical data. The operating information of the server system is carried by the unstructured data, while the structured data can be understood as the template used to report the server system's log information, for example "XX logs in at XX", and contains no specific operating information. Therefore, to improve training efficiency and reduce interference from useless information, the unstructured data is extracted from the log data and used for the subsequent training.
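As a minimal sketch of the split described above, assuming a simple rule that tokens shaped like numbers, hexadecimal values, IPv4 addresses or file paths are variable fields, a log message can be separated into its template and its unstructured text and numerical fields. The masking rules and the function name `split_log_line` are illustrative assumptions, not the patent's method; the patent itself uses Drain for this step.

```python
import re

# Assumption: illustrative masking rules. Tokens matching any of these shapes
# are treated as variable (unstructured) fields; all others as template text.
VARIABLE_PATTERNS = [
    re.compile(r"^\d+(\.\d+)?$"),                   # integers and decimals
    re.compile(r"^0x[0-9a-fA-F]+$"),                # hexadecimal values
    re.compile(r"^\d{1,3}(\.\d{1,3}){3}(:\d+)?$"),  # IPv4 address, optional port
    re.compile(r"^/\S*$"),                          # absolute file paths
]


def is_number(token):
    """True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def split_log_line(line):
    """Return (template, text_fields, numeric_fields) for one log message."""
    template, text_fields, numeric_fields = [], [], []
    for token in line.split():
        if any(p.match(token) for p in VARIABLE_PATTERNS):
            template.append("<*>")  # variable position in the template
            if is_number(token):
                numeric_fields.append(float(token))
            else:
                text_fields.append(token)
        else:
            template.append(token)
    return " ".join(template), text_fields, numeric_fields
```

For example, `split_log_line("Connection from 10.0.0.5 closed after 42 seconds")` returns the template `Connection from <*> closed after <*> seconds`, the text fields `["10.0.0.5"]` and the numerical fields `[42.0]`. A fixed rule list only catches variables with a recognizable shape; free-text variables such as user names need a parser that compares messages with each other.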
In step S102, the text data is converted into word vectors, and the numerical data is converted into vectors.
In the embodiment of the invention, unstructured data has an irregular or incomplete structure and no predefined data model, and is therefore inconvenient to represent in a two-dimensional database table. A random forest model, however, is trained on structured input. So that the random forest model can be trained, the extracted unstructured data is therefore vectorized: the text data is converted into word vectors, the numerical data is converted into vectors, and the random forest model is trained on the word vectors and the vectors.
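The patent does not state how the word vectors and the numerical vectors are combined into one training sample. One plausible arrangement, sketched below under that assumption, averages the word vectors of a log entry's text tokens and concatenates the result with a zero-padded numerical vector; the embedding table, the padding length and all function names are hypothetical.

```python
def average_word_vector(tokens, embeddings, dim):
    """Mean of the vectors of known tokens; unknown tokens are skipped."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def numeric_vector(values, length):
    """Truncate or zero-pad so every log entry yields the same length."""
    values = [float(v) for v in values[:length]]
    return values + [0.0] * (length - len(values))


def to_feature_vector(text_tokens, numbers, embeddings, dim, num_len):
    """Assumption: the text part and the numeric part are simply concatenated."""
    return average_word_vector(text_tokens, embeddings, dim) + numeric_vector(numbers, num_len)
```

With `embeddings = {"login": [1.0, 0.0], "failed": [0.0, 1.0]}`, the call `to_feature_vector(["login", "failed", "root"], [3, 7, 9], embeddings, 2, 2)` yields `[0.5, 0.5, 3.0, 7.0]`: the unknown token `root` is skipped and the third number is truncated. Fixed lengths matter because a random forest expects every sample to have the same number of features.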
In step S103, the word vectors and the vectors are input into a random forest model for training, and the parameters of the random forest model are adjusted to obtain an anomaly detection model.
In the embodiment of the present invention, a random forest is a classifier that uses multiple decision trees to train on and classify samples. During training, the word vectors and vectors of the training log data are input into the random forest model, and the random forest algorithm mines the nonlinear relationships in the log data to produce a corresponding anomaly detection result, which indicates either normal data or abnormal data. The parameters of the random forest model are adjusted repeatedly to improve its accuracy in detecting abnormal data, and training ends when the detection accuracy reaches a specified threshold, yielding the anomaly detection model.
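The following is a minimal, dependency-free sketch of a random forest (bootstrap samples, a random feature subset at each split, Gini impurity, majority vote), together with a parameter search that stops at the first setting whose validation accuracy reaches a threshold. The parameter grid, the default threshold of 0.95 and all function names are illustrative assumptions; in practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return (impurity, feature, threshold) of the best split, or None."""
    best = None
    for f in feature_ids:
        values = sorted(set(r[f] for r in rows))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # threshold halfway between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, max_depth, n_features, rng):
    """Grow a decision tree; leaves are labels, inner nodes are tuples."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    # Random forests look at a random subset of features at every split.
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_features))
    if split is None:
        return majority
    _, f, t = split

    def grow(ids):
        return build_tree([rows[i] for i in ids], [labels[i] for i in ids],
                          max_depth - 1, n_features, rng)

    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t, grow(left), grow(right))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(rows, labels, n_trees, max_depth, seed=0):
    rng = random.Random(seed)
    n_features = max(1, int(len(rows[0]) ** 0.5))  # sqrt(#features) per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 max_depth, n_features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


def accuracy(forest, rows, labels):
    return sum(predict_forest(forest, r) == y for r, y in zip(rows, labels)) / len(labels)


def tune(train_x, train_y, val_x, val_y, target=0.95):
    """Assumption: a small grid over forest size and depth, stopping at the
    first setting whose validation accuracy reaches the target."""
    best = None
    for n_trees in (5, 11, 25):
        for max_depth in (2, 4, 8):
            forest = train_forest(train_x, train_y, n_trees, max_depth)
            acc = accuracy(forest, val_x, val_y)
            if best is None or acc > best[1]:
                best = (forest, acc, n_trees, max_depth)
            if acc >= target:
                return best
    return best  # best setting found, even if below the target
```

Odd forest sizes are used so that a two-class majority vote cannot tie. The "adjust parameters until the accuracy reaches a threshold" loop in the text corresponds to `tune`; which parameters the patent actually adjusts is not specified.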
In one implementation scenario, to verify the accuracy of the anomaly detection model, log data for testing is acquired, which may include normal data or abnormal data. The test log data is processed in the same way as the training log data to obtain word vectors and vectors, which are input into the anomaly detection model, and the accuracy of the model is verified against its output. If the verification result meets the requirement, the parameters of the anomaly detection model are fixed, so that subsequent anomaly detection on log data acquired from the server is performed with these fixed parameters. If the verification result does not meet the requirement, the parameters of the anomaly detection model are readjusted until it does.
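The verification step can be sketched as follows, assuming the trained model is exposed as a `predict` function and that the tuned parameters are the forest size and tree depth (both assumptions; the names are hypothetical). Parameters are frozen only when test accuracy meets the required level; otherwise `None` signals that tuning must continue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenParams:
    """Assumption: the tuned parameters are forest size and tree depth."""
    n_trees: int
    max_depth: int


def validate_and_freeze(predict, test_rows, test_labels, params, required_accuracy):
    """Return (FrozenParams or None, accuracy on the test log data).

    None means the requirement was not met and tuning should continue.
    """
    correct = sum(predict(row) == label for row, label in zip(test_rows, test_labels))
    acc = correct / len(test_labels)
    frozen = FrozenParams(**params) if acc >= required_accuracy else None
    return frozen, acc
```

Because abnormal entries are usually rare, overall accuracy can look high even when most anomalies are missed, so recall on the abnormal class is often checked alongside it.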
Through this embodiment, the unstructured data in the training log data, which carries the server system's operating information, is extracted and used to train the random forest model, so that the resulting anomaly detection model can quickly distinguish normal data from abnormal data, interference from irrelevant information is reduced, and training efficiency is improved.
In one embodiment, the resulting anomaly detection model can check log data either one entry at a time or in batches. When a single log entry is checked, the output is the anomaly detection result for that entry. When log data is checked in batches, the model first determines which log entries are abnormal and then outputs results specifically for those entries.
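A minimal sketch of the two output modes, assuming the model returns 1 for abnormal and 0 for normal (a hypothetical encoding) and that each batch entry pairs the raw log text with its feature vector:

```python
ABNORMAL = 1  # assumption: hypothetical label encoding, 0 = normal, 1 = abnormal


def detect_single(predict, features):
    """Single-entry mode: return the result for that one entry."""
    return "abnormal" if predict(features) == ABNORMAL else "normal"


def detect_batch(predict, entries):
    """Batch mode: return only the entries judged abnormal, with their positions.

    entries is a list of (log_text, features) pairs.
    """
    return [(i, text) for i, (text, features) in enumerate(entries)
            if predict(features) == ABNORMAL]
```

Returning positions alongside the text lets a caller trace each flagged entry back to its place in the batch.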
In another embodiment, the unstructured data may be extracted by parsing the log data with the Drain algorithm, which splits each log entry into a structured part and an unstructured part; the unstructured data is then taken from the unstructured part.
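The sketch below is a simplified Drain-style parser, not a complete implementation of the published algorithm: messages are routed by token count and leading token, compared with existing templates by the fraction of matching positions, and merged when that similarity reaches a threshold, with disagreeing positions replaced by the wildcard `<*>`. The class name, the default depth and threshold, and the rule that digit-bearing tokens are masked up front are assumptions. The wildcard positions correspond to what this document calls the unstructured part.

```python
class SimpleDrain:
    """Simplified Drain-style log parser (sketch; omits the max-children
    limit and other details of the published algorithm)."""

    def __init__(self, depth=1, sim_threshold=0.5):
        self.depth = depth                  # leading tokens used for routing
        self.sim_threshold = sim_threshold  # min fraction of matching tokens
        self.groups = {}                    # (length, prefix) -> templates

    @staticmethod
    def _mask(token):
        # Assumption: tokens containing digits are treated as variables up front.
        return "<*>" if any(ch.isdigit() for ch in token) else token

    def parse(self, line):
        """Return (template, parameters) for one log message."""
        tokens = line.split()
        if not tokens:
            return "", []
        key = (len(tokens), tuple(self._mask(t) for t in tokens[: self.depth]))
        templates = self.groups.setdefault(key, [])
        best, best_sim = None, -1.0
        for template in templates:
            sim = sum(a == b for a, b in zip(template, tokens)) / len(tokens)
            if sim > best_sim:
                best, best_sim = template, sim
        if best is None or best_sim < self.sim_threshold:
            best = [self._mask(t) for t in tokens]  # start a new template group
            templates.append(best)
        else:
            for i, tok in enumerate(tokens):
                if best[i] != tok:
                    best[i] = "<*>"  # positions that disagree become wildcards
        params = [tok for tmpl, tok in zip(best, tokens) if tmpl == "<*>"]
        return " ".join(best), params
```

Feeding `"User alice logged in"` and then `"User bob logged in"` first stores the literal template and then generalizes it to `User <*> logged in`, returning `["bob"]` as the unstructured part. Unlike a fixed rule list, the user name is recognized as variable only because two messages disagree at that position.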
In another embodiment, the text data in the unstructured data may be vectorized with the Word2vec algorithm to convert it into word vectors. This allows natural language processing techniques to be applied to log data when the random forest model is trained, so that the resulting anomaly detection model can recognize the content of the log data or the rules by which it is written, and thereby perform targeted detection. Word2vec is a language model that learns semantic knowledge from a large text corpus in an unsupervised manner.
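To make the idea concrete, here is a toy skip-gram Word2vec trained with negative sampling in pure Python. It is a sketch only: negatives are drawn uniformly rather than from the smoothed unigram distribution Word2vec normally uses, and the hyperparameter defaults and the name `train_word2vec` are illustrative assumptions. Production code would normally use an existing implementation such as gensim's `Word2Vec`.

```python
import math
import random


def train_word2vec(sentences, dim=8, window=2, negatives=3, epochs=50, lr=0.05, seed=0):
    """Toy skip-gram with negative sampling; returns {word: vector}."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    def sigmoid(x):
        x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-x))

    for _ in range(epochs):
        for sent in sentences:
            for pos, word in enumerate(sent):
                v = w_in[index[word]]  # input vector of the centre word
                for ctx in range(max(0, pos - window), min(len(sent), pos + window + 1)):
                    if ctx == pos:
                        continue
                    # One true context word (label 1) plus uniformly drawn
                    # negatives (label 0); this uniform draw is a simplification.
                    targets = [(index[sent[ctx]], 1.0)]
                    targets += [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]
                    grad_v = [0.0] * dim
                    for t, label in targets:
                        u = w_out[t]
                        g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                        for k in range(dim):
                            grad_v[k] += g * u[k]  # uses u before it is updated
                            u[k] += g * v[k]
                    for k in range(dim):
                        v[k] += grad_v[k]
    return {w: w_in[index[w]] for w in vocab}
```

A log entry's text tokens can then be looked up in the returned dictionary and combined into one vector, for example by averaging, before being passed to the classifier. A corpus of real logs would be far larger than anything this toy loop handles efficiently.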
In one implementation scenario, the training log data is parsed with the Drain algorithm to obtain the unstructured data containing the server system's operating information. The text data in the unstructured data is vectorized with the Word2vec algorithm to obtain the corresponding word vectors, and the numerical data in the unstructured data is vectorized to obtain the corresponding vectors. The word vectors and vectors are input into the random forest model for training to obtain the anomaly detection model. The accuracy of the model is then verified with the test log data; if the verification result meets the requirement, the model's parameters are fixed, so that subsequent anomaly detection on log data acquired from the server system is performed with these fixed parameters.
In this way, training the random forest model only on the unstructured data extracted from the log data reduces interference from irrelevant data, speeds up training, and improves training efficiency. Moreover, vectorizing the text data with Word2vec makes it possible to apply natural language processing to log anomaly detection, and the resulting anomaly detection model can check log data automatically without human intervention, which broadens its applicability.
Based on the same concept, the present invention further provides an anomaly detection method, in which the anomaly detection model used is trained by any of the anomaly detection model training methods provided by the invention. With this method, acquired log data to be detected is checked automatically by the trained anomaly detection model, and no manual monitoring is needed to determine which log fields must be examined, so that when the server system behaves abnormally the problem can be located quickly and resolved in time.
Fig. 2 is a flowchart of a proposed anomaly detection method according to an exemplary embodiment. As shown in Fig. 2, the anomaly detection method includes the following steps S201 to S203.
In step S201, log data to be detected is acquired from the server system.
In the embodiment of the present invention, the log data to be detected may be historical log data acquired over a specified period or log data obtained through real-time monitoring. Historical log data acquired over a specified period is batch log data, whereas log data obtained through real-time monitoring arrives as single log entries. In one example, whether to acquire batch log data or single log entries for detection can be decided according to the data volume of the server system or its detection requirements.
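The two acquisition modes can be sketched as below, assuming each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp (an assumption; real formats vary) and that the specified period is a window ending at a given time. The function names and the one-hour default window are hypothetical.

```python
from datetime import datetime, timedelta


def parse_entry(line):
    """Split 'YYYY-MM-DD HH:MM:SS message' into (timestamp, message).

    Assumption: fixed-width timestamp prefix followed by one space.
    """
    stamp, message = line[:19], line[20:]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), message


def batch_window(entries, end, window=timedelta(hours=1)):
    """Historical mode: messages whose timestamp falls in [end - window, end)."""
    start = end - window
    return [msg for ts, msg in entries if start <= ts < end]


def stream_single(lines):
    """Real-time mode: yield each newly received entry on its own."""
    for line in lines:
        yield parse_entry(line)
```

In batch mode the whole window is passed to the model at once; in real-time mode each yielded entry is checked as soon as it arrives.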
In step S202, the log data to be detected is preprocessed to obtain a word vector to be detected and a vector to be detected.
In the embodiment of the invention, because the log data to be detected contains unstructured data, it is preprocessed and converted into a word vector to be detected and a vector to be detected, which the anomaly detection model can take as input.
In step S203, the word vector to be detected and the vector to be detected are input into the anomaly detection model to obtain the anomaly detection result of the log data.
In the embodiment of the present invention, the anomaly detection result may indicate either normal data or abnormal data.
If the acquired log data to be detected is a single log entry, the anomaly detection model performs detection on the input word vector and vector to be detected and obtains the anomaly detection result for that entry.
If the acquired log data to be detected is batch log data, the anomaly detection model, while performing detection on the input word vectors and vectors to be detected, first determines which entries in the log data are abnormal, and then outputs anomaly detection results specifically for those entries.
Through this embodiment, the acquired log data to be detected can be detected automatically by the trained anomaly detection model, without manual monitoring to determine which log data fields need to be checked. As a result, when the server system becomes abnormal, the problem can be located quickly and resolved in time.
In an embodiment, to improve the accuracy of the anomaly detection result and reduce interference from invalid information, the log data to be detected is preprocessed by performing structural parsing on it with the Drain algorithm and extracting the unstructured data it contains. The unstructured data includes text data and numerical data. The text data is converted into a word vector to be detected by the Word2vec algorithm, and the numerical data is converted into a vector to be detected. Converting the unstructured data into structured data in this way allows the anomaly detection model to detect the log data to be detected automatically, which improves detection efficiency and saves labor costs.
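The patent names the Drain algorithm for structural parsing but does not describe it. The following is a minimal, simplified sketch in the spirit of Drain (lines are grouped by token count and first token, then merged into a template by token similarity); it is not the full algorithm and not the API of any Drain library. The class name `SimpleDrain`, the similarity threshold of 0.5, and the rule that purely numeric tokens are parameters are all assumptions. The non-wildcard template tokens are returned as the text data and the numeric parameters as the numerical data.

```python
import re
from typing import Dict, List, Tuple

NUM_RE = re.compile(r"^-?\d+(\.\d+)?$")  # assumption: numeric tokens are parameters
WILDCARD = "<*>"


class SimpleDrain:
    """Simplified Drain-style log parser (illustrative sketch, not full Drain).

    Lines are grouped by token count and first token (a parse tree of depth 2).
    Inside a group, a line joins the most similar template when the share of
    identical tokens reaches sim_threshold; differing positions become <*>.
    """

    def __init__(self, sim_threshold: float = 0.5) -> None:
        self.sim_threshold = sim_threshold
        self.groups: Dict[Tuple[int, str], List[List[str]]] = {}

    @staticmethod
    def _similarity(template: List[str], tokens: List[str]) -> float:
        if not tokens:
            return 1.0
        same = sum(1 for t, w in zip(template, tokens) if t == w)
        return same / len(tokens)

    def parse(self, line: str) -> Tuple[str, List[str], List[float]]:
        """Return (template, text tokens, numerical values) for one log line."""
        raw = line.split()
        # Pre-mask numeric tokens, similar to Drain's regex preprocessing step.
        tokens = [WILDCARD if NUM_RE.match(t) else t for t in raw]
        key = (len(tokens), tokens[0] if tokens else "")
        candidates = self.groups.setdefault(key, [])
        best, best_sim = None, -1.0
        for tpl in candidates:
            sim = self._similarity(tpl, tokens)
            if sim > best_sim:
                best, best_sim = tpl, sim
        if best is not None and best_sim >= self.sim_threshold:
            for i, (t, w) in enumerate(zip(best, tokens)):
                if t != w:
                    best[i] = WILDCARD  # merge: differing positions become parameters
            template = best
        else:
            template = list(tokens)
            candidates.append(template)
        text = [t for t in template if t != WILDCARD]
        numbers = [float(w) for t, w in zip(template, raw)
                   if t == WILDCARD and NUM_RE.match(w)]
        return " ".join(template), text, numbers


parser = SimpleDrain()
print(parser.parse("disk sda1 usage 71 percent"))
# ('disk sda1 usage <*> percent', ['disk', 'sda1', 'usage', 'percent'], [71.0])
print(parser.parse("disk sdb2 usage 93 percent"))
# ('disk <*> usage <*> percent', ['disk', 'usage', 'percent'], [93.0])
```

The second line shares four of five tokens with the first template, so the two lines merge and the differing device name also becomes a parameter. Non-numeric parameters such as `sdb2` are dropped from both outputs in this sketch.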
In one example, to ensure that the extracted unstructured data includes only text data and numerical data, the log data to be detected is cleaned before extraction: irrelevant data such as punctuation marks is removed, which makes the extracted data cleaner.
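A possible cleaning step is sketched below. The text does not specify which characters count as irrelevant; this hypothetical `clean_log_line` assumes that everything other than word characters, `.` and `-` is punctuation to be removed, so that decimals and negative numbers survive for the numerical part.

```python
import re

# Assumption: anything other than word characters, '.' and '-' is punctuation.
_PUNCT_RE = re.compile(r"[^\w.\-]+")


def clean_log_line(line: str) -> str:
    """Replace punctuation with spaces and drop dots left dangling at token edges."""
    cleaned = _PUNCT_RE.sub(" ", line)
    tokens = [tok.strip(".") for tok in cleaned.split()]
    return " ".join(tok for tok in tokens if tok)


print(clean_log_line("[ERROR] fan_2: speed=0 rpm, temp=85.5."))
# ERROR fan_2 speed 0 rpm temp 85.5
```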
In another embodiment, an anomaly detection result of data anomaly indicates that the server system has behaved abnormally during operation. To help the user discover a fault in the server system in time, alarm information is sent to the user to indicate that the server system currently has a fault. In one example, the alarm information may be sent to a client used by the user, so that the user learns from the alarm information received through the client that the server system has failed. In another example, if the server system includes a display, the alarm information is sent to the display and shown there to prompt the user.
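The two delivery examples (client and display) can be modelled as interchangeable sinks. The sketch below is a hypothetical illustration: `AlarmDispatcher`, the result labels `"normal"`/`"abnormal"` and the message format are assumptions, and a real deployment would replace the callables with, for example, a push to the user's client or a write to the server's display.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AlarmDispatcher:
    """Routes alarm text to every registered sink (client push, local display, ...)."""
    sinks: List[Callable[[str], None]] = field(default_factory=list)

    def register(self, sink: Callable[[str], None]) -> None:
        self.sinks.append(sink)

    def on_result(self, result: str, server: str) -> bool:
        """Send an alarm only for an 'abnormal' result; return whether one was sent."""
        if result != "abnormal":
            return False
        message = f"[ALARM] server {server}: abnormal log data detected"
        for sink in self.sinks:
            sink(message)
        return True


received: List[str] = []           # stands in for the user's client
dispatcher = AlarmDispatcher()
dispatcher.register(received.append)
dispatcher.register(print)         # stands in for the server's display
dispatcher.on_result("abnormal", "node-01")
```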
In another embodiment, if the log data to be detected includes multiple pieces of log data and at least one of them has an anomaly detection result of data anomaly, the user may be prompted using the anomaly detection method shown in Fig. 3. Fig. 3 is a flowchart of another anomaly detection method according to an exemplary embodiment. As shown in Fig. 3, the anomaly detection method includes the following steps.
In step S301, log data to be detected of the server system is acquired.
In step S302, the log data to be detected is preprocessed to obtain a word vector to be detected and a vector to be detected.
In step S303, the word vector to be detected and the vector to be detected are input to the anomaly detection model, so as to obtain an anomaly detection result of the log data.
In step S304, if any log data has an anomaly detection result of data anomaly, alarm information is sent to the user.
In the embodiment of the invention, because the log data to be detected includes multiple pieces of log data, their anomaly detection results may differ. Log data whose anomaly detection result is data anomaly indicates that the server system has developed a fault during operation. Therefore, so that the user knows both that the server system currently has a fault and which log data was detected as abnormal, alarm information is sent to the user. The alarm information includes the log data, among the log data to be detected, whose anomaly detection result is data anomaly. From the received alarm information, the user can identify the fault that occurred during operation and quickly locate it using the abnormal log data, so that the problem is resolved in time and the downtime of the server system is reduced.
Through this embodiment, the user can determine from the received alarm information which of the log data to be detected was found abnormal, so that the fault can be located quickly, the problem resolved in time, and the downtime of the server system reduced.
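Steps S303 and S304 for batch log data can be sketched as follows, assuming the trained model is wrapped in a callable that returns `"normal"` or `"abnormal"` for one entry. The function names, the toy predictor and the `max_lines` cap on how many abnormal entries are copied into the alarm are hypothetical; the patent only requires that the alarm include the abnormal log data.

```python
from typing import Callable, List, Optional, Sequence


def detect_batch(entries: Sequence[str], predict: Callable[[str], str]) -> List[str]:
    """Step S303: classify every entry and return those judged 'abnormal'."""
    return [e for e in entries if predict(e) == "abnormal"]


def build_alarm(anomalous: List[str], max_lines: int = 20) -> Optional[str]:
    """Step S304: alarm text that includes the abnormal log data, or None if there is none."""
    if not anomalous:
        return None
    lines = [f"{len(anomalous)} abnormal log entries detected:"]
    lines += [f"  {e}" for e in anomalous[:max_lines]]
    if len(anomalous) > max_lines:
        lines.append(f"  ... {len(anomalous) - max_lines} more")
    return "\n".join(lines)


def toy_predict(entry: str) -> str:
    """Hypothetical stand-in for the trained anomaly detection model."""
    return "abnormal" if ("down" in entry or "speed 0" in entry) else "normal"


batch = ["disk sda1 usage 71 percent", "fan 2 speed 0 rpm", "link eth0 down"]
alarm = build_alarm(detect_batch(batch, toy_predict))
if alarm is not None:
    print(alarm)
```

Keeping detection and alarm construction separate means the same `build_alarm` output can be handed to any delivery channel.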
In one implementation scenario, the anomaly detection method may be implemented with several modules: log data splitting, unstructured data vectorization, machine learning model training, model deployment, and model application. The log data splitting module splits the log data into structured data and unstructured data and extracts the unstructured data. The unstructured data vectorization module converts the text data into word vectors and the numerical data into vectors. The machine learning model training module trains a random forest model on the word vectors and vectors to obtain the anomaly detection model. The model deployment module deploys the trained anomaly detection model on the corresponding server. The model application module automatically acquires the log data to be detected from the server system and runs it through the anomaly detection model to obtain the anomaly detection result.
With the anomaly detection method provided by the invention, deploying the anomaly detection model on the server allows the log data that records the operation of the server system to be detected automatically. Faults in the server system can then be found in time from the anomaly detection result, so that the user can locate and remove them promptly. This effectively improves the operational safety and reliability of the server system and avoids heavy losses to an enterprise caused by server system failures.
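For the vectorization module, the patent uses Word2vec for the text data but does not say how the numerical data is turned into a vector. The sketch below deliberately substitutes simpler techniques so that it stays dependency-free: a signed feature-hashing vector for the text tokens (a stand-in, not Word2vec) and a fixed-length summary `[count, min, max, mean]` for the numbers. Both choices, the function names and the dimension of 16 are assumptions; a faithful implementation would train Word2vec embeddings (for example with a library such as gensim) and combine them per log line, for instance by averaging.

```python
import hashlib
import math
from typing import List


def hash_word_vector(tokens: List[str], dim: int = 16) -> List[float]:
    """Stand-in for Word2vec: signed feature hashing of the tokens, L2-normalised."""
    vec = [0.0] * dim
    for tok in tokens:
        h = int(hashlib.sha256(tok.encode("utf-8")).hexdigest(), 16)
        sign = 1.0 if (h >> 64) & 1 else -1.0  # independent bit chooses the sign
        vec[h % dim] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def numeric_vector(values: List[float]) -> List[float]:
    """Summarise a variable-length list of numbers as [count, min, max, mean]."""
    if not values:
        return [0.0, 0.0, 0.0, 0.0]
    return [float(len(values)), min(values), max(values), sum(values) / len(values)]


def feature_vector(tokens: List[str], values: List[float], dim: int = 16) -> List[float]:
    """Concatenate text and numerical parts into one fixed-length classifier input."""
    return hash_word_vector(tokens, dim) + numeric_vector(values)


row = feature_vector(["disk", "usage", "percent"], [71.0, 93.0])
print(len(row), row[-4:])  # 20 [2.0, 71.0, 93.0, 82.0]
```

A deterministic hash (`hashlib`) is used instead of Python's built-in `hash()`, whose string hashes vary between interpreter runs, so that training and detection map the same token to the same position.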
Based on the same inventive concept, the invention also provides an anomaly detection model training device.
Fig. 4 is a block diagram of an anomaly detection model training apparatus according to an exemplary embodiment. As shown in Fig. 4, the anomaly detection model training apparatus includes an acquisition unit 401, a conversion unit 402, and a training unit 403:
an acquisition unit 401, configured to acquire log data for training and extract unstructured data from the log data, where the unstructured data includes text data and numerical data;
a conversion unit 402, configured to convert the text data into word vectors and the numerical data into vectors;
and a training unit 403, configured to input the word vectors and the vectors into a random forest model for training and adjust the parameters of the random forest model to obtain an anomaly detection model.
In one embodiment, the acquisition unit includes an extraction unit, configured to perform structural parsing on the log data with the Drain algorithm and extract the unstructured data from the log data.
In another embodiment, the conversion unit includes a conversion subunit, configured to convert the text data into word vectors with the Word2vec algorithm.
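The training unit 403 feeds the vectors into a random forest and adjusts its parameters. The patent does not state which parameters are adjusted or how; the dependency-free sketch below assumes that `n_estimators` (number of trees) and `max_depth` are chosen by validation accuracy over a small grid, with labels 0 for normal and 1 for abnormal. `Tree`, `RandomForest` and `tune` are hypothetical names; in practice a library implementation such as scikit-learn's `RandomForestClassifier` would usually be preferred.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Row = List[float]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of class labels (0 = normal, 1 = abnormal)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


class Tree:
    """CART-style tree that considers a random subset of features at each split."""

    def __init__(self, max_depth: int, n_sub: int, rng: random.Random) -> None:
        self.max_depth, self.n_sub, self.rng = max_depth, n_sub, rng
        self.root: object = None

    def fit(self, X: List[Row], y: List[int]) -> "Tree":
        self.root = self._build(X, y, 0)
        return self

    def _build(self, X: List[Row], y: List[int], depth: int) -> object:
        majority = Counter(y).most_common(1)[0][0]
        if depth >= self.max_depth or len(set(y)) == 1:
            return majority  # leaf node: predicted class label
        best: Optional[Tuple[float, int, float]] = None
        for f in self.rng.sample(range(len(X[0])), self.n_sub):
            for thr in sorted({row[f] for row in X}):
                left = [lab for row, lab in zip(X, y) if row[f] <= thr]
                right = [lab for row, lab in zip(X, y) if row[f] > thr]
                if not left or not right:
                    continue
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, f, thr)
        if best is None:
            return majority  # no candidate feature separates these samples
        _, f, thr = best
        li = [i for i, row in enumerate(X) if row[f] <= thr]
        ri = [i for i, row in enumerate(X) if row[f] > thr]
        return (f, thr,
                self._build([X[i] for i in li], [y[i] for i in li], depth + 1),
                self._build([X[i] for i in ri], [y[i] for i in ri], depth + 1))

    def predict_one(self, row: Row) -> int:
        node = self.root
        while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
            f, thr, left, right = node
            node = left if row[f] <= thr else right
        return node


class RandomForest:
    """Bagged trees with random feature subsets; prediction is a majority vote."""

    def __init__(self, n_estimators: int = 10, max_depth: int = 3, seed: int = 0) -> None:
        self.n_estimators, self.max_depth, self.seed = n_estimators, max_depth, seed
        self.trees: List[Tree] = []

    def fit(self, X: List[Row], y: List[int]) -> "RandomForest":
        rng = random.Random(self.seed)
        n_sub = max(1, int(math.sqrt(len(X[0]))))  # common sqrt(n_features) heuristic
        self.trees = []
        for _ in range(self.n_estimators):
            idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            tree = Tree(self.max_depth, n_sub, rng)
            self.trees.append(tree.fit([X[i] for i in idx], [y[i] for i in idx]))
        return self

    def predict(self, X: List[Row]) -> List[int]:
        return [Counter(t.predict_one(r) for t in self.trees).most_common(1)[0][0] for r in X]


def tune(X_tr, y_tr, X_val, y_val, grid):
    """Parameter adjustment: keep the (n_estimators, max_depth) with best validation accuracy."""
    best = None
    for n_estimators, max_depth in grid:
        model = RandomForest(n_estimators, max_depth).fit(X_tr, y_tr)
        acc = sum(p == t for p, t in zip(model.predict(X_val), y_val)) / len(y_val)
        if best is None or acc > best[0]:
            best = (acc, n_estimators, max_depth, model)
    return best


X = [[5, 8], [10, 3], [15, 12], [20, 7], [12, 18],
     [85, 90], [90, 82], [95, 99], [80, 88], [88, 95]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
acc, n_est, depth, model = tune(X, y, [[1, 1], [99, 99]], [0, 1], [(5, 2), (11, 3)])
print(acc, n_est, depth, model.predict([[1, 1], [99, 99]]))
```

In a real training run the rows of `X` would be the concatenated word vectors and numerical vectors of the training logs, and the validation set would be held out from the labelled training data rather than reused toy points.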
For the specific limitations and beneficial effects of the above anomaly detection model training apparatus, refer to the limitations of the anomaly detection model training method described above; they are not repeated here. The modules described above may be implemented wholly or partly in software, hardware, or a combination thereof. They may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
Based on the same inventive concept, the invention also provides an anomaly detection apparatus.
Fig. 5 is a structural block diagram of an anomaly detection apparatus according to an exemplary embodiment. As shown in Fig. 5, the anomaly detection apparatus includes a log acquisition unit 501, a preprocessing unit 502, and a detection unit 503:
a log acquisition unit 501, configured to acquire log data to be detected of a server system;
a preprocessing unit 502, configured to preprocess the log data to be detected to obtain a word vector to be detected and a vector to be detected;
and a detection unit 503, configured to input the word vector to be detected and the vector to be detected into an anomaly detection model to obtain an anomaly detection result of the log data, where the anomaly detection model is obtained by training with any one of the above anomaly detection model training methods.
In one embodiment, the preprocessing unit 502 includes: a parsing unit, configured to perform structural parsing on the log data to be detected with the Drain algorithm and extract the unstructured data from it, where the unstructured data includes text data and numerical data; a first conversion unit, configured to convert the text data into a word vector to be detected with the Word2vec algorithm; and a second conversion unit, configured to convert the numerical data into a vector to be detected.
In another embodiment, if the log data to be detected includes multiple pieces of log data, the apparatus further includes an alarm unit, configured to send alarm information to the user if any log data has an anomaly detection result of data anomaly, where the alarm information includes the log data, among the log data to be detected, whose anomaly detection result is data anomaly.
For the specific limitations and beneficial effects of the above anomaly detection apparatus, refer to the limitations of the anomaly detection method described above; they are not repeated here. The modules described above may be implemented wholly or partly in software, hardware, or a combination thereof. They may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
Fig. 6 is a schematic diagram of the hardware structure of a computer device according to an exemplary embodiment. As shown in Fig. 6, the device includes one or more processors 610 and a memory 620, where the memory 620 includes persistent memory, volatile memory, and a hard disk; one processor 610 is taken as an example in Fig. 6. The device may further include an input device 630 and an output device 640.
The processor 610, the memory 620, the input device 630, and the output device 640 may be connected by a bus or in other ways; a bus connection is taken as an example in Fig. 6.
The processor 610 may be a central processing unit (CPU). The processor 610 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 620, as a non-transitory computer-readable storage medium, includes persistent memory, volatile memory, and a hard disk, and may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the anomaly detection model training method or the anomaly detection method in the embodiments of the present application. The processor 610 runs the non-transitory software programs, instructions, and modules stored in the memory 620 to execute the various functional applications and data processing of the server, thereby implementing any one of the methods described above.
The memory 620 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data used as needed or desired, and the like. Further, the memory 620 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 620 optionally includes memory located remotely from processor 610, which may be connected to a data processing device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 630 may receive input numeric or character information and generate key signal inputs related to user settings and function control. The output device 640 may include a display device such as a display screen.
One or more modules are stored in the memory 620 and, when executed by the one or more processors 610, perform the methods shown in fig. 1-3.
The above product can execute the methods provided by the embodiments of the invention and has the functional modules and beneficial effects corresponding to those methods. For technical details not described in this embodiment, refer to the related description of the embodiments shown in Figs. 1 to 3.
Embodiments of the present invention further provide a non-transitory computer storage medium storing computer-executable instructions, which can execute the anomaly detection model training method or the anomaly detection method in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also include a combination of the above kinds of memory.
It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations or modifications will be apparent to those skilled in the art in light of the above description. It is neither necessary nor possible to list all embodiments exhaustively here, and obvious variations or modifications derived from them remain within the scope of protection of the invention.

Claims (10)

1. An anomaly detection model training method, characterized in that the method comprises:
acquiring log data for training, and extracting unstructured data from the log data, wherein the unstructured data comprises text data and numerical data;
converting the text data into word vectors and converting the numerical data into vectors;
inputting the word vectors and the vectors into a random forest model for training, and adjusting parameters of the random forest model to obtain an anomaly detection model.
2. The method of claim 1, wherein extracting unstructured data from the log data comprises:
performing structural parsing on the log data through a Drain algorithm, and extracting the unstructured data from the log data.
3. The method of claim 1 or 2, wherein said converting the text data into a word vector comprises:
converting the text data into word vectors by using a Word2vec algorithm.
4. An anomaly detection method, characterized in that it comprises:
acquiring log data to be detected of a server system;
preprocessing the log data to be detected to obtain a word vector to be detected and a vector to be detected;
inputting the word vector to be detected and the vector to be detected into an anomaly detection model to obtain an anomaly detection result of the log data, wherein the anomaly detection model is obtained by training by adopting the anomaly detection model training method of any one of claims 1 to 3.
5. The method according to claim 4, wherein preprocessing the log data to be detected to obtain a word vector to be detected and a vector to be detected comprises:
performing structural parsing on the log data to be detected through a Drain algorithm, and extracting unstructured data from the log data to be detected, wherein the unstructured data comprises text data and numerical data;
converting the text data into a word vector to be detected through a Word2vec algorithm;
converting the numerical data into a vector to be detected.
6. The method according to claim 4 or 5, wherein if the log data to be detected includes multiple pieces of log data, the method further comprises:
if any log data has an anomaly detection result of data anomaly, sending alarm information to a user, wherein the alarm information comprises the log data, among the log data to be detected, whose anomaly detection result is data anomaly.
7. An anomaly detection model training apparatus, characterized in that the apparatus comprises:
an acquisition unit, configured to acquire log data for training and extract unstructured data from the log data, wherein the unstructured data comprises text data and numerical data;
a conversion unit, configured to convert the text data into word vectors and convert the numerical data into vectors;
and a training unit, configured to input the word vectors and the vectors into a random forest model for training and adjust the parameters of the random forest model to obtain an anomaly detection model.
8. An anomaly detection apparatus, characterized in that the apparatus comprises:
a log acquisition unit, configured to acquire log data to be detected of a server system;
a preprocessing unit, configured to preprocess the log data to be detected to obtain a word vector to be detected and a vector to be detected;
a detection unit, configured to input the word vector to be detected and the vector to be detected into an anomaly detection model, so as to obtain an anomaly detection result of the log data, where the anomaly detection model is obtained by using the anomaly detection model training method according to any one of claims 1 to 3.
9. A computer device comprising a memory and a processor, wherein the memory and the processor are communicatively connected to each other, the memory stores computer instructions, and the processor executes the computer instructions to perform the anomaly detection model training method according to any one of claims 1 to 3 or the anomaly detection method according to any one of claims 4 to 6.
10. A computer-readable storage medium storing computer instructions for causing a computer to execute the anomaly detection model training method according to any one of claims 1 to 3 or the anomaly detection method according to any one of claims 4 to 6.
CN202111485895.9A 2021-12-07 2021-12-07 Anomaly detection model training method, anomaly detection device and medium Pending CN114118295A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111485895.9A CN114118295A (en) 2021-12-07 2021-12-07 Anomaly detection model training method, anomaly detection device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111485895.9A CN114118295A (en) 2021-12-07 2021-12-07 Anomaly detection model training method, anomaly detection device and medium

Publications (1)

Publication Number Publication Date
CN114118295A true CN114118295A (en) 2022-03-01

Family

ID=80367320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111485895.9A Pending CN114118295A (en) 2021-12-07 2021-12-07 Anomaly detection model training method, anomaly detection device and medium

Country Status (1)

Country Link
CN (1) CN114118295A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114881112A (en) * 2022-03-31 2022-08-09 北京优特捷信息技术有限公司 System anomaly detection method, device, equipment and medium
CN115333973A (en) * 2022-08-05 2022-11-11 武汉联影医疗科技有限公司 Equipment abnormality detection method and device, computer equipment and storage medium
EP4290383A1 (en) * 2022-06-10 2023-12-13 Nokia Solutions and Networks Oy Method and apparatus for anomaly detection
CN115333973B (en) * 2022-08-05 2024-07-23 武汉联影医疗科技有限公司 Device abnormality detection method, device, computer device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111344721A (en) * 2017-11-13 2020-06-26 国际商业机器公司 Anomaly detection using cognitive computation
CN111339052A (en) * 2020-02-28 2020-06-26 中国银联股份有限公司 Unstructured log data processing method and device
CN113239006A (en) * 2021-05-12 2021-08-10 中国联合网络通信集团有限公司 Log detection model generation method and device and log detection method and device
CN113656254A (en) * 2021-08-25 2021-11-16 上海明略人工智能(集团)有限公司 Abnormity detection method and system based on log information and computer equipment

Similar Documents

Publication Publication Date Title
US10795753B2 (en) Log-based computer failure diagnosis
CN113282461B (en) Alarm identification method and device for transmission network
US20210035022A1 (en) Method for updating service system electronic device, and readable storage medium
CN111435366A (en) Equipment fault diagnosis method and device and electronic equipment
JP2018045403A (en) Abnormality detection system and abnormality detection method
CN114118295A (en) Anomaly detection model training method, anomaly detection device and medium
CN105577440A (en) Network fault time location method and analyzing device
US20200166921A1 (en) System and method for proactive repair of suboptimal operation of a machine
US20190138542A1 (en) Classification of log data
CN109145030B (en) Abnormal data access detection method and device
CN116089231B (en) Fault alarm method and device, electronic equipment and storage medium
CN113313280B (en) Cloud platform inspection method, electronic equipment and nonvolatile storage medium
CN115269314A (en) Transaction abnormity detection method based on log
CN116361147A (en) Method for positioning root cause of test case, device, equipment, medium and product thereof
CN111143191A (en) Website testing method and device, computer equipment and storage medium
CN114647558A (en) Method and device for detecting log abnormity
CN113282920A (en) Log abnormity detection method and device, computer equipment and storage medium
CN110838940B (en) Underground cable inspection task configuration method and device
CN115062144A (en) Log anomaly detection method and system based on knowledge base and integrated learning
CN114756850A (en) Data acquisition method, device, equipment and storage medium
CN115186001A (en) Patch processing method and device
CN114881112A (en) System anomaly detection method, device, equipment and medium
CN113010339A (en) Method and device for automatically processing fault in online transaction test
CN112860527A (en) Fault monitoring method and device of application server
CN112131090A (en) Business system performance monitoring method and device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination