WO2022227388A1 - Log anomaly detection model training method, apparatus and device - Google Patents

Log anomaly detection model training method, apparatus and device

Info

Publication number
WO2022227388A1
WO2022227388A1 (PCT/CN2021/120446)
Authority
WO
WIPO (PCT)
Prior art keywords
log
anomaly detection
word
detection model
trained
Prior art date
Application number
PCT/CN2021/120446
Other languages
French (fr)
Chinese (zh)
Inventor
Fan Wu (吴凡)
Alexander Acker (阿克尔·亚历山大)
Thorsten Philipp Wittkopp (维特科普·托尔斯滕·菲利普)
Odej Kao (高·奥德伊)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022227388A1 publication Critical patent/WO2022227388A1/en

Classifications

    • G06F11/3034: Monitoring arrangements specially adapted to the computing system or computing system component being monitored, where the component is a storage system, e.g. DASD based or network based
    • G06F11/3037: Monitoring arrangements specially adapted to the computing system or computing system component being monitored, where the component is a memory, e.g. virtual memory, cache
    • G06F11/3051: Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G06F11/3072: Monitoring arrangements determined by the means or processing involved in reporting the monitored data, where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/33: Querying of unstructured textual data
    • G06F16/3344: Query execution using natural language analysis
    • G06F16/35: Clustering; Classification of unstructured textual data
    • G06F18/20: Pattern recognition; Analysing
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30: Semantic analysis
    • G06N20/00: Machine learning

Definitions

  • The present application relates to the field of artificial intelligence, and in particular, to a log anomaly detection model training method, apparatus and device.
  • Objects such as hard disk drives (HDDs), network devices (for example, routers and switches) and processors generate various logs during operation to record their own status and important events. Because this information is represented in logs, logs can be used for anomaly detection and troubleshooting.
  • When a user wants to perform anomaly detection on the logs generated by a specific object (such as a certain model of memory produced by a certain manufacturer), the prevailing approach is to obtain the historical logs generated by that specific object as training samples, train an initial log anomaly detection model to obtain a trained model with a good detection effect on the logs of that specific object, and then use the trained model to perform anomaly detection on the logs generated by the specific object.
  • However, a log anomaly detection model trained in this way has low generalization ability. When the user needs to perform anomaly detection on the logs generated by a similar specific object (such as another model of memory produced by another manufacturer), the already trained model cannot be used; the historical logs generated by the similar specific object must be re-acquired as training samples to train the initial model from scratch before a model with a good detection effect on those logs can be obtained. Retraining a new model for each different specific object usually consumes a lot of manpower and time, and the efficiency is low.
  • The present application provides a log anomaly detection model training method, apparatus and device, which can solve the prior-art problem that a trained log anomaly detection model has low generalization ability, forcing a user who wants to perform anomaly detection on logs generated by a similar specific object to train a new model for that object at a high cost in manpower and time and with low efficiency.
  • A first aspect provides a method for training a log anomaly detection model, the method comprising: acquiring a first log sample set, wherein the first log sample set is obtained by processing log data of a target object; pre-training an initial log anomaly detection model using the first log sample set to obtain a pre-trained log anomaly detection model; acquiring a second log sample set, wherein the second log sample set is obtained by processing log data of a target sub-object, and the target sub-object belongs to the target object; and fine-tuning the pre-trained log anomaly detection model using the second log sample set to obtain a trained log anomaly detection model.
  • The model training method provided by the present application can thus avoid the prior-art situation in which, because the trained log anomaly detection model generalizes poorly, a new model must be retrained for each similar specific object at a high cost in manpower and time and with low efficiency.
  • In a possible implementation, the target object includes at least one of the following sub-objects: hard disk, memory, flash memory, network device and processor, and the target sub-object is any one type of sub-object included in the target object.
  • In a possible implementation, the first log sample set includes m log samples, where m is a natural number greater than 1, and pre-training the initial log anomaly detection model using the first log sample set to obtain the pre-trained log anomaly detection model includes: performing word segmentation on the m log samples to obtain m word sequences corresponding to the m log samples, and pre-training the initial log anomaly detection model through the m word sequences to obtain the pre-trained log anomaly detection model.
  • In a possible implementation, pre-training the initial log anomaly detection model through the m word sequences to obtain the pre-trained log anomaly detection model includes: performing mask processing on a preset proportion of the words in the m word sequences to obtain m masked word sequences, and pre-training the initial log anomaly detection model through the m masked word sequences to obtain the pre-trained log anomaly detection model.
  • Pre-training the initial log anomaly detection model through the m masked word sequences allows the model to better learn the context information of the masked words, so that the pre-trained log anomaly detection model learns the semantic information of each word sequence and the trained log anomaly detection model obtained subsequently can detect whether a log to be detected is abnormal according to the semantic information of that log.
  • In a possible implementation, pre-training the initial log anomaly detection model with the m masked word sequences to obtain the pre-trained log anomaly detection model includes: obtaining a word embedding vector and a position embedding vector corresponding to each word in the m masked word sequences, where the word embedding vector corresponding to each word is a multi-dimensional vector used to represent that word, and the position embedding vector corresponding to each word represents the position of that word in the word sequence to which it belongs; obtaining, according to the word embedding vector and the position embedding vector corresponding to each word in the m masked word sequences, m first row vectors corresponding to the m masked word sequences; and using the m first row vectors to pre-train the initial log anomaly detection model to obtain the pre-trained log anomaly detection model.
  • In a possible implementation, using the m first row vectors to pre-train the initial log anomaly detection model to obtain the pre-trained log anomaly detection model includes: inputting the m first row vectors into the initial log anomaly detection model for training to obtain m second row vectors, and clustering the m second row vectors while training the initial log anomaly detection model to obtain the pre-trained log anomaly detection model and a target cluster center.
  • In a possible implementation, the method further includes: determining a classification threshold according to the percentiles corresponding to the losses from the m second row vectors to the target cluster center, wherein the classification threshold is used by the trained log anomaly detection model to perform anomaly detection on a log to be detected and obtain a detection result.
  • The formula for obtaining the loss from the m second row vectors to the initial cluster center is the squared Euclidean distance: loss(c, V_i) = ||V_i - c||^2, where V_i represents the ith second row vector among the m second row vectors, c represents the initial cluster center, loss(c, V_i) represents the loss from the ith second row vector to the initial cluster center, and i is a natural number.
  • A second aspect provides a log anomaly detection model training apparatus, the apparatus comprising:
  • an acquisition module configured to acquire a first log sample set, wherein the first log sample set is obtained by processing log data of the target object
  • a training module, configured to pre-train an initial log anomaly detection model through the first log sample set to obtain a pre-trained log anomaly detection model;
  • the obtaining module is further configured to obtain a second log sample set, wherein the second log sample set is obtained by processing log data of a target sub-object, and the target sub-object belongs to the target object;
  • the training module is further configured to fine-tune the pre-trained log anomaly detection model through the second log sample set to obtain a trained log anomaly detection model.
  • In a possible implementation, the target object includes at least one of the following sub-objects: hard disk, memory, flash memory, network device and processor, and the target sub-object is any one type of sub-object included in the target object.
  • In a possible implementation, the first log sample set includes m log samples, where m is a natural number greater than 1, and the training module is specifically configured to: perform word segmentation on the m log samples to obtain m word sequences corresponding to the m log samples, and pre-train the initial log anomaly detection model through the m word sequences to obtain the pre-trained log anomaly detection model.
  • In a possible implementation, the training module is specifically configured to: perform mask processing on a preset proportion of the words in the m word sequences to obtain m masked word sequences, and pre-train the initial log anomaly detection model through the m masked word sequences to obtain the pre-trained log anomaly detection model.
  • In a possible implementation, the training module is specifically configured to: obtain a word embedding vector and a position embedding vector corresponding to each word in the m masked word sequences, where the word embedding vector corresponding to each word is a multi-dimensional vector used to represent that word and the position embedding vector corresponding to each word represents the position of that word in the word sequence to which it belongs; obtain, according to the word embedding vector and the position embedding vector corresponding to each word in the m masked word sequences, m first row vectors corresponding to the m masked word sequences; and pre-train the initial log anomaly detection model with the m first row vectors to obtain the pre-trained log anomaly detection model.
  • In a possible implementation, the training module is specifically configured to: input the m first row vectors into the initial log anomaly detection model for training to obtain m second row vectors, wherein the m second row vectors correspond one-to-one to the m masked word sequences and each of the m second row vectors includes the semantic information of its corresponding masked word sequence; and cluster the m second row vectors while training the initial log anomaly detection model to obtain the pre-trained log anomaly detection model and a target cluster center.
  • In a possible implementation, the training module is further configured to: determine a classification threshold according to the percentiles corresponding to the losses from the m second row vectors to the target cluster center, wherein the classification threshold is used by the trained log anomaly detection model to perform anomaly detection on a log to be detected and obtain a detection result.
  • The formula for obtaining the loss from the m second row vectors to the initial cluster center is the squared Euclidean distance: loss(c, V_i) = ||V_i - c||^2, where V_i represents the ith second row vector among the m second row vectors, c represents the initial cluster center, loss(c, V_i) represents the loss from the ith second row vector to the initial cluster center, and i is a natural number.
  • A third aspect provides a non-transitory computer-readable storage medium that stores instructions which, when executed by a computing device, cause the computing device to implement the method provided by the first aspect or any possible implementation of the first aspect.
  • A fourth aspect provides a computing device that includes a processor and a memory; the processor is configured to execute instructions stored in the memory, so that the computing device implements the method provided by the first aspect or any possible implementation of the first aspect.
  • A fifth aspect provides a computer program product including a computer program which, when read and executed by a computing device, causes the computing device to perform the method provided by the first aspect or any possible implementation of the first aspect.
  • FIG. 1 is a schematic diagram of a prior-art solution involved in this application;
  • FIG. 2 is a schematic diagram of masking words in an input sequence by the masked language model (MLM) method involved in the present application;
  • FIG. 3 is a schematic diagram of the principle of a log anomaly detection model training method provided by the present application.
  • FIG. 4 is a schematic flowchart of a method for training a log anomaly detection model provided by the present application
  • FIG. 5 is a schematic diagram of obtaining a word sequence corresponding to the i-th first log sample provided by the present application
  • FIG. 6 is a schematic diagram of a first row vector and a second row vector corresponding to a word sequence obtained after mask processing provided by the present application;
  • FIG. 7 is a schematic diagram of an exemplary word embedding vector and position embedding vector provided by the present application.
  • FIG. 8 is a schematic diagram of an exemplary word vector provided by the present application.
  • FIG. 9 is a schematic flowchart of a pre-trained log anomaly detection model provided by the present application.
  • FIG. 10 is a schematic diagram of pre-training the initial log anomaly detection model with the m first row vectors involved in the present application to obtain m second row vectors and a target cluster center;
  • FIG. 11 is a schematic flowchart of a log anomaly detection method involved in the present application.
  • FIG. 12 is a schematic diagram of the anomaly detection result obtained by performing anomaly detection on the first row vector x corresponding to the sequence to be detected involved in the present application;
  • FIG. 13 is a schematic structural diagram of a log anomaly detection model training device provided by the present application.
  • FIG. 14 is a schematic structural diagram of a computing device provided by the present application.
  • The terms “first” and “second” in the embodiments of the present application are used only for the purpose of description and cannot be understood as indicating or implying relative importance or implying the number of indicated technical features. Thus, a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature.
  • “at least one” refers to one or more, and “multiple” refers to two or more.
  • “And/or” describes an association relationship between associated objects and indicates that three relationships are possible; for example, “A and/or B” can indicate: A alone, both A and B, or B alone, where A and B can be singular or plural.
  • the character “/” generally indicates that the associated objects are an "or” relationship.
  • “At least one of the following” or similar expressions refers to any combination of these items, including any combination of a single item(s) or a plurality of items(s).
  • “At least one of a, b or c” may represent: a, b, c, a-b, a-c, b-c or a-b-c, where a, b and c may each be single or multiple.
  • Logs are records generated by objects such as hard disks, network devices and processors, which indicate the status of those objects and the events that have occurred. For example, a hard disk generates logs when a failure occurs or is about to occur.
  • the log is generally stored in the device in the form of a log file, and the log file may be a directly readable text file or a machine-readable binary file, or a file existing in other forms, which is not specifically limited in this application.
  • Each log file consists of log records, line by line; one or several consecutive records describe an independent event, and a log record describing an independent event can be called a log entry.
  • a log file contains multiple log entries.
  • a log entry usually contains the event time, event content, event type, event level, and so on.
  • the formats of log entries generated by different objects are different.
  • the format of log entries generated by device A is: event occurrence time, identification of the device accessing device A, and event content.
  • a log entry generated by device A contains 20 characters.
  • the format of the log entry generated by device B is: the identifier of the device accessing device B, the event occurrence time and the event content, and a log generated by device B contains 50 characters.
  • A neural network can be composed of neural units (also called neurons).
  • A neural unit can refer to an operation unit that takes variables x_s and an intercept b as input, and the output of the operation unit can be: h_{W,b}(x) = f(sum_s(W_s * x_s) + b), where W_s is the weight of x_s and b is the bias of the neural unit.
  • f is an activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function and other functions, which are not limited here.
  • a neural network is a network formed by connecting many of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
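  • As a minimal, hypothetical illustration of the neural unit formula above (the sigmoid activation and the numeric values are assumptions, not taken from the patent):

```python
import math

def neural_unit(xs, ws, b):
    """One neural unit: f(sum of W_s * x_s + b), here with a sigmoid f."""
    z = sum(w * x for w, x in zip(ws, xs)) + b  # weighted sum plus bias
    return 1.0 / (1.0 + math.exp(-z))           # sigmoid activation function

# Two inputs x_s with their weights W_s and a bias b
print(neural_unit(xs=[0.5, -1.2], ws=[0.8, 0.3], b=0.1))
```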
  • Pre-training refers to a process in which an initial model is trained using a large data set so that it learns to identify common features in that data set. The model obtained by pre-training (hereinafter referred to as the pre-trained model) has strong generalization ability, can provide high-quality model parameters for subsequent model training on a specific data set, and can adapt to a variety of specific data sets.
  • Fine-tuning refers to a process of further training a pre-trained model using a specific data set to obtain a trained model applied to the specific data set.
  • The amount of data in the specific data set used in the fine-tuning stage is smaller than the amount of data in the large data set used in the pre-training stage; for example, the ratio of the two amounts may be 1:100, 1:500 or 1:1000, which is not specifically limited here. Likewise, the number of times the pre-trained model is trained on the specific data set in the fine-tuning stage is smaller than the number of times the initial model is trained on the large data set in the pre-training stage; for example, if the initial model is trained 100 times on the large data set in the pre-training stage, the pre-trained model may be trained only 20 times on the specific data set in the fine-tuning stage.
  • The MLM method is a pre-training method for a model. A model trained by this method can learn the semantic information of an input sequence, and that semantic information is stored in the vector output by the model for the word "CLS" in the input sequence.
  • The masked sequence-to-sequence (MASS) method, mentioned below, is another such pre-training method.
  • For example, the input sequence is <CLS><raise><head><gaze><bright><moon><lower><head><think><home><town> (the line of verse "raising my head, I gaze at the bright moon; lowering my head, I think of my hometown"), which includes 11 words. Suppose the 15% of words randomly selected by MLM from the input sequence are the third word "head" and the seventh word "lower"; MLM then replaces the randomly selected third word "head" with the special symbol MASK and replaces the randomly selected seventh word "lower" with the random word "person".
  • the advantage of masking the sequences input to the model for training is that it can improve the fault tolerance and inference accuracy of the pre-trained model.
  • For example, if the first input sequence is <CLS><raise><head><gaze><bright><moon><lower><head><think><home><town>, i.e. a sequence without mask processing, and the model is made to learn this first sequence, then because no word in the first sequence is masked, every word is known. The model therefore only needs to learn the words themselves, without learning the context of the words, to learn that the semantic information of the first sequence is "looking up at the bright moon, lowering my head and thinking of my hometown". The resulting pre-trained model thus usually does not have the ability to infer the semantic information of a sequence from the context of the words in the sequence. When such a model infers the second sequence <CLS><raise><gaze><bright><moon><head><think><town>, which lacks several words of the first sequence, it is less likely to infer that the semantic information of the second sequence is "looking up at the bright moon and thinking of my hometown" and more likely to infer merely "homesick" or "looking at the bright moon and homesick"; the fault tolerance and accuracy of the model are therefore lower.
  • In contrast, if the first input sequence is <CLS><raise><MASK><gaze><MASK><moon><lower><head><think><home><MASK>, i.e. the sequence after mask processing, and the model is made to learn this first sequence, then because some words in the first sequence (namely <head>, <bright> and <town>) are masked, part of the first sequence is unknown while another part is known. The model therefore not only needs to learn the known words in the first sequence but also to infer the masked words from their context, and in doing so it can learn that the semantic information of the first sequence is "looking up at the bright moon, lowering my head and thinking of my hometown". The resulting pre-trained model thus usually has the ability to infer the semantic information of a sequence from the context of the words in the sequence. When this pre-trained model infers the semantic information of the second sequence <CLS><raise><gaze><bright><moon><head><think><town>, which, compared with the first sequence, is missing words such as <lower> and <home>, it is more likely to infer that the semantic information of the second sequence is "looking up at the bright moon and lowering my head to think of my hometown", and the possibility that it merely infers "homesick" or "looking at the bright moon and homesick" is smaller; the smaller this possibility, the higher the fault tolerance and accuracy of the model.
  • Log anomaly detection refers to examining the event information included in a log entry to determine whether that event information indicates an abnormality of the device that generated the log entry; if it does, the device is determined to be abnormal.
  • The initial log anomaly detection model refers to a model (also called an algorithm) that has not been trained using log training samples; correspondingly, the trained log anomaly detection model is the model obtained after such training.
  • Anomaly detection is one of the supporting technologies to ensure system security.
  • The system's hard disk and processor, and the network devices that provide network services for the system, generate various log files to record the system's operation status and events. Logs contain rich information, and the huge amount of information contained in large volumes of log data provides a way to detect system anomalies, making log anomaly detection a research hotspot in the field of anomaly detection.
  • Using log training samples to train a log anomaly detection model and then using the trained model to perform log anomaly detection is currently a relatively popular log anomaly detection method.
  • However, the log anomaly detection model trained by the existing training method has low generalization ability, so when a user wants to perform anomaly detection on logs generated by a similar specific object, the already trained model cannot be used. The user can only re-acquire the historical logs generated by the similar specific object as training samples, retrain the initial log anomaly detection model, and thereby obtain a model with a good detection effect on the logs of the similar specific object. This process of obtaining a new model usually consumes a lot of manpower and time, and is inefficient.
  • For example, suppose the user has already trained a log anomaly detection model A that has a good detection effect on logs generated by a hard disk. Because model A was trained for the hard disk, it detects hard-disk logs well but usually detects memory-generated logs poorly. If the user takes the historical logs generated by the memory as training samples and continues to train model A, then, owing to the low generalization ability of model A, a model with a good detection effect on memory-generated logs usually still cannot be obtained. The user can only re-acquire historical logs generated by the memory as training samples and use them to train the initial log anomaly detection model from scratch in order to obtain a trained model with a good detection effect on memory-generated logs.
  • Similarly, a trained log anomaly detection model B (hereinafter referred to as model B) may have a good detection effect only on the logs generated by the type-B hard disk produced by manufacturer B (hereinafter referred to as the B hard disk), and a separate trained log anomaly detection model C must be obtained for the logs of yet another object.
  • the present application provides a log anomaly detection model training method.
  • the model training method provided by the present application includes two stages: pre-training and fine-tuning.
  • the first log sample set from the target object (the target object includes multiple target sub-objects) can be used to pre-train the initial log anomaly detection model to obtain a model with high-quality model parameters and strong generalization ability.
  • the second log sample set from the target sub-object (the target sub-object belongs to the target object) is used to fine-tune the pre-trained log anomaly detection model to obtain a trained log anomaly detection model for the target sub-object.
  • the model training method provided by the present application can solve the problem of low generalization ability of the log anomaly detection model obtained by training in the prior art, and achieve the purpose of improving the efficiency of model training.
  • The target object includes but is not limited to at least one of the following sub-objects: hard disk, memory, flash memory, network device and processor; the target sub-object is any type of sub-object in the target object. It is easy to understand that the above-mentioned target objects and target sub-objects are merely examples, which are not specifically limited in this application.
  • the target sub-object may be hard disks of different models.
  • the target sub-object may be hard disk or memory.
  • In practice, the target object may include as many sub-objects as possible, and the data volume of the first log sample set obtained from the target object may also be as large as possible; the parameters of the pre-trained log anomaly detection model obtained in this way are of better quality, and the generalization ability of the model is stronger.
  • the method provided by the present application will be described in detail below with reference to the schematic flowchart shown in FIG. 4 . As shown in FIG. 4 , the method includes the following steps:
  • S401: The computing device acquires a first log sample set including m first log samples, where the first log sample set is obtained by processing log data of a target object.
  • As described above, log entries include the event occurrence time, event content, event type, event level, etc., among which the event content can reflect the event that occurred.
  • Therefore, the computing device may obtain m first event contents from the large number of log entries included in the log data of a large number of target objects as the m first log samples, deleting the parts of those log entries other than the event content, such as the event occurrence time, event type and event level.
  • The log data of a large number of target objects may be obtained by the computing device through web crawlers on the Internet, or collected manually from the target objects, which is not limited herein.
  • For example, the log data of the target object includes log entry A: "2021/06/03Thu 18:18:33 PD_Vendor Done Check done,0xd ms.Flag 8 ALL", where "2021/06/03Thu 18:18:33" is the event occurrence time, "PD_Vendor Done Check done,0xd ms.Flag 8" is the event content, and "ALL" is the event level; the first log sample obtained by the computing device from log entry A is then "PD_Vendor Done Check done,0xd ms.Flag 8".
  • Obtaining the m first event contents from the log data of a large number of target objects as the m first log samples, and deleting everything other than the event content (the event occurrence time, event type, event level, etc.), shields the differences between the formats of the m first log samples, so that the computing device can obtain as many first log samples as possible and thereby increase the number of first log samples in the first log sample set used to train the log anomaly detection model.
  • In other implementations, the computing device may also directly select m log entries with the same format from the log data of a large number of target objects as the m first log samples, or obtain m preset contents from that log data as the m first log samples, where the preset content includes, in addition to the event content of a log entry, other content of the log entry such as the event level and/or the event type.
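  • As an illustration of S401, the following sketch extracts the event content from the example log entry A above and discards the remaining fields; the field layout and the regular expression are assumptions, since log entry formats differ between objects:

```python
import re

# Log entry A from the example: occurrence time, event content, event level.
entry = "2021/06/03Thu 18:18:33 PD_Vendor Done Check done,0xd ms.Flag 8 ALL"

# Assumed layout: "<date><weekday> <time> <event content> <event level>".
m = re.match(r"^(\S+ \S+)\s+(.*)\s+(\S+)$", entry)
if m:
    occurred_at, event_content, event_level = m.groups()
    # Only the event content is kept as the first log sample.
    print(event_content)  # -> "PD_Vendor Done Check done,0xd ms.Flag 8"
```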
  • S402: The computing device performs word segmentation on the m first log samples respectively to obtain m word sequences corresponding to the m first log samples.
  • the process for the computing device to obtain the word sequence corresponding to the i-th first log sample includes:
  • S4021: The computing device performs word segmentation on the log sample, and the obtained first word sequence is: <PD_VendorDone><Check><done><0xd><ms><Flag><8>.
  • S4022: If the first word sequence includes a mixed word composed of numbers and letters, the computing device replaces the mixed word with the word "number" to obtain the second word sequence; if not, no replacement operation is required, and the first word sequence is directly determined as the second word sequence. Here, the second word sequence obtained after replacing the mixed word 0xd in the first word sequence <PD_VendorDone><Check><done><0xd><ms><Flag><8> is: <PD_VendorDone><Check><done><number><ms><Flag><8>.
  • The above-mentioned replacement of the mixed word with "number" is only an example; in a specific implementation, the mixed word may also be replaced with other words such as "num" or "sep", which is not specifically limited here.
  • S4023: The computing device adds a CLS mark at the beginning of the second word sequence to mark the beginning of the third word sequence. For example, the computing device adds a CLS mark to the beginning of the second word sequence <PD_VendorDone><Check><done><number><ms><Flag><8>, and the obtained third word sequence is: <CLS><PD_VendorDone><Check><done><number><ms><Flag><8>.
  • S4024: If the number of words included in the third word sequence is less than a preset threshold, the computing device pads the end of the sequence. For example, the third word sequence <CLS><PD_VendorDone><Check><done><number><ms><Flag><8> includes 8 words, which is less than the preset threshold of 10, so the computing device adds two pad marks at the end of the third word sequence, and the obtained fourth word sequence is: <CLS><PD_VendorDone><Check><done><number><ms><Flag><8><pad><pad>.
  • The preset dictionary includes a large number of words and the correspondences between those words and their tokens (identifications, ID for short); for example, it includes the word "Check", the token ID "6", and the correspondence between the word "Check" and the token ID "6".
  • Assume that in the preset dictionary the token ID corresponding to CLS is 1, the token ID corresponding to pad is 0, the token ID corresponding to PD_VendorDone is 5, the token ID corresponding to Check is 6, the token ID corresponding to done is 7, the token ID corresponding to number is 4, the token ID corresponding to ms is 8, the token ID corresponding to Flag is 9, and the token ID corresponding to 8 is 10.
  • S4025: The computing device uses the preset dictionary to convert the fourth word sequence <CLS><PD_VendorDone><Check><done><number><ms><Flag><8><pad><pad>, and the fifth word sequence obtained is: <1><5><6><7><4><8><9><10><0><0>.
  • If a word in the fourth word sequence does not exist in the preset dictionary, the word and a token ID newly assigned to it can be added to the preset dictionary; for example, the word "identification" and a token ID corresponding to it, such as "100001" or "100008", can be added to the preset dictionary.
  • It should be noted that S4023 can be executed before S4022, or S4024 can be executed before S4022, which is not specifically limited here.
  • When the computing device has performed steps S4021 to S4025 for each of the m first log samples, it obtains m word sequences, and the m word sequences each include the same number of words.
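  • The following sketch strings steps S4021 to S4025 together on the example above; the segmentation regex, the preset threshold of 10 words, and the exact dictionary contents beyond the token IDs listed above are assumptions for illustration:

```python
import re

CLS, PAD = "CLS", "pad"
MAX_LEN = 10  # preset threshold on the number of words per sequence

# Toy preset dictionary; the IDs follow the example above.
dictionary = {"pad": 0, "CLS": 1, "number": 4, "PD_VendorDone": 5,
              "Check": 6, "done": 7, "ms": 8, "Flag": 9, "8": 10}

def token_id(word):
    if word not in dictionary:                           # unseen words are added
        dictionary[word] = max(dictionary.values()) + 1  # to the preset dictionary
    return dictionary[word]

def to_word_sequence(sample):
    words = re.split(r"[\s,.]+", sample.strip())   # S4021: word segmentation
    words = ["number" if re.search(r"\d", w) and re.search(r"[A-Za-z]", w) else w
             for w in words]                       # S4022: replace mixed words
    words = [CLS] + words                          # S4023: CLS marks the beginning
    words += [PAD] * (MAX_LEN - len(words))        # S4024: pad to the preset length
    return [token_id(w) for w in words]            # S4025: convert words to token IDs

print(to_word_sequence("PD_VendorDone Check done 0xd ms Flag 8"))
# -> [1, 5, 6, 7, 4, 8, 9, 10, 0, 0]
```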
  • S403: The computing device pre-trains the initial log anomaly detection model through the m word sequences to obtain a pre-trained log anomaly detection model.
  • The method used by the computing device to pre-train the initial log anomaly detection model through the m word sequences may be the MLM method or the MASS (masked sequence to sequence) method.
  • the computing device pre-trains the initial log anomaly detection model, and the process of obtaining the pre-trained log anomaly detection model may specifically include the following steps:
  • the computing device respectively performs mask processing on words in a preset proportion in the m word sequences, and obtains m word sequences after mask processing.
  • the preset ratio may be 10%, 15%, 20%, and the like.
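  • A sketch of the mask processing (the 15% proportion follows the example above; the split between MASK replacement and random-word replacement shown in FIG. 2, here 90/10, and the reserved token IDs are assumptions):

```python
import random

MASK_ID = 2        # assumed token ID reserved for the special symbol MASK
MASK_RATIO = 0.15  # preset proportion of words to mask

def mask_sequence(token_ids, vocab_size, special_ids=(0, 1, 2)):
    """Randomly mask a preset proportion of the words in one word sequence."""
    ids = list(token_ids)
    candidates = [i for i, t in enumerate(ids) if t not in special_ids]
    k = max(1, round(len(candidates) * MASK_RATIO))
    for i in random.sample(candidates, k):
        if random.random() < 0.9:
            ids[i] = MASK_ID                          # replace with MASK
        else:
            ids[i] = random.randrange(3, vocab_size)  # or with a random word
    return ids

print(mask_sequence([1, 5, 6, 7, 4, 8, 9, 10, 0, 0], vocab_size=11))
```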
  • the computing device pre-trains the initial log anomaly detection model through the masked m word sequences, to obtain a pre-trained log anomaly detection model.
  • the computing device pre-trains the initial log anomaly detection model through the masked m word sequences, and the specific process of obtaining the pre-trained log anomaly detection model may include the following steps:
  • the computing device separately obtains a word embedding vector and a position embedding vector corresponding to each word in the m word sequences after mask processing.
  • the word embedding vector corresponding to each word above is a multi-dimensional vector used to represent each word.
  • Word embedding is a general term for a set of language modeling and feature learning technologies in the field of natural language processing, which convert words into multi-dimensional vectors.
  • The word embedding vector corresponding to each word can be obtained by one-hot encoding, or by a word-to-vector (Word2Vec) model or a GloVe model.
  • the dimension of the word embedding vector may be 256 dimensions or 512 dimensions, and may also be more or less dimensions, which are not specifically limited here.
  • Assuming that the method of obtaining the word embedding vector corresponding to each word in the masked i-th word sequence is the Word2Vec model and the dimension of the word embedding vector is 5, the word embedding vector obtained for the word "1" in the masked i-th word sequence can be the 5-dimensional vector V_{i,1} = (-0.065, -0.035, 0.019, -0.026, 0.085), that for the word "MASK" can be the 5-dimensional vector V_{i,2} = (0.000, 0.000, 0.000, 0.000, 0.000), ..., and that for the word "0" at the end of the sequence can be the 5-dimensional vector V_{i,10} = (-0.027, -0.013, 0.006, 0.023, 0.014), as shown in FIG. 7. The values in the word embedding vectors exemplified above include three decimal places only as an example; in a specific implementation they may include fewer or more decimal places, which is not specifically limited herein.
  • the position embedding vector corresponding to each word above is used to represent the position of each word in the word sequence, and its dimension is the same as that of the word embedding vector.
  • The position embedding vector of a word can be obtained by the following formulas: PE(pos, 2j) = sin(pos / 10000^(2j/d_model)) and PE(pos, 2j+1) = cos(pos / 10000^(2j/d_model)), where PE() represents the position embedding vector, pos represents the position of the word in the word sequence and its value range is [0, the number of words included in the word sequence), d_model represents the dimension of the position embedding vector, 2j represents the even-numbered dimension indices of the position embedding vector, and 2j+1 represents the odd-numbered dimension indices of the position embedding vector.
  • Taking the dimension d_model of the position embedding vector as 5 as an example, j takes the values 0, 1 and 2 respectively. When j is 0, the calculated PE(pos, 2j) (i.e. PE(pos, 0)) is the value of the zeroth dimension of the position embedding vector, and PE(pos, 2j+1) (i.e. PE(pos, 1)) is the value of the first dimension. When j is 1, the calculated PE(pos, 2j) (i.e. PE(pos, 2)) is the value of the second dimension, and PE(pos, 2j+1) (i.e. PE(pos, 3)) is the value of the third dimension. When j is 2, the calculated PE(pos, 2j) (i.e. PE(pos, 4)) is the value of the fourth dimension. In this way, the dimension of the obtained position embedding vector is 5.
  • For example, the position embedding vector V_{i,1}' corresponding to the word "1" in the masked i-th word sequence can be obtained by the above formulas. The values in the position embedding vector exemplified above include three decimal places only as an example; in a specific implementation they may include fewer or more decimal places, which is not specifically limited herein.
  • the computing device obtains m first row vectors corresponding to the masked m word sequences according to the word embedding vector and the position embedding vector corresponding to each word in the masked m word sequences, respectively.
  • Specifically, the computing device can obtain the word vector corresponding to each word in each masked word sequence by superimposing the word embedding vector and the position embedding vector corresponding to that word, and thereby obtain the first row vector corresponding to each masked word sequence. Other ways of obtaining the first row vector from the word embedding vectors and the position embedding vectors are also within the scope of protection of this application and are not specifically limited here.
  • For example, the word vector V_{i,10}'' corresponding to the word "0" at the end of the sequence is the superposition of its word embedding vector and its position embedding vector, and the combination of the word vectors corresponding to all words in the masked i-th word sequence is the first row vector V_i' corresponding to the masked i-th word sequence.
  • The process of obtaining the first row vector corresponding to each of the m masked word sequences is similar to the process of obtaining the first row vector V_i' corresponding to the masked i-th word sequence; for details, please refer to the above related description, which will not be repeated here.
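  • The following sketch computes the sinusoidal position embeddings with the formulas above (d_model = 5, as in the worked example) and superimposes them on toy word embeddings to form the first row vector; the random word embedding table is a stand-in for a Word2Vec model:

```python
import math
import random

D_MODEL = 5  # dimension of the embeddings, as in the worked example above

def position_embedding(pos):
    """PE(pos, 2j) = sin(pos / 10000^(2j/d_model)), PE(pos, 2j+1) = cos(...)."""
    return [math.sin(pos / 10000 ** (2 * (d // 2) / D_MODEL)) if d % 2 == 0
            else math.cos(pos / 10000 ** (2 * (d // 2) / D_MODEL))
            for d in range(D_MODEL)]

random.seed(0)  # toy word embedding table, one 5-dimensional vector per token ID
word_embeddings = {t: [random.uniform(-0.1, 0.1) for _ in range(D_MODEL)]
                   for t in range(11)}

def first_row_vector(token_ids):
    """Word vector of each word = word embedding + position embedding;
    the word vectors of all words together form the first row vector."""
    return [[w + p for w, p in zip(word_embeddings[t], position_embedding(pos))]
            for pos, t in enumerate(token_ids)]

v = first_row_vector([1, 5, 6, 7, 4, 8, 9, 10, 0, 0])
print(len(v), len(v[0]))  # -> 10 word vectors, each 5-dimensional
```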
  • The computing device uses the m first row vectors to train the initial log anomaly detection model to obtain the pre-trained log anomaly detection model; as shown in FIG. 9, the specific process may include the following steps:
  • A1: Input the m first row vectors into the initial log anomaly detection model for training, and obtain m second row vectors corresponding to the m masked word sequences. The second row vector corresponding to a masked word sequence is a vector that includes the semantic information of that masked word sequence, and the second row vector corresponding to each word sequence is the vector output by the model for the CLS mark of that word sequence. Taking the first row vector V_i' as an example, if the first row vector V_i' is input into the initial log anomaly detection model for training, the second row vector V_i can be obtained; as shown in FIG. 6, the second row vector V_i includes the semantic information of the masked i-th word sequence, namely "The supplier information has been checked".
  • The loss from the ith second row vector V_i to the initial cluster center c can be obtained through the following loss function (the squared Euclidean distance): loss(c, V_i) = ||V_i - c||^2.
  • When the loss(c, V_i) from the ith second row vector to the initial cluster center c is less than the first classification threshold, the ith second row vector can be assigned to the normal log class; otherwise, it is determined that the ith second row vector cannot be assigned to the normal log class.
  • the first classification threshold may be set by the user according to the actual situation.
  • The termination condition may be a maximum number of iterations, a minimum squared error, the rate of change of the cluster center point, etc., which is not specifically limited here. When training terminates, the centroid of the normal log class no longer changes, and this final centroid C is the target cluster center described below.
  • In other words, the m first row vectors are input into the initial log anomaly detection model for training, and the m second row vectors obtained are finally divided into second row vectors belonging to the normal log class (the vectors inside the circle shown in FIG. 10) and second row vectors not belonging to the normal log class (the vectors outside the circle shown in FIG. 10); the centroid of the normal log class is C.
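  • A simplified sketch of the clustering described above, assuming the loss is the squared Euclidean distance; the centroid update below stands in for the model parameter updates of the real pre-training, and the threshold and data are toy values:

```python
import numpy as np

def loss_to_center(c, V):
    """loss(c, V_i): squared Euclidean distance of each second row vector to c."""
    return ((V - c) ** 2).sum(axis=1)

def cluster_normal_logs(V, first_threshold, max_iters=100, tol=1e-6):
    """Assign second row vectors to the normal log class and move the centroid
    until it no longer changes; the final centroid is the target cluster center C."""
    c = V.mean(axis=0)                                    # initial cluster center
    for _ in range(max_iters):                            # termination condition
        normal = V[loss_to_center(c, V) < first_threshold]
        new_c = normal.mean(axis=0) if len(normal) else c
        if np.linalg.norm(new_c - c) < tol:               # centroid change rate
            break
        c = new_c
    return c                                              # target cluster center C

rng = np.random.default_rng(0)
V = rng.normal(size=(100, 5))             # stand-in for the m second row vectors
print(cluster_normal_logs(V, first_threshold=6.0))
```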
  • Although the pre-trained log anomaly detection model already has high-quality model parameters, it was trained using the first log sample set from the target object. If this model were used directly to perform anomaly detection on the logs generated by the target sub-object, its parameters would not be accurate enough for that sub-object and the accuracy of the detection results would be low. Therefore, a training sample set from the target sub-object (that is, the second log sample set of n second log samples described below) can be used to fine-tune the pre-trained log anomaly detection model to obtain more accurate model parameters; with more accurate parameters, the detection results obtained when performing anomaly detection on the logs generated by the target sub-object will also be more accurate.
  • Moreover, since the pre-trained log anomaly detection model already has high-quality model parameters, when the second log sample set from the target sub-object is used to fine-tune it, the second log sample set only needs to include a small number of second log samples to obtain more accurate model parameters. The fine-tuning process therefore requires only a small amount of labor and time, and the model training efficiency is high.
  • S404: The computing device acquires a second log sample set including n second log samples, where the second log sample set is obtained by processing log data of the target sub-object.
  • the log data of a large number of target sub-objects may be historical logs generated by the target sub-objects.
  • n is a natural number greater than 1, and n is usually less than m.
  • S405: The computing device performs word segmentation on the n second log samples respectively to obtain n word sequences corresponding to the n second log samples.
  • S406: The computing device fine-tunes the pre-trained log anomaly detection model through the n word sequences to obtain a trained log anomaly detection model.
  • The model parameters of the trained log anomaly detection model obtained by fine-tuning are more accurate than those of the pre-trained log anomaly detection model, so the detection results obtained when the trained model is subsequently used to perform anomaly detection on the logs generated by the target sub-object will also be more accurate.
  • It should be noted that the process by which the computing device acquires the second log sample set including n second log samples is similar to the process by which it acquires the first log sample set including m first log samples in S401, and the process by which the computing device performs word segmentation on the n second log samples to obtain the n word sequences corresponding to the n second log samples is similar to the process by which it obtains the m word sequences corresponding to the m first log samples in S402.
  • Likewise, the process by which the computing device fine-tunes the pre-trained log anomaly detection model through the n word sequences to obtain the trained log anomaly detection model is similar to the process in S403 by which the computing device pre-trains the initial log anomaly detection model through the m word sequences to obtain the pre-trained log anomaly detection model; for details, please refer to the related description in S403.
  • In addition, when the computing device obtains the target cluster center C, it can also calculate, according to the target cluster center C and the m second row vectors, the second classification threshold used by the trained log anomaly detection model to perform anomaly detection on logs to be detected.
  • the process of calculating the second classification threshold by the computing device may include:
  • First, the computing device obtains the losses from the m second row vectors to the target cluster center C. This process is similar to the above-described process of obtaining the losses from the m second row vectors to the initial cluster center c; for details, please refer to the above related description.
  • the computing device obtains the percentile corresponding to the loss of the m second row vectors to the target cluster center C.
  • Percentile is a statistical term: if a set of data is sorted from small to large and the corresponding cumulative percentages are calculated, the value of the data corresponding to a given cumulative percentage is called the percentile of that percentage. For example, the value located at the 80% position is called the 80th percentile.
  • The computing device obtains the percentiles corresponding to the losses from the m second row vectors to the target cluster center C; that is, the computing device sorts those losses from small to large and calculates the corresponding cumulative percentiles.
  • the computing device determines the target percentile according to the percentile corresponding to the loss of the m second row vectors to the target cluster center C.
  • the computing device determines the second classification threshold according to the target percentile.
  • For example, the second classification threshold T can be determined by the following formula: T = α · P, where P represents the value at the target percentile and the coefficient α is used to expand the distance around the target cluster center C. The values of P and α can be chosen based on the numbers of normal samples and abnormal samples among the m first log samples. For example, when the number of normal samples is much larger than the number of abnormal samples (for example, the ratio of normal to abnormal samples is 10000:1 or 5000:1), the value of the target percentile can be as large as possible, such as 90% or 95%, and the value of α can be 1.8, 2.0, 2.5, etc.; when the number of normal samples is close to the number of abnormal samples, the value of the target percentile can be close to 50%, such as 45% or 51%, and the value of α can be 1.2, 1.5, etc.
  • The second classification threshold T is used in the process of performing anomaly detection on the log to be detected with the trained log anomaly detection model; please refer to the related description of FIG. 11.
  • It can be seen that the present application calculates the second classification threshold T according to the target cluster center C and the m second row vectors, and chooses the value of the target percentile and the value of α based on the numbers of normal samples and abnormal samples among the m first log samples, instead of manually setting the classification threshold based on experience as in the prior art. A manually set classification threshold that is too large or too small has a great impact on the accuracy of the trained log anomaly detection model: when the manually set threshold is too large, the trained model has a high probability of misclassifying abnormal logs as normal logs, and when it is too small, the trained model has a high probability of misclassifying normal logs as abnormal logs. Therefore, determining the second classification threshold T by the method provided in this application can improve the accuracy of the anomaly detection performed by the trained log anomaly detection model.
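  • A sketch of the threshold calculation under the reconstructed form T = α · P (the loss values below are toy data):

```python
import numpy as np

def second_classification_threshold(losses, target_percentile=95.0, alpha=2.0):
    """T = alpha * P, where P is the loss value at the target percentile of the
    losses from the m second row vectors to the target cluster center C."""
    P = np.percentile(losses, target_percentile)  # value at the target percentile
    return alpha * P                              # alpha expands the radius around C

losses = np.array([0.5, 0.8, 1.1, 1.3, 2.0, 2.4, 3.1, 4.0, 4.2, 9.5])
print(second_classification_threshold(losses))
```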
  • After the trained log anomaly detection model is obtained, it can be deployed to the target sub-object, and the trained log anomaly detection model can then be used to perform anomaly detection on the logs to be detected that the target sub-object generates.
  • FIG. 11 is a schematic flowchart of an exemplary process of using a trained log anomaly detection model to perform anomaly detection on a log to be detected of a target sub-object. As shown in FIG. 11 , the detection process includes:
  • The log entry to be detected is the log to be detected described above.
  • S115: Input the first row vector corresponding to the word sequence to be detected into the trained log anomaly detection model to perform anomaly detection, and obtain a detection result.
  • the first row vector corresponding to the word sequence to be detected is input into the trained log anomaly detection model for anomaly detection, and the specific process of obtaining the detection result may include the following steps:
  • The cluster center C' represents the centroid of the normal log class that no longer changes when, in the fine-tuning stage, the n second row vectors corresponding to the n second log samples are clustered to obtain the trained log anomaly detection model.
  • The loss (C', X) from the second row vector corresponding to the word sequence to be detected to the cluster center C' can be obtained by the following formula (the squared Euclidean distance): loss(C', X) = ||X - C'||^2, where X represents the second row vector corresponding to the word sequence to be detected.
  • the device refers to a device that generates the log entry to be detected.
  • The obtained anomaly detection result includes the loss loss(C′, X) of the second row vector X corresponding to the word sequence to be detected relative to the cluster center C′. Assuming that loss(C′, X) is 5 and the second classification threshold T is 8, since loss(C′, X) is less than the second classification threshold T, the trained log anomaly detection model attributes the second row vector X to the normal log class and outputs the detection result that there is no device anomaly information in the log entry to be detected.
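  • The comparison just described can be sketched as follows; the squared Euclidean distance is an assumed concrete form of loss(C′, X), and the output strings are placeholders for the detection result:

```python
import numpy as np

def detect(x, c_prime, threshold):
    """Sketch of the detection decision: compare loss(C', X) with the
    second classification threshold T (squared Euclidean distance assumed)."""
    loss = float(np.sum((x - c_prime) ** 2))
    if loss < threshold:
        return "no device anomaly information in the log entry"  # normal log class
    return "device anomaly information in the log entry"         # abnormal log class

# Example matching the numbers above: loss(C', X) = 5 < T = 8 -> normal
c_prime = np.zeros(4)
x = np.array([1.0, 2.0, 0.0, 0.0])  # squared distance to c_prime is 1 + 4 = 5
print(detect(x, c_prime, threshold=8.0))
```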
  • The trained log anomaly detection model outputting the detection result that there is no device anomaly information in the log entry to be detected is only an example; the output detection result can also be "The device is normal", etc., which is not specifically limited here.
  • The definitions of the word sequence to be detected, the first row vector corresponding to the word sequence to be detected, etc. are the same as the definitions of the word sequence, the first row vector, etc. in the embodiment of FIG. 4; refer to the relevant content of the embodiment illustrated in FIG. 4, which is not described again here.
  • The process of performing word segmentation on the content of the event to be detected to obtain the word sequence to be detected corresponding to that content is similar to the process in S402 in which the computing device performs word segmentation on the m first log samples to obtain the m word sequences corresponding to the m first log samples; for details, refer to the relevant description of S402. The process of obtaining the first row vector corresponding to the word sequence to be detected is similar to the process in S403 in which the computing device obtains the m first row vectors corresponding to the m masked word sequences; for details, refer to the relevant description of S403.
  • The difference is that the word embedding vector and position embedding vector corresponding to each word in the word sequence to be detected can be obtained directly, without mask processing, so as to obtain the first row vector corresponding to the word sequence to be detected.
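  • A minimal sketch of this step is given below. The element-wise addition of word and position embeddings and the concatenation of the per-word vectors into a single first row vector are assumptions in the style of BERT-like models, and the embedding tables are hypothetical:

```python
import numpy as np

def first_row_vector(word_ids, word_emb, pos_emb):
    """Sketch: build the first row vector of a word sequence to be detected.
    No mask processing is applied at detection time; the word embedding and
    position embedding of each word are looked up directly."""
    positions = np.arange(len(word_ids))
    per_word = word_emb[word_ids] + pos_emb[positions]  # (seq_len, d)
    return per_word.reshape(-1)  # concatenation is one assumed assembly rule

# Hypothetical lookup tables for illustration only
rng = np.random.default_rng(1)
word_emb = rng.normal(size=(30000, 128))  # (vocab_size, d)
pos_emb = rng.normal(size=(512, 128))     # (max_sequence_length, d)
vec = first_row_vector([101, 2057, 318, 44], word_emb, pos_emb)
```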
  • Although the above describes the log anomaly detection model training method provided by this application with a single computing device as the execution subject, in specific implementations the execution subject can also be a computing device cluster including at least two computing devices, and the at least two computing devices in the cluster can cooperate to implement the log anomaly detection model training method provided by this application.
  • For example, step S401 is performed by computing device A, and steps S402 to S406 are performed by computing device B; alternatively, steps S401 to S403 are performed by computing device A, and steps S404 to S406 are jointly performed by computing device A and computing device B.
  • A log anomaly detection model training method provided by the present application is described in detail above. Based on the same inventive concept, a log anomaly detection model training apparatus provided by the present application is described below.
  • FIG. 13 is a schematic structural diagram of a log anomaly detection model training apparatus 100 provided by the present application.
  • The apparatus 100 includes an obtaining module 110 and a training module 120, wherein:
  • the obtaining module 110 is configured to obtain a first log sample set, wherein the first log sample set is obtained by processing log data of a target object;
  • the training module 120 is configured to pre-train an initial log anomaly detection model through the first log sample set to obtain a pre-trained log anomaly detection model;
  • the obtaining module 110 is further configured to obtain a second log sample set, wherein the second log sample set is obtained by processing log data of a target sub-object, and the target sub-object belongs to the target object;
  • the training module 120 is further configured to fine-tune the pre-trained log anomaly detection model by using the second log sample set to obtain a trained log anomaly detection model.
  • In a possible implementation manner, the target object includes at least one of the following sub-objects: hard disk, memory, flash memory, network device and processor, and the target sub-object is a sub-object of any one type in the target object.
  • In a possible implementation manner, the first log sample set includes m log samples, where m is a natural number greater than 1, and the training module 120 is specifically configured to: perform word segmentation on the m log samples respectively to obtain m word sequences corresponding to the m log samples; and pre-train the initial log anomaly detection model through the m word sequences to obtain the pre-trained log anomaly detection model.
  • In a possible implementation manner, the training module 120 is specifically configured to: perform mask processing on a preset proportion of words in each of the m word sequences to obtain m masked word sequences; and pre-train the initial log anomaly detection model through the m masked word sequences to obtain the pre-trained log anomaly detection model.
  • In a possible implementation manner, the training module 120 is specifically configured to: obtain the word embedding vector and position embedding vector corresponding to each word in the m masked word sequences, wherein the word embedding vector corresponding to each word is a multi-dimensional vector used to represent that word, and the position embedding vector corresponding to each word represents the position of that word in the word sequence to which it belongs; obtain, according to the word embedding vector and position embedding vector corresponding to each word in the m masked word sequences, the m first row vectors corresponding to the m masked word sequences; and pre-train the initial log anomaly detection model by using the m first row vectors to obtain the pre-trained log anomaly detection model.
  • In a possible implementation manner, the training module 120 is specifically configured to: respectively input the m first row vectors into the initial log anomaly detection model for training to obtain m second row vectors, wherein the m second row vectors are in one-to-one correspondence with the m masked word sequences, and each of the m second row vectors includes the semantic information of its corresponding masked word sequence; obtain the losses from the m second row vectors to an initial cluster center; and train the initial log anomaly detection model according to the losses from the m second row vectors to the initial cluster center, to obtain the pre-trained log anomaly detection model and a target cluster center.
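  • One way to realize the training just described is a center-based objective in the style of Deep SVDD, sketched below under explicit assumptions: the encoder is an arbitrary placeholder network, the loss to the cluster center is taken to be the squared Euclidean distance, and the center is re-estimated from the encoder outputs each epoch. None of these choices is asserted to be the exact mechanism of this application:

```python
import torch
from torch import nn

def pretrain_with_center(encoder, first_row_vectors, epochs=10, lr=1e-3):
    """Sketch: train the model on the m first row vectors by minimizing the
    mean loss(c, V_i) of the m second row vectors to the cluster center,
    returning the pre-trained model and the target cluster center."""
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    with torch.no_grad():
        c = encoder(first_row_vectors).mean(dim=0)      # initial cluster center
    for _ in range(epochs):
        opt.zero_grad()
        v = encoder(first_row_vectors)                  # m second row vectors
        loss = ((v - c) ** 2).sum(dim=1).mean()         # mean loss(c, V_i)
        loss.backward()
        opt.step()
        with torch.no_grad():
            c = encoder(first_row_vectors).mean(dim=0)  # re-estimated center
    return encoder, c                                   # model + target cluster center

# Hypothetical toy encoder and data for illustration
enc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
x = torch.randn(256, 128)                               # m = 256 first row vectors
model, target_center = pretrain_with_center(enc, x)
```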
  • In a possible implementation manner, the training module 120 is further configured to: obtain the percentile corresponding to the losses from the m second row vectors to the target cluster center; and determine a classification threshold according to that percentile, wherein the classification threshold is used by the trained log anomaly detection model to perform anomaly detection on the log to be detected and obtain a detection result.
  • In a possible implementation manner, the formula for obtaining the loss from the m second row vectors to the initial cluster center is:
  • loss(c, V_i) = ||V_i − c||²
  • where V_i represents the i-th second row vector among the m second row vectors, c represents the initial cluster center, loss(c, V_i) represents the loss from the i-th second row vector to the initial cluster center, and i is a natural number.
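  • Reading the formula above as the squared Euclidean distance (an assumption consistent with the distance-to-center loss used throughout this description), the losses of all m second row vectors can be computed in one vectorized step:

```python
import numpy as np

def losses_to_center(V, c):
    """loss(c, V_i) = ||V_i - c||^2 for every second row vector V_i.
    V : (m, d) matrix whose rows are the m second row vectors
    c : (d,) cluster center"""
    return np.sum((V - c) ** 2, axis=1)

losses = losses_to_center(np.random.rand(100, 32), np.zeros(32))
```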
  • It should be understood that the log anomaly detection model training apparatus 100 is only an example provided by the embodiments of the present application; the apparatus 100 may have more or fewer components than those shown in FIG. 13, may combine two or more components, or may be implemented with a different configuration of components.
  • The log anomaly detection model training apparatus 100 provided in this application can be applied to various computing devices such as cloud servers, personal computers and terminal devices, and can also be applied to a computing device cluster including at least two computing devices. The following description takes one computing device as an example.
  • FIG. 14 is a schematic structural diagram of a computing device 200 provided by the present application.
  • The computing device 200 includes a processor 210, a memory 220 and a communication interface 230, and the processor 210, the memory 220 and the communication interface 230 can be connected to each other through a bus 240. Wherein:
  • The processor 210 can read the program code (including instructions) stored in the memory 220 and execute it, so that the computing device 200 executes the steps in the log anomaly detection model training method provided by the above method embodiments, or so that the computing device 200 deploys the log anomaly detection model training apparatus 100.
  • The processor 210 may have various specific implementation forms; for example, the processor 210 may be a central processing unit (central processing unit, CPU), a graphics processing unit (graphics processing unit, GPU), etc., and the processor 210 may be a single-core processor or a multi-core processor.
  • the processor 210 may be a combination of a CPU and a hardware chip.
  • the above-mentioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof.
  • the above-mentioned PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a general array logic (generic array logic, GAL) or any combination thereof.
  • the processor 210 can also be independently implemented by a logic device with built-in processing logic, such as an FPGA or a DSP.
  • The memory 220 may store program code and program data. The program code includes the code of the obtaining module 110, the code of the training module 120, and so on; the program data includes the first log sample set, the second log sample set, the word sequences before and after mask processing, and so on.
  • The memory 220 may be a non-volatile memory, such as a read-only memory (read-only memory, ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
  • The memory 220 may also be a volatile memory, such as a random access memory (random access memory, RAM), which acts as an external cache.
  • The communication interface 230 may be a wired interface (e.g., an Ethernet interface) or a wireless interface (e.g., a cellular network interface or a wireless local area network interface) for communicating with other computing nodes or devices.
  • The communication interface 230 may use a protocol family above the transmission control protocol/internet protocol (transmission control protocol/internet protocol, TCP/IP), for example, the remote function call (remote function call, RFC) protocol, the simple object access protocol (simple object access protocol, SOAP) protocol, the simple network management protocol (simple network management protocol, SNMP) protocol, the common object request broker architecture (common object request broker architecture, CORBA) protocol, distributed protocols, and so on.
  • The bus 240 may be a peripheral component interconnect (peripheral component interconnect, PCI) bus, an extended industry standard architecture (extended industry standard architecture, EISA) bus, or the like.
  • The bus 240 can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is shown in FIG. 14, but this does not mean that there is only one bus or only one type of bus.
  • The above computing device 200 is configured to execute the method described in the above log anomaly detection model training method embodiment, which belongs to the same concept as the above method embodiment; for the specific implementation process, refer to the above method embodiment, which is not repeated here.
  • When the computing device 200 deploys the functional modules of the log anomaly detection model training apparatus 100, refer to the apparatus embodiment shown in FIG. 13.
  • It should be understood that the computing device 200 is only an example provided by the embodiments of the present application; the computing device 200 may have more or fewer components than those shown in FIG. 14, may combine two or more components, or may be implemented with a different configuration of components.
  • The present application also provides a non-transitory computer-readable storage medium, where instructions are stored in the non-transitory computer-readable storage medium; when the instructions are run, part or all of the steps of the log anomaly detection model training method described in the above embodiments are implemented.
  • the present application also provides a computer program product, when the computer program product is read and executed by a computer, part or all of the steps of the log anomaly detection model training method described in the above method embodiments can be implemented.
  • The above embodiments may be implemented in whole or in part by software, hardware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave, etc.).
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • The usable media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media, semiconductor media, and the like.
  • The steps in the methods of the embodiments of the present application may be adjusted in order, combined or deleted according to actual needs; the units in the apparatus of the embodiments of the present application may be divided, combined or deleted according to actual needs.


Abstract

A log anomaly detection model training method, apparatus and device, applied to the field of artificial intelligence. The method comprises: obtaining a first log sample set, and pre-training an initial log anomaly detection model by means of the first log sample set to obtain a pre-trained log anomaly detection model; then obtaining a second log sample set; and fine-tuning the pre-trained log anomaly detection model by means of the second log sample set, such that a trained log anomaly detection model can be obtained, wherein the first log sample set is obtained by processing log data of a target object, the second log sample set is obtained by processing log data of a target sub-object, and the target sub-object belongs to the target object. The method can solve the problem in the prior art that, because the log anomaly detection model obtained by training has low generalization ability, a user needs to train a new model when performing anomaly detection on logs generated by a similar object, which consumes a lot of manpower and time and is inefficient.

Description

Log anomaly detection model training method, apparatus and device
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a log anomaly detection model training method, apparatus and device.
Background
Hard disk drives (hard disk drive, HDD), network devices (such as routers and switches), processors and the like generate various logs during operation to record their own status and important events. Logs contain rich dynamic operating information, and anomaly information can be reflected in logs, so logs can be used for anomaly detection and fault diagnosis.
At present, as shown in FIG. 1, if a user wants to perform anomaly detection on the logs generated by a specific object (such as a certain model of memory produced by a certain manufacturer), the user mainly obtains historical logs generated by that specific object as training samples, trains an initial log anomaly detection model to obtain a trained log anomaly detection model with a good detection effect on the logs of that specific object, and then uses the trained model to perform anomaly detection on the logs generated by that specific object.
However, the log anomaly detection model trained by the above method has low generalization ability. As a result, when the user wants to perform anomaly detection on logs generated by a similar specific object (such as another model of memory produced by another manufacturer), the user cannot reuse the log anomaly detection model that has already been trained, and can only re-acquire historical logs generated by the similar specific object as training samples to train the initial log anomaly detection model, so as to obtain a log anomaly detection model with a good detection effect on the logs generated by the similar specific object. For each different specific object, the process of retraining to obtain a new model usually consumes a lot of manpower and time, and the efficiency is low.
Summary of the Invention
The present application provides a log anomaly detection model training method, apparatus and device, which can solve the problem in the prior art that the trained log anomaly detection model has low generalization ability, so that when a user wants to perform anomaly detection on logs generated by a similar specific object, a new log anomaly detection model has to be trained for that specific object, which consumes a lot of manpower and time and is inefficient.
In a first aspect, a log anomaly detection model training method is provided, the method comprising:
obtaining a first log sample set, wherein the first log sample set is obtained by processing log data of a target object;
pre-training an initial log anomaly detection model by using the first log sample set to obtain a pre-trained log anomaly detection model;
obtaining a second log sample set, wherein the second log sample set is obtained by processing log data of a target sub-object, and the target sub-object belongs to the target object;
fine-tuning the pre-trained log anomaly detection model by using the second log sample set to obtain a trained log anomaly detection model.
In the above solution, pre-training the initial log anomaly detection model with the first log sample set from the target object yields a pre-trained log anomaly detection model with high-quality model parameters and strong generalization ability. When the user wants a trained log anomaly detection model for a target sub-object, it suffices to fine-tune this pre-trained model with the second log sample set from the target sub-object. Compared with the prior art, the model training method provided by this application can therefore solve the problem that a log anomaly detection model trained in the prior art has low generalization ability, so that performing anomaly detection on logs generated by a similar specific object requires retraining a new log anomaly detection model for that specific object, which consumes a lot of manpower and time and is inefficient.
In a possible implementation manner, the target object includes at least one of the following sub-objects: hard disk, memory, flash memory, network device and processor, and the target sub-object is a sub-object of any one type in the target object.
In a possible implementation manner, the first log sample set includes m log samples, where m is a natural number greater than 1, and the pre-training of the initial log anomaly detection model by using the first log sample set to obtain the pre-trained log anomaly detection model includes:
performing word segmentation on the m log samples respectively to obtain m word sequences corresponding to the m log samples;
pre-training the initial log anomaly detection model through the m word sequences to obtain the pre-trained log anomaly detection model.
In a possible implementation manner, the pre-training of the initial log anomaly detection model through the m word sequences to obtain the pre-trained log anomaly detection model includes:
performing mask processing on a preset proportion of words in each of the m word sequences to obtain m masked word sequences;
pre-training the initial log anomaly detection model through the m masked word sequences to obtain the pre-trained log anomaly detection model.
In the above solution, pre-training the initial log anomaly detection model through the m masked word sequences allows the model to better learn the context information of the masked words, so that the pre-trained log anomaly detection model learns the semantic information of each word sequence. The subsequently obtained trained log anomaly detection model can then detect whether a log to be detected is abnormal according to the semantic information of the log to be detected.
In a possible implementation manner, the pre-training of the initial log anomaly detection model through the m masked word sequences to obtain the pre-trained log anomaly detection model includes:
obtaining a word embedding vector and a position embedding vector corresponding to each word in the m masked word sequences, wherein the word embedding vector corresponding to each word is a multi-dimensional vector used to represent that word, and the position embedding vector corresponding to each word represents the position of that word in the word sequence to which it belongs;
obtaining, according to the word embedding vector and the position embedding vector corresponding to each word in the m masked word sequences, m first row vectors corresponding to the m masked word sequences;
pre-training the initial log anomaly detection model by using the m first row vectors to obtain the pre-trained log anomaly detection model.
In a possible implementation manner, the pre-training of the initial log anomaly detection model by using the m first row vectors to obtain the pre-trained log anomaly detection model includes:
respectively inputting the m first row vectors into the initial log anomaly detection model for training to obtain m second row vectors;
obtaining the losses from the m second row vectors to an initial cluster center;
training the initial log anomaly detection model according to the losses from the m second row vectors to the initial cluster center, to obtain the pre-trained log anomaly detection model and a target cluster center.
In a possible implementation manner, the method further includes:
obtaining the percentile corresponding to the losses from the m second row vectors to the target cluster center;
determining a classification threshold according to the percentile corresponding to the losses from the m second row vectors to the target cluster center, wherein the classification threshold is used by the trained log anomaly detection model to perform anomaly detection on the log to be detected and obtain a detection result.
In a possible implementation manner, the formula for obtaining the loss from the m second row vectors to the initial cluster center is:
loss(c, V_i) = ||V_i − c||²
where V_i represents the i-th second row vector among the m second row vectors, c represents the initial cluster center, loss(c, V_i) represents the loss from the i-th second row vector to the initial cluster center, and i is a natural number.
In a second aspect, a log anomaly detection model training apparatus is provided, the apparatus comprising:
an obtaining module, configured to obtain a first log sample set, wherein the first log sample set is obtained by processing log data of a target object;
a training module, configured to pre-train an initial log anomaly detection model through the first log sample set to obtain a pre-trained log anomaly detection model;
the obtaining module is further configured to obtain a second log sample set, wherein the second log sample set is obtained by processing log data of a target sub-object, and the target sub-object belongs to the target object;
the training module is further configured to fine-tune the pre-trained log anomaly detection model through the second log sample set to obtain a trained log anomaly detection model.
In a possible implementation manner, the target object includes at least one of the following sub-objects: hard disk, memory, flash memory, network device and processor, and the target sub-object is a sub-object of any one type in the target object.
In a possible implementation manner, the first log sample set includes m log samples, where m is a natural number greater than 1, and the training module is specifically configured to:
perform word segmentation on the m log samples respectively to obtain m word sequences corresponding to the m log samples;
pre-train the initial log anomaly detection model through the m word sequences to obtain the pre-trained log anomaly detection model.
In a possible implementation manner, the training module is specifically configured to:
perform mask processing on a preset proportion of words in each of the m word sequences to obtain m masked word sequences;
pre-train the initial log anomaly detection model through the m masked word sequences to obtain the pre-trained log anomaly detection model.
In a possible implementation manner, the training module is specifically configured to:
obtain the word embedding vector and position embedding vector corresponding to each word in the m masked word sequences, wherein the word embedding vector corresponding to each word is a multi-dimensional vector used to represent that word, and the position embedding vector corresponding to each word represents the position of that word in the word sequence to which it belongs;
obtain, according to the word embedding vector and position embedding vector corresponding to each word in the m masked word sequences, the m first row vectors corresponding to the m masked word sequences;
pre-train the initial log anomaly detection model by using the m first row vectors to obtain the pre-trained log anomaly detection model.
In a possible implementation manner, the training module is specifically configured to:
respectively input the m first row vectors into the initial log anomaly detection model for training to obtain m second row vectors, wherein the m second row vectors are in one-to-one correspondence with the m masked word sequences, and each of the m second row vectors includes the semantic information of its corresponding masked word sequence;
obtain the losses from the m second row vectors to an initial cluster center;
train the initial log anomaly detection model according to the losses from the m second row vectors to the initial cluster center, to obtain the pre-trained log anomaly detection model and a target cluster center.
In a possible implementation manner, the training module is further configured to:
obtain the percentile corresponding to the losses from the m second row vectors to the target cluster center;
determine a classification threshold according to the percentile corresponding to the losses from the m second row vectors to the target cluster center, wherein the classification threshold is used by the trained log anomaly detection model to perform anomaly detection on the log to be detected and obtain a detection result.
In a possible implementation manner, the formula for obtaining the loss from the m second row vectors to the initial cluster center is:
loss(c, V_i) = ||V_i − c||²
where V_i represents the i-th second row vector among the m second row vectors, c represents the initial cluster center, loss(c, V_i) represents the loss from the i-th second row vector to the initial cluster center, and i is a natural number.
In a third aspect, a non-transitory computer-readable storage medium is provided, where the non-transitory computer-readable storage medium stores instructions for implementing the method provided by the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, a computing device is provided, the computing device comprising a processor and a memory; the processor is configured to execute the instructions stored in the memory, so that the computing device implements the method provided by the first aspect or any possible implementation manner of the first aspect.
In a fifth aspect, a computer program product is provided, comprising a computer program, which, when read and executed by a computing device, causes the computing device to execute the method provided by the first aspect or any possible implementation manner of the first aspect.
Description of Drawings
In order to describe the technical solutions of the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings used in the description of the embodiments.
FIG. 1 is a schematic diagram of a prior art involved in the present application;
FIG. 2 is a schematic diagram of masking words in an input sequence by a masked language model (masked language model, MLM) method involved in the present application;
FIG. 3 is a schematic diagram of the principle of a log anomaly detection model training method provided by the present application;
FIG. 4 is a schematic flowchart of a log anomaly detection model training method provided by the present application;
FIG. 5 is a schematic diagram of obtaining the word sequence corresponding to the i-th first log sample provided by the present application;
FIG. 6 is a schematic diagram of obtaining the first row vector and the second row vector corresponding to a masked word sequence provided by the present application;
FIG. 7 is a schematic diagram of an exemplary word embedding vector and position embedding vector provided by the present application;
FIG. 8 is a schematic diagram of an exemplary word vector provided by the present application;
FIG. 9 is a schematic flowchart of training to obtain a pre-trained log anomaly detection model provided by the present application;
FIG. 10 is a schematic diagram, involved in the present application, of pre-training the initial log anomaly detection model by using the m first row vectors to obtain the m second row vectors and the target cluster center;
FIG. 11 is a schematic flowchart of a log anomaly detection method involved in the present application;
FIG. 12 is a schematic diagram, involved in the present application, of performing anomaly detection on the first row vector x corresponding to the sequence to be detected to obtain an anomaly detection result;
FIG. 13 is a schematic structural diagram of a log anomaly detection model training apparatus provided by the present application;
FIG. 14 is a schematic structural diagram of a computing device provided by the present application.
Detailed Description
The embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application.
The terms "first" and "second" in the embodiments of the present application are used for descriptive purposes only, and shall not be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature.
In the embodiments of the present application, "at least one" refers to one or more, and "multiple" refers to two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist at the same time, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b or c may represent: a, b, c, a-b, a-c, b-c or a-b-c, where a, b and c may each be single or multiple.
In order to facilitate the understanding of the embodiments of the present application by those skilled in the art, the related concepts and terms involved in the present application are first introduced.
(1) A log is a record generated by an object such as a hard disk, a network device or a processor, used to indicate the status of the hard disk, network device, processor, etc. and the events that have occurred. For example, a hard disk generates a log when a failure occurs or when it is considered that a failure will occur.
Logs are generally stored in a device in the form of log files. A log file may be a directly readable text file, a machine-readable binary file, or a file in another form, which is not specifically limited in this application. Each log file consists of log records, line by line; one record or several consecutive records describe an independent event, and the log record(s) describing an independent event may be called a log entry. A log file includes multiple log entries. A log entry usually contains the event occurrence time, the event content, the event type, the event level, and so on.
Usually, different objects generate log entries in different formats. For example, the format of a log entry generated by device A is: event occurrence time, identifier of the device accessing device A, and event content, and a log entry generated by device A contains 20 characters; the format of a log entry generated by device B is: identifier of the device accessing device B, event occurrence time, and event content, and a log entry generated by device B contains 50 characters.
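As a purely hypothetical illustration of the two formats just described, the following sketch parses one log entry of each device; the concrete field layouts, separators and field values are assumptions for illustration only:

```python
def parse_device_a(entry: str):
    """Device A format (assumed): event time, accessing-device id, event content."""
    time, accessor, content = entry.split(" ", 2)
    return {"time": time, "accessor": accessor, "content": content}

def parse_device_b(entry: str):
    """Device B format (assumed): accessing-device id, event time, event content."""
    accessor, time, content = entry.split(" ", 2)
    return {"accessor": accessor, "time": time, "content": content}

record_a = parse_device_a("2021-09-24T10:15:02 host-17 disk read error on sector 5120")
record_b = parse_device_b("host-17 2021-09-24T10:15:02 memory page fault on bank 3")
```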
(2) A neural network may be composed of neural units (also called neurons). A neural unit may refer to an operation unit that takes the variables x_s and an intercept b as inputs, and the output of the operation unit may be:
f(Σ_{s=1}^{n} W_s x_s + b)
where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function (activation functions) of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function can be used as the input of the next convolutional layer. The activation function may be a sigmoid function or another function, which is not limited here. A neural network is a network formed by connecting many such single neural units together, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
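Numerically, the operation of a single neural unit can be sketched as follows; the sigmoid used here is only one possible choice of the activation function f:

```python
import numpy as np

def neural_unit(x, w, b, f=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Output of one neural unit: f(sum_s W_s * x_s + b)."""
    return f(np.dot(w, x) + b)

out = neural_unit(x=np.array([0.5, -1.0, 2.0]),
                  w=np.array([0.1, 0.4, -0.2]),
                  b=0.05)
```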
(3) Loss function. In the process of training a model, because it is hoped that the output of the model is as close as possible to the value that one really wants to predict, the predicted value of the current model can be compared with the really desired target value, and the weight vector of each layer of the neural network in the model can then be updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer in the model). For example, if the predicted value of the model is too high, the weight vector is adjusted to make the prediction lower, and the adjustment continues until the model can predict the really desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the loss function (loss function) or objective function (objective function), which is an important function used to measure the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a greater difference, so the training process of the model becomes a process of reducing this loss as much as possible.
(4) Pre-training (pre-training) refers to a process of training an initial model with a large data set so that the initial model learns to recognize the general features in the large data set. The model obtained by pre-training (hereinafter referred to as the pre-trained model) has strong generalization ability; it can provide high-quality model parameters for subsequent training on a specific data set and can adapt to a variety of specific data sets.
(5) Fine-tuning (fine-tuning) refers to a process of further training a pre-trained model with a specific data set to obtain a trained (trained) model applied to that specific data set. Usually, the data volume of the specific data set used in the fine-tuning stage is smaller than that of the large data set used in the pre-training stage; for example, the ratio of the data volume of the specific data set used in the fine-tuning stage to that of the large data set used in the pre-training stage is 1:100, 1:500, 1:1000, etc., which is not specifically limited here. The number of times the pre-trained model is trained with the specific data set in the fine-tuning stage is also smaller than the number of times the initial model is trained with the large data set in the pre-training stage; for example, to obtain the pre-trained model, the number of times the initial model is trained with the large data set in the pre-training stage is set to 100, while to obtain the trained model, the number of times the pre-trained model is trained with the specific data set in the fine-tuning stage is set to 20.
(6) The MLM method is a model pre-training method. A model trained by this method can learn the semantic information of the input sequence, and this semantic information is saved in the vector output by the model corresponding to the word "CLS" in the input sequence. This is because the strategy used by MLM to mask the input sequence differs from that of traditional pre-training methods (such as masked sequence to sequence (masked sequence to sequence, MASS)): in MLM, 15% of the words in the input sequence are randomly selected, 80% of the randomly selected 15% of words are replaced with the special symbol MASK, 10% of the randomly selected 15% of words are replaced with random words, and the remaining 10% are left unchanged.
For example, as shown in FIG. 2, the input sequence is <CLS><举><头><望><明><月><低><头><思><故><乡>, which includes 11 words. The 15% of words randomly selected by MLM from the input sequence are the third word "头" and the seventh word "低"; MLM replaces the randomly selected third word "头" with the special symbol MASK, and replaces the randomly selected seventh word "低" with the random word "人".
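The masking strategy illustrated above can be sketched as follows; the 15%/80%/10%/10% proportions follow the description above, while the token strings, the vocabulary and the random source are illustrative assumptions:

```python
import random

def mlm_mask(tokens, vocab, rng=None):
    """Sketch of MLM masking: randomly select 15% of the tokens; replace 80%
    of the selected tokens with [MASK], 10% with a random word, and leave the
    remaining 10% unchanged. (A real implementation would typically exclude
    the CLS token from selection.)"""
    rng = rng or random.Random(0)
    out = list(tokens)
    n_select = max(1, round(0.15 * len(tokens)))
    for i in rng.sample(range(len(tokens)), n_select):
        r = rng.random()
        if r < 0.8:
            out[i] = "[MASK]"
        elif r < 0.9:
            out[i] = rng.choice(vocab)  # random replacement word
        # else: keep the original token unchanged
    return out

masked = mlm_mask(["CLS", "举", "头", "望", "明", "月"], vocab=["人", "山", "水"])
```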
When pre-training a model, the benefit of masking the sequences input into the model for training is that the fault tolerance and inference accuracy of the pre-trained model can be improved.
For example, if, when pre-training a model, the input first sequence is <CLS><举><头><望><明><月><低><头><思><故><乡>, i.e. a sequence without mask processing, and the model learns from this first sequence, then since none of the words in the first sequence is masked — that is, every word in the first sequence is known — the model only needs to learn the words in the first sequence, without learning the context of those words, to learn that the semantic information of the first sequence is "举头望明月低头思故乡" ("raising my head I gaze at the bright moon, lowering my head I think of my hometown"). The resulting pre-trained model therefore usually does not have the ability to infer the semantic information of a sequence from the context of the words in the sequence. If this pre-trained model is subsequently used to infer the semantic information of a second sequence <CLS><举><望><明><月><头><思><乡>, it can be seen that, compared with the first sequence, the second sequence lacks the three words <头><低><故>. Since the pre-trained model does not have the ability to infer the semantic information of a sequence from the context of the words in the sequence, the model is less likely to infer that the semantic information of the second sequence is "举头望明月低头思故乡", and more likely to infer that it is "思乡" ("homesick") or "望明月思乡" ("gazing at the bright moon and missing home"); the fault tolerance and accuracy of the model are low.
If, when pre-training the model, the input first sequence is <CLS><举><MASK><望><MASK><月><低><头><思><故><MASK>, i.e. the masked sequence, and the model learns from this first sequence, then since some words in the first sequence (namely <头><明><乡>) are masked — that is, some words in the first sequence are unknown while the others are known — the model must not only learn the known words in the first sequence but also learn the masked words from their context in the first sequence; only after learning the masked words can it learn that the semantic information of the first sequence is "举头望明月低头思故乡". The resulting pre-trained model therefore usually has the ability to infer the semantic information of a sequence from the context of the words in the sequence. If this pre-trained model is subsequently used to infer the semantic information of the second sequence <CLS><举><望><明><月><头><思><乡>, it can be seen that, compared with the first sequence, the second sequence lacks the two words <低><故> and has the two additional words <明><乡>. Since the pre-trained model has the ability to infer the semantic information of a sequence from the context of the words in the sequence, the model is more likely to infer that the semantic information of the second sequence is "举头望明月低头思故乡", and less likely to infer that it is "思乡" or "望明月思乡"; the fault tolerance and accuracy of the model are high.
(7) Log anomaly detection refers to detecting the event information included in a log entry to determine whether the event information included in the log entry indicates an anomaly of the device that generated the log entry; if the event information included in the log entry is device anomaly information, it is determined that the device is abnormal.
(8) The initial log anomaly detection model refers to the model (which may also be called an algorithm) before it is trained with log training samples. The purpose of training the initial log anomaly detection model with log training samples is to obtain a model capable of performing anomaly detection on logs.
As the scale of systems keeps increasing, it is difficult to guarantee the stability and reliability of large-scale systems. In addition, the network environment is becoming increasingly complex, and various new types of attacks keep emerging. Anomaly detection is one of the supporting technologies for ensuring system security. During system operation, the system's hard disks, processors, and the network devices that provide network services for the system generate various log files to record the running status of the system and the events that occur. Logs contain rich information, and the huge amount of information contained in massive log data provides an avenue for system anomaly detection, making log anomaly detection a research hotspot in the field of anomaly detection. Among existing approaches, training a log anomaly detection model with log training samples and then using the trained model to perform anomaly detection on logs is currently a popular log anomaly detection method.
However, the log anomaly detection model trained by the existing training method has low generalization ability. As a result, when a user wants to perform anomaly detection on logs generated by a similar specific object, the user cannot reuse the log anomaly detection model that has already been trained and can only re-acquire historical logs generated by the similar specific object as training samples to retrain the initial log anomaly detection model, so as to obtain a log anomaly detection model with a good detection effect on the logs generated by the similar specific object. The process of training a new model usually consumes a lot of manpower and time, and the efficiency is low.
For example, if a user has already trained a log anomaly detection model A with a good detection effect on logs generated by a hard disk, and the user wants to perform anomaly detection on logs generated by a memory, then although model A has a good detection effect on logs generated by the hard disk, its detection effect on logs generated by the memory is usually poor. Even if the user continues to train model A with historical logs generated by the memory as training samples, a log anomaly detection model with a good detection effect on memory-generated logs usually cannot be obtained because of the low generalization ability of model A. The user can only re-acquire historical logs generated by the memory as training samples and use them to train the initial log anomaly detection model, so as to obtain a trained log anomaly detection model with a good detection effect on the logs generated by the memory.
For another example, if a user has already trained a log anomaly detection model B (hereinafter referred to as model B) with a good detection effect on logs generated by a hard disk of model B produced by manufacturer B (hereinafter referred to as hard disk B), and the user wants to perform anomaly detection on logs generated by a hard disk of model C produced by manufacturer C (hereinafter referred to as hard disk C), then although model B has a good detection effect on logs generated by hard disk B, its detection effect on logs generated by hard disk C is usually poor. Even if the user continues to train model B with historical logs generated by hard disk C as training samples, a log anomaly detection model with a good detection effect on logs generated by hard disk C usually cannot be obtained because of the low generalization ability of model B. The user can only re-acquire historical logs generated by hard disk C as training samples and use them to train the initial log anomaly detection model, so as to obtain a trained log anomaly detection model with a good detection effect on the logs generated by hard disk C.
To address the above problems of existing log anomaly detection model training methods, the present application provides a log anomaly detection model training method. As shown in FIG. 3, the method provided by this application includes two stages: pre-training and fine-tuning. In the pre-training stage, a first log sample set from a target object (the target object includes multiple target sub-objects) is used to pre-train an initial log anomaly detection model, yielding a model with high-quality parameters and strong generalization ability. In the fine-tuning stage, a second log sample set from a target sub-object (the target sub-object belongs to the target object) is used to fine-tune the pre-trained log anomaly detection model, yielding a trained log anomaly detection model for the target sub-object. Compared with the prior art, the method provided by this application solves the problem of low generalization ability of trained log anomaly detection models and improves the efficiency of model training.
The target object includes, but is not limited to, at least one of the following sub-objects: hard disk, memory, flash memory, network device, and processor; the target sub-object is any one type of sub-object within the target object. It is easy to understand that the above target object and target sub-object are merely illustrative examples and are not specifically limited in this application.
Taking a target object that includes only one kind of sub-object, hard disks, as an example: in that case the target sub-objects may be hard disks of different models.
Taking a target object that includes two kinds of sub-objects, hard disks and memory, as an example: in that case the target sub-object may be either a hard disk or a memory.
It can be understood that, when training a model, the more training samples are used, the higher the quality of the trained model's parameters tends to be, and the wider the sources of the training samples, the stronger the generalization ability of the trained model tends to be. Therefore, in this embodiment, the target object may include as many sub-objects as possible, and the data volume of the first log sample set obtained from the target object may be as large as possible; the resulting pre-trained log anomaly detection model then has higher-quality parameters and stronger generalization ability.
To facilitate a clearer understanding of the log anomaly detection model training method provided by this application, the method is described in detail below with reference to the flowchart shown in FIG. 4. As shown in FIG. 4, the method includes the following steps:
S401. The computing device acquires a first log sample set including m first log samples, where the first log sample set is obtained by processing log data of the target object.
As described above for log entries, a log entry contains an event occurrence time, event content, an event type, an event level, and so on. Among these, the event content reflects the event that occurred; by analyzing the event content, one can determine whether the event is abnormal, whereas the event occurrence time, event type, and event level contribute relatively little to that determination.
Therefore, in one possible implementation, the computing device may extract m first event contents from the large number of log entries included in the log data of a large number of target objects to serve as the m first log samples, discarding the other parts of the log entries, such as the event occurrence time, event type, and event level. The log data of the target objects may be crawled from the Internet by the computing device or collected manually from the target objects, which is not limited here.
For example, suppose the log data of the target object includes log entry A: 2021/06/03 Thu 18:18:33 PD_Vendor Done Check done, 0xd ms. Flag 8 ALL, where 2021/06/03 Thu 18:18:33 is the event occurrence time, PD_Vendor Done Check done, 0xd ms. Flag 8 is the event content, and ALL is the event level. The first log sample the computing device extracts from log entry A is then PD_Vendor Done Check done, 0xd ms. Flag 8.
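For illustration only, this extraction step can be sketched in Python as follows. The field layout, the regular expression, and the set of level keywords are assumptions made for this sketch (real log formats may need per-format rules); they are not part of the claimed method:

import re

# Assumed layout: "<date> <weekday> <time> <event content> <event level>"
# (a hypothetical pattern matching the example entry above)
LOG_PATTERN = re.compile(
    r"^(?P<time>\d{4}/\d{2}/\d{2}\s*\w{3}\s+\d{2}:\d{2}:\d{2})"
    r"(?P<content>.+?)"
    r"\s*(?P<level>ALL|INFO|WARN|ERROR)\s*$"
)

def extract_event_content(log_entry: str) -> str:
    """Keep only the event content, dropping the time and level fields."""
    match = LOG_PATTERN.match(log_entry.strip())
    if match is None:
        return log_entry.strip()  # fall back to the raw entry
    return match.group("content").strip()

entry = "2021/06/03 Thu 18:18:33 PD_Vendor Done Check done, 0xd ms. Flag 8 ALL"
print(extract_event_content(entry))
# -> "PD_Vendor Done Check done, 0xd ms. Flag 8"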
It can be understood that extracting the m first event contents from the log data of a large number of target objects as the m first log samples, and discarding the other parts such as event occurrence time, event type, and event level, masks the format differences among the m first log samples and thereby increases the number of first log samples available for the first log sample set.
It can also be understood that the larger the value of m, the higher the quality of the pre-trained model's parameters and the stronger the model's generalization ability. Therefore, in a specific implementation, the computing device may add as many first log samples as possible to the first log sample set, where m is a natural number greater than 1.
It should be noted that extracting m first event contents from the log data of a large number of target objects as the m first log samples is merely an example and should not be regarded as a specific limitation. In a specific implementation, the computing device may also select m log entries with the same format from the log data of a large number of target objects directly as the m first log samples, or extract m pieces of preset content as the m first log samples, where the preset content includes, in addition to the event content of a log entry, other parts of the log entry such as the event level and/or the event type.
S402. The computing device performs word segmentation on each of the m first log samples to obtain m word sequences corresponding to the m log samples.
Taking the word segmentation of the i-th of the m first log samples as an example, the process by which the computing device obtains the word sequence corresponding to the i-th first log sample includes:
S4021. Perform word segmentation on the i-th first log sample to obtain a first word sequence.
Continuing with the i-th first log sample PD_Vendor Done Check done, 0xd ms. Flag 8 as an example, the computing device segments this log sample and obtains the first word sequence:
<PD_VendorDone><Check><done><0xd><ms><Flag><8>, as shown in FIG. 5.
S4022. If the first word sequence includes a mixed word composed of digits and letters, replace each such mixed word in the first word sequence with the word number to obtain a second word sequence; if the first word sequence includes no mixed word, no replacement is needed, and the first word sequence is directly taken as the second word sequence.
Continuing with the example in S4021, the first word sequence <PD_VendorDone><Check><done><0xd><ms><Flag><8> includes the mixed word 0xd, so the second word sequence obtained by the computing device after replacing the mixed word 0xd is:
<PD_VendorDone><Check><done><number><ms><Flag><8>, as shown in FIG. 5.
It should be noted that replacing the mixed word with number is merely an example; in a specific implementation, the mixed word may also be replaced with other words such as num or sep, which is not specifically limited here.
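Steps S4021 and S4022 can be sketched as follows. The splitting rule (whitespace, commas, and periods) and the digits-plus-letters test for mixed words are assumptions for this sketch, since the text does not fix a concrete tokenizer:

import re

def tokenize(event_content: str) -> list[str]:
    # Split on whitespace, commas and periods (assumed rule); underscores
    # are kept so that a token such as PD_VendorDone stays whole.
    return [t for t in re.split(r"[\s,.]+", event_content) if t]

def replace_mixed_words(tokens: list[str],
                        placeholder: str = "number") -> list[str]:
    # A "mixed word" here is any token containing both a digit and a letter,
    # e.g. "0xd"; purely numeric or purely alphabetic tokens are kept.
    mixed = re.compile(r"^(?=.*\d)(?=.*[A-Za-z]).+$")
    return [placeholder if mixed.match(t) else t for t in tokens]

first = tokenize("PD_VendorDone Check done, 0xd ms. Flag 8")
print(replace_mixed_words(first))
# -> ['PD_VendorDone', 'Check', 'done', 'number', 'ms', 'Flag', '8']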
S4023. After obtaining the second word sequence, add a special classification mark, namely the CLS mark, at the beginning of the second word sequence to obtain a third word sequence.
The CLS mark at the beginning of the resulting third word sequence marks the start of the third word sequence.
Continuing with the example in S4022, the computing device adds a CLS mark at the beginning of the second word sequence <PD_VendorDone><Check><done><number><ms><Flag><8>, and the resulting third word sequence is:
<CLS><PD_VendorDone><Check><done><number><ms><Flag><8>, as shown in FIG. 5.
S4024. After obtaining the third word sequence: if the number of words in the third word sequence is less than a preset threshold, add pad marks at the end of the sequence to obtain a fourth word sequence whose number of words equals the preset threshold; if the number of words in the third word sequence equals the preset threshold, directly take the third word sequence as the fourth word sequence; if the number of words in the third word sequence exceeds the preset threshold, truncate words from the end of the third word sequence to obtain a fourth word sequence whose number of words equals the preset threshold.
Continuing with the example in S4023 and assuming the preset threshold is 10: the third word sequence <CLS><PD_VendorDone><Check><done><number><ms><Flag><8> contains 8 words, which is less than the preset threshold of 10, so the computing device adds two pad marks at the end of the third word sequence, and the resulting fourth word sequence is:
<CLS><PD_VendorDone><Check><done><number><ms><Flag><8><pad><pad>, as shown in FIG. 5.
It should be noted that using pad as the padding mark is merely an example; in a specific implementation, other words such as PAD or pa may be used instead, which is not specifically limited here. Likewise, the preset threshold of 10 is merely an example; in a specific implementation, it may also be 20, 50, or another value, which is not specifically limited here.
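Steps S4023 and S4024 reduce to prepending the CLS mark and then padding or truncating to the preset threshold; a minimal sketch (the mark spellings "CLS" and "pad" follow the example above):

def add_cls_and_pad(tokens: list[str], max_len: int = 10) -> list[str]:
    """Prepend the CLS mark, then pad with 'pad' marks at the end, or
    truncate from the end, so that the length equals max_len (the preset
    threshold)."""
    seq = ["CLS"] + tokens
    if len(seq) < max_len:
        seq += ["pad"] * (max_len - len(seq))
    return seq[:max_len]

print(add_cls_and_pad(
    ["PD_VendorDone", "Check", "done", "number", "ms", "Flag", "8"]))
# -> ['CLS', 'PD_VendorDone', 'Check', 'done', 'number', 'ms', 'Flag', '8',
#     'pad', 'pad']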
S4025. After obtaining the fourth word sequence, convert each word in the fourth word sequence using a preset dictionary to obtain a fifth word sequence, that is, the word sequence corresponding to the i-th first log sample.
The preset dictionary contains a large number of words together with the correspondence between each word and its token identification (ID), for example, the word Check, the token ID 6, and the correspondence between the word Check and the token ID 6.
Continuing with the example in S4024, suppose the preset dictionary specifies the following token IDs: CLS corresponds to 1, pad corresponds to 0, PD_VendorDone corresponds to 5, Check corresponds to 6, done corresponds to 7, number corresponds to 4, ms corresponds to 8, Flag corresponds to 9, and 8 corresponds to 10. The computing device then converts the fourth word sequence <CLS><PD_VendorDone><Check><done><number><ms><Flag><8><pad><pad> using the preset dictionary, and the resulting fifth word sequence is:
<1><5><6><7><4><8><9><10><0><0>, as shown in FIG. 5.
In a specific embodiment of the present application, when the preset dictionary is used to convert the words in the fourth word sequence, if some word in the fourth word sequence and its corresponding token ID do not exist in the preset dictionary, the word and a corresponding token ID may be newly added to the preset dictionary.
For example, if the largest token ID in the preset dictionary is 100000 and the dictionary does not contain a token ID for the word identification, the word identification and a corresponding token ID such as 100001 or 100008 may be added to the preset dictionary.
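Step S4025 and the dictionary-growth rule just described can be sketched as follows; assigning max ID + 1 to an unseen word is one choice consistent with the example above (100000 -> 100001), not the only one:

def to_token_ids(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Map each word to its token ID via the preset dictionary; an unseen
    word is added to the dictionary with a newly allocated ID."""
    ids = []
    for word in tokens:
        if word not in vocab:
            vocab[word] = max(vocab.values()) + 1  # e.g. 100000 -> 100001
        ids.append(vocab[word])
    return ids

# Toy dictionary matching the token IDs of the example above
vocab = {"pad": 0, "CLS": 1, "number": 4, "PD_VendorDone": 5,
         "Check": 6, "done": 7, "ms": 8, "Flag": 9, "8": 10}
seq = ["CLS", "PD_VendorDone", "Check", "done", "number", "ms", "Flag", "8",
       "pad", "pad"]
print(to_token_ids(seq, vocab))
# -> [1, 5, 6, 7, 4, 8, 9, 10, 0, 0]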
It should be noted that the above process of obtaining the word sequence corresponding to the i-th first log sample is merely an example; in a specific implementation, S4023 may be executed before S4022, or S4024 may be executed before S4022, which is not specifically limited here.
It can be understood that, after the computing device has executed steps S4021 to S4025 for each of the m first log samples, the computing device obtains m word sequences, and the m word sequences contain equal numbers of words.
S403. The computing device pre-trains the initial log anomaly detection model using the m word sequences to obtain a pre-trained log anomaly detection model.
In a specific embodiment of the present application, the method by which the computing device pre-trains the initial log anomaly detection model using the m word sequences may be the MLM method, the MASS method, or the like; this is not specifically limited here.
Taking the MLM method as the pre-training method, the process by which the computing device pre-trains the initial log anomaly detection model to obtain the pre-trained model may specifically include the following steps:
S4031. The computing device performs mask processing on a preset proportion of the words in each of the m word sequences to obtain m masked word sequences.
The preset proportion may be 10%, 15%, 20%, or the like.
For the operation of masking a preset proportion of the words in the m word sequences, refer to the description of the MLM method above.
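As a reference point, masking a preset proportion of the positions of one token-ID sequence can be sketched as follows. The 80/10/10 split between MASK replacement, random replacement, and keeping the original token follows the common MLM recipe and, like the MASK token ID of 2, is an assumption of this sketch:

import random

def mask_sequence(ids: list[int], mask_id: int = 2, vocab_size: int = 11,
                  ratio: float = 0.15) -> list[int]:
    """Mask roughly `ratio` of the positions (a real implementation might
    also skip pad positions, which this sketch does not)."""
    out = list(ids)
    for pos in range(len(out)):
        if random.random() < ratio:
            r = random.random()
            if r < 0.8:
                out[pos] = mask_id                       # e.g. <5> -> <MASK>
            elif r < 0.9:
                out[pos] = random.randrange(vocab_size)  # e.g. <8> -> <9>
            # otherwise the chosen position keeps its original token
    return out

random.seed(0)
print(mask_sequence([1, 5, 6, 7, 4, 8, 9, 10, 0, 0]))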
S4032. The computing device pre-trains the initial log anomaly detection model using the m masked word sequences to obtain the pre-trained log anomaly detection model.
In a specific embodiment of the present application, the specific process by which the computing device pre-trains the initial log anomaly detection model using the m masked word sequences may include the following steps:
S1. The computing device obtains the word embedding vector and the position embedding vector corresponding to each word in the m masked word sequences.
The word embedding vector corresponding to each word is a multi-dimensional vector used to represent that word. Word embedding is a general term for a family of language modeling and feature learning techniques in the field of natural language processing and is a way of mathematizing the words of a language; as the name suggests, its task is to convert a word into a multi-dimensional vector. In a specific implementation, the word embedding vector of each word may be obtained by one-hot encoding, by a word-to-vector (Word2Vec) model, or by a GloVe model, and its dimensionality may be 256 or 512 dimensions, or more or fewer dimensions, which is not specifically limited here.
Taking the masked i-th word sequence <1><MASK><6><7><4><9><9><10><0><0> as an example, as shown in FIG. 6, suppose the word embedding vectors of the words in this masked sequence are obtained with a Word2Vec model and have 5 dimensions. Then the word 1 in the masked i-th word sequence may correspond to the 5-dimensional vector V_i,1 (-0.065, -0.035, 0.019, -0.026, 0.085), the word MASK to the 5-dimensional vector V_i,2 (0.000, 0.000, 0.000, 0.000, 0.000), ..., and the word 0 at the end of the sequence to the 5-dimensional vector V_i,10 (-0.027, -0.013, 0.006, 0.023, 0.014), as shown in FIG. 7. The subscript t of a 5-dimensional vector V_i,t denotes the position of the word in the i-th word sequence.
It should be noted that the word embedding vector values in the above example have three decimal places merely as an example; in a specific implementation, they may have fewer or more decimal places, which is not limited here.
The position embedding vector corresponding to each word represents the position of the word within its word sequence; its dimensionality is the same as that of the word embedding vector.
In one possible implementation, the position embedding vector of a word may be obtained by the following formulas:
PE(pos, 2j) = sin(pos / 10000^(2j/d_model))
PE(pos, 2j+1) = cos(pos / 10000^(2j/d_model))
Here, PE(·) denotes the position embedding vector; pos denotes the position of the word in the word sequence, with value range [0, number of words in the word sequence); d_model denotes the dimensionality of the position embedding vector; 2j denotes an even dimension index of the position embedding vector; and 2j+1 denotes an odd dimension index. Taking d_model = 5 as an example, j takes the values 0, 1, and 2: when j = 0, PE(pos, 0) gives the value of dimension 0 of the position embedding vector and PE(pos, 1) gives the value of dimension 1; when j = 1, PE(pos, 2) gives the value of dimension 2 and PE(pos, 3) gives the value of dimension 3; when j = 2, PE(pos, 4) gives the value of dimension 4.
Continuing with the masked i-th word sequence <1><MASK><6><7><4><9><9><10><0><0> shown in FIG. 6, with position embedding dimensionality d_model = 5, the position embedding vector V_i,1' corresponding to the word 1 (pos = 0) obtained from the above formulas is:
(sin(0/10000^(0/5)), cos(0/10000^(0/5)), sin(0/10000^(2/5)), cos(0/10000^(2/5)), sin(0/10000^(4/5))) = (0.000, 1.000, 0.000, 1.000, 0.000), as shown in FIG. 7;
the position embedding vector V_i,2' corresponding to the word MASK (pos = 1) is:
(sin(1/10000^(0/5)), cos(1/10000^(0/5)), sin(1/10000^(2/5)), cos(1/10000^(2/5)), sin(1/10000^(4/5))) = (0.842, 0.540, 0.025, 1.000, 0.001), as shown in FIG. 7; ...;
and the position embedding vector V_i,10' corresponding to the word 0 at the end of the sequence (pos = 9) is:
(sin(9/10000^(0/5)), cos(9/10000^(0/5)), sin(9/10000^(2/5)), cos(9/10000^(2/5)), sin(9/10000^(4/5))) = (0.412, -0.911, 0.224, 0.975, 0.006), as shown in FIG. 7.
It should be noted that the position embedding vector values in the above example have three decimal places merely as an example; in a specific implementation, they may have fewer or more decimal places, which is not limited here.
In another possible implementation, the position embedding vector of a word may be obtained by the following formulas:
PE(pos, 2j) = sin(pos / 10000^(2j/d_model))
PE(pos, 2j+1) = cos(pos / 10000^((2j+1)/d_model))
S2. The computing device obtains, from the word embedding vector and position embedding vector of each word in the m masked word sequences, the m first row vectors corresponding to the m masked word sequences.
Specifically, the computing device may superimpose the word embedding vector and the position embedding vector of each word in each masked word sequence to obtain the word vector of each word in that sequence, and thereby obtain the first row vector corresponding to each masked word sequence. Other ways of deriving the first row vector from the word embedding vectors and position embedding vectors also fall within the protection scope of this application and are not specifically limited here.
Continuing with the word embedding vectors and position embedding vectors of the masked i-th word sequence shown in FIG. 6, the word vector V_i,1'' corresponding to the word 1 is:
V_i,1 + V_i,1' = (-0.065, -0.035, 0.019, -0.026, 0.085) + (0.000, 1.000, 0.000, 1.000, 0.000) = (-0.065, 0.965, 0.019, 0.974, 0.085), as shown in FIG. 8;
the word vector V_i,2'' corresponding to the word MASK is:
V_i,2 + V_i,2' = (0.000, 0.000, 0.000, 0.000, 0.000) + (0.842, 0.540, 0.025, 1.000, 0.001) = (0.842, 0.540, 0.025, 1.000, 0.001), as shown in FIG. 8; ...;
and the word vector V_i,10'' corresponding to the word 0 at the end of the sequence is:
V_i,10 + V_i,10' = (-0.027, -0.013, 0.006, 0.023, 0.014) + (0.412, -0.911, 0.224, 0.975, 0.006) = (0.385, -0.924, 0.230, 0.998, 0.020), as shown in FIG. 8.
After the word vector of each word in the masked i-th word sequence is obtained, the combination of the word vectors of all the words in the masked i-th word sequence is the first row vector V_i' corresponding to the masked i-th word sequence.
In this embodiment, the process of obtaining the first row vector corresponding to each of the m masked word sequences is similar to the above process of obtaining the first row vector V_i' corresponding to the masked i-th word sequence; refer to the relevant description above, which is not repeated here.
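The superposition in step S2 is a plain element-wise addition; a minimal sketch using the vectors of the word 1 from the example above:

def build_first_row_vector(word_embs: list[list[float]],
                           pos_embs: list[list[float]]) -> list[list[float]]:
    """Add each word embedding to its position embedding element-wise and
    stack the resulting word vectors into the first row vector."""
    return [[w + p for w, p in zip(we, pe)]
            for we, pe in zip(word_embs, pos_embs)]

word_emb = [-0.065, -0.035, 0.019, -0.026, 0.085]  # V_i,1
pos_emb = [0.000, 1.000, 0.000, 1.000, 0.000]      # V_i,1'
row = build_first_row_vector([word_emb], [pos_emb])
print([round(v, 3) for v in row[0]])
# -> [-0.065, 0.965, 0.019, 0.974, 0.085]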
S3. The computing device trains the initial log anomaly detection model using the m first row vectors to obtain the pre-trained log anomaly detection model.
In a specific embodiment of the present application, the specific process by which the computing device trains the initial log anomaly detection model using the m first row vectors, shown in FIG. 9, may include the following steps:
A1. Input each of the m first row vectors into the initial log anomaly detection model for training, obtaining m second row vectors corresponding to the m masked word sequences.
The second row vector corresponding to a masked word sequence is a vector that carries the semantic information of the masked word sequence; the second row vector of each word sequence is the output of the initial log anomaly detection model at the position of that sequence's CLS mark.
Continuing with the first row vector V_i' of the masked i-th word sequence <1><MASK><6><7><4><9><9><10><0><0> shown in FIG. 6: if the first row vector V_i' is input into the initial log anomaly detection model for training, the second row vector V_i is obtained. As shown in FIG. 6, the second row vector V_i carries the semantic information of the masked i-th word sequence, namely "vendor information check completed".
A2. Randomly select any one of the m second row vectors as the initial cluster center c of the normal log class.
A3. Compute the loss from each of the m second row vectors to the initial cluster center c.
In a specific embodiment of the present application, the loss from the i-th second row vector V_i to the initial cluster center c may be obtained through the following loss function:
loss(c, V_i) = ||√(V_i) - c||^2
where √(V_i) denotes taking the square root of the value of each dimension of the second row vector V_i.
A4. According to the loss from the i-th second row vector to the initial cluster center c, determine whether the i-th second row vector can be assigned to the normal log class; if so, assign the i-th second row vector to the normal log class.
Specifically, when the loss loss(c, V_i) from the i-th second row vector to the initial cluster center c is less than a first classification threshold, it is determined that the i-th second row vector can be assigned to the normal log class; otherwise, it is determined that it cannot. The first classification threshold may be set by the user according to the actual situation.
A5. After all of the m second row vectors that can be assigned to the normal log class have been assigned to it, recompute the centroid of the normal log class and take the computed centroid as the new cluster center c_1 of the normal log class.
A6. Iterate steps A3 to A5 until a termination condition is reached, obtaining the pre-trained log anomaly detection model.
The termination condition may be a maximum number of iterations, minimization of the squared error, the rate of change of the cluster center, or the like, which is not specifically limited here.
When the termination condition is reached, the normal log class and its centroid no longer change. Here, C denotes the centroid of the normal log class that no longer changes, that is, the target cluster center described below.
As shown in FIG. 10, the m first row vectors are input into the initial log anomaly detection model for training, and the resulting m second row vectors are ultimately divided into second row vectors belonging to the normal log class (the vectors inside the circle in FIG. 10) and second row vectors not belonging to the normal log class (the vectors outside the circle in FIG. 10), with the centroid of the normal log class at C.
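Abstracting away the transformer forward pass of step A1 (the sketch below starts directly from the m second row vectors) and the model-parameter updates that would accompany the loss during training, the center-update iteration of steps A2 to A6 can be sketched as follows. The loss uses the formula reconstructed above and therefore assumes non-negative vector components:

import numpy as np

def loss_to_center(center: np.ndarray, v: np.ndarray) -> float:
    """loss(c, V_i) per the formula above: squared distance between the
    element-wise square root of V_i and the center."""
    return float(np.sum((np.sqrt(v) - center) ** 2))

def pretrain_clustering(vectors: np.ndarray, first_threshold: float,
                        max_iters: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Steps A2-A6: pick a random initial center, assign vectors whose loss
    is below the first classification threshold to the normal log class,
    recompute the class centroid, and iterate until the center stops moving.
    Returns the target cluster center C."""
    rng = np.random.default_rng(0)
    center = np.sqrt(vectors[rng.integers(len(vectors))])     # step A2
    for _ in range(max_iters):
        losses = np.array([loss_to_center(center, v) for v in vectors])  # A3
        normal = vectors[losses < first_threshold]            # step A4
        if len(normal) == 0:
            break                                             # nothing to assign
        new_center = np.sqrt(normal).mean(axis=0)             # step A5
        if np.linalg.norm(new_center - center) < tol:         # termination
            return new_center
        center = new_center
    return center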
Although the pre-trained log anomaly detection model already has high-quality parameters, it was trained with the first log sample set from the target object. If it were used directly to perform anomaly detection on logs generated by the target sub-object, its parameters would not be accurate enough and the resulting detection results would be of low accuracy. Therefore, a training sample set from the target sub-object (namely the second log sample set of n second log samples described below) can be further used to fine-tune the pre-trained log anomaly detection model and obtain more accurate model parameters; with more accurate parameters, anomaly detection on the logs generated by the target sub-object yields more accurate results.
It can be understood that, because the pre-trained log anomaly detection model already has high-quality parameters, when the second training sample set from the target sub-object is used to fine-tune it, the second training sample set need only contain a small number of second training samples to obtain more accurate model parameters. The fine-tuning process therefore consumes only a small amount of manpower and time, and the model training is efficient.
S404. The computing device acquires a second log sample set including n second log samples, where the second log sample set is obtained by processing log data of the target sub-object.
The log data of the target sub-object may be historical logs generated by the target sub-object.
It can be understood that the larger the value of n, the more accurate the parameters of the trained log anomaly detection model. Therefore, in a specific implementation, the computing device may add as many second log samples as possible to the second log sample set, where n is a natural number greater than 1 and is usually smaller than m.
S405. The computing device performs word segmentation on each of the n second log samples to obtain n word sequences corresponding to the n second log samples.
S406. The computing device fine-tunes the pre-trained log anomaly detection model using the n word sequences to obtain the trained log anomaly detection model.
It can be understood that the model parameters of the trained log anomaly detection model obtained by fine-tuning are more accurate than those of the pre-trained model; when the trained model is subsequently used to perform anomaly detection on logs generated by the target sub-object, the detection results are also more accurate.
In this embodiment, the process by which the computing device acquires the second log sample set of n second log samples is similar to the process of acquiring the first log sample set of m first log samples in S401; refer to the relevant description of S401. The process of segmenting the n second log samples into n corresponding word sequences is similar to the segmentation of the m first log samples into m corresponding word sequences in S402; refer to the relevant description of S402. The process of fine-tuning the pre-trained log anomaly detection model with the n word sequences to obtain the trained model is similar to the pre-training of the initial log anomaly detection model with the m word sequences in S403; refer to the relevant description of S403.
In a specific embodiment of the present application, after obtaining the target cluster center C, the computing device may further compute, from the target cluster center C and the m second row vectors, a second classification threshold used by the trained log anomaly detection model when performing anomaly detection on logs to be detected.
Further, the process by which the computing device computes the second classification threshold from the target cluster center C and the m second row vectors may include:
B1. The computing device obtains the losses from the m second row vectors to the target cluster center C.
The process of obtaining the losses from the m second row vectors to the target cluster center C is similar to the above process of obtaining the losses from the m second row vectors to the initial cluster center c; refer to the relevant description above.
B2. The computing device obtains the percentiles corresponding to the losses from the m second row vectors to the target cluster center C.
A percentile is a statistical term: if a set of data is sorted from smallest to largest and the corresponding cumulative percentages are computed, the value of the data point at a given cumulative percentage is called the percentile at that percentage. For example, the value at the 80% position is called the 80th percentile.
Therefore, obtaining the percentiles corresponding to the losses from the m second row vectors to the target cluster center C means that the computing device sorts those losses from smallest to largest and computes the corresponding cumulative percentages.
B3. The computing device determines a target percentile from the percentiles corresponding to the losses from the m second row vectors to the target cluster center C.
B4. The computing device determines the second classification threshold from the target percentile.
In a specific embodiment of the present application, the second classification threshold T may be determined by the following formula:
T = P · β
Here, P denotes the value at the target percentile, and β is used to enlarge the distance around the target cluster center C. P and β may be chosen according to the ratio of normal samples to abnormal samples among the m first log samples. For example, when the number of normal samples far exceeds the number of abnormal samples (e.g., a ratio of 10000:1 or 5000:1), the target percentile may be as large as possible, such as 90% or 95%, and β may be 1.8, 2.0, 2.5, or the like; when the number of normal samples is close to the number of abnormal samples (e.g., a ratio of 500:1 or 100:1), the target percentile may be close to 50%, such as 45% or 51%, and β may be 1.2, 1.5, or the like. For the process in which the second classification threshold T is used by the trained log anomaly detection model to perform anomaly detection on logs to be detected, see the relevant description of FIG. 11.
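Steps B1 to B4 amount to taking a percentile of the losses and scaling it by β; a sketch, with the defaults taken from the heavily imbalanced case described above:

import numpy as np

def second_classification_threshold(losses: np.ndarray,
                                    target_percentile: float = 95.0,
                                    beta: float = 2.0) -> float:
    """T = P * beta, where P is the value at the target percentile of the
    losses from the m second row vectors to the target cluster center C."""
    p = np.percentile(losses, target_percentile)  # steps B2-B3
    return float(p * beta)                        # step B4

losses = np.random.default_rng(0).exponential(scale=2.0, size=1000)
print(second_classification_threshold(losses))  # illustrative value only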
It can be seen that this application computes the second classification threshold T from the target cluster center C and the m second row vectors, and chooses the target percentile and the value of β with the ratio of normal to abnormal samples among the m first log samples taken into account, rather than having a person set the classification threshold based on experience as in the prior art. A manually set classification threshold that is too large or too small strongly affects the accuracy of the trained log anomaly detection model: if it is too large, the trained model has a high probability of misclassifying abnormal logs as normal logs; if it is too small, the trained model has a high probability of misclassifying normal logs as abnormal logs. Determining the second classification threshold T by the method provided in this application therefore improves the accuracy of anomaly detection performed by the trained model.
In a specific implementation, after the trained log anomaly detection model is obtained with the log anomaly detection model training method provided by this application, it may be deployed to the target sub-object and used to perform anomaly detection on the to-be-detected logs generated by the target sub-object.
Refer to FIG. 11, an exemplary flowchart provided by this application of performing anomaly detection on a to-be-detected log of the target sub-object with the trained log anomaly detection model. As shown in FIG. 11, the detection process includes:
S111. Acquire a to-be-detected log entry generated by the target sub-object.
It can be understood that the to-be-detected log entry here is the to-be-detected log described above.
S112. Extract the to-be-detected event content from the to-be-detected log entry.
S113. Perform word segmentation on the to-be-detected event content to obtain the to-be-detected word sequence corresponding to the to-be-detected event content.
S114. Obtain the first row vector corresponding to the to-be-detected word sequence.
S115. Input the first row vector corresponding to the to-be-detected word sequence into the trained log anomaly detection model for anomaly detection and obtain a detection result.
In a specific embodiment of the present application, the specific process of inputting the first row vector of the to-be-detected word sequence into the trained log anomaly detection model and obtaining a detection result may include the following steps:
S1151. Input the first row vector corresponding to the to-be-detected word sequence into the trained log anomaly detection model to obtain the second row vector corresponding to the to-be-detected word sequence.
S1152. Obtain the loss from the second row vector of the to-be-detected word sequence to the cluster center C'.
Here, the cluster center C' denotes the centroid, no longer changing, of the normal log class obtained by clustering the n second row vectors corresponding to the n second log samples when the trained log anomaly detection model is obtained in the fine-tuning stage.
The loss loss(C', X) from the second row vector of the to-be-detected word sequence to the cluster center C' may be obtained by the following formula:
loss(C', X) = ||√(X) - C'||^2
where X denotes the second row vector corresponding to the to-be-detected word sequence.
S1153. Determine whether the loss loss(C', X) is less than the second classification threshold T; if loss(C', X) is less than T, execute S1154; if loss(C', X) is greater than or equal to T, execute S1155.
In a specific implementation, it may instead be determined whether loss(C', X) is less than or equal to the second classification threshold T; if loss(C', X) is less than or equal to T, execute S1154, and if loss(C', X) is greater than T, execute S1155.
S1154. Determine that the to-be-detected log entry contains no information indicating a device abnormality, where the device is the device that generated the to-be-detected log entry.
S1155. Determine that the to-be-detected log entry contains information indicating a device abnormality.
As shown in FIG. 12, the first row vector x corresponding to the to-be-detected word sequence is input into the trained log anomaly detection model for anomaly detection, and the resulting detection result includes the loss loss(C', X) of the corresponding second row vector X relative to the cluster center C'. Suppose loss(C', X) is 5 and the second classification threshold T is 8; since loss(C', X) is less than T, the trained log anomaly detection model assigns the second row vector X to the normal log class and outputs the detection result that the to-be-detected log entry contains no device-abnormality information.
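Steps S1152 to S1155 can be sketched as follows, reusing the loss formula from the pre-training sketch; the center, threshold, and input vector below are made-up numbers chosen so that loss(C', X) = 5 and T = 8, matching the example above:

import numpy as np

def detect(x: np.ndarray, center_c_prime: np.ndarray, t: float) -> str:
    """Steps S1152-S1155: compare the loss from the second row vector X of
    the to-be-detected word sequence to the cluster center C' against the
    second classification threshold T."""
    loss = float(np.sum((np.sqrt(x) - center_c_prime) ** 2))  # loss(C', X)
    if loss < t:
        return "no device-abnormality information in the log entry"
    return "device-abnormality information in the log entry"

# Made-up 2-dimensional numbers giving loss(C', X) = 1 + 4 = 5 and T = 8
x = np.array([1.0, 4.0])
c_prime = np.array([0.0, 0.0])
print(detect(x, c_prime, t=8.0))  # 5 < 8 -> assigned to the normal log class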
It should be noted that, in the above example, the trained log anomaly detection model outputting the result that the to-be-detected log entry contains no device-abnormality information is merely an example; in a specific implementation, the output may also be "device normal" or the like, which is not specifically limited here.
In this embodiment, the definitions of the to-be-detected word sequence, the first row vector of the to-be-detected word sequence, and so on are the same as the definitions of the word sequence, the first row vector, and so on in the embodiment of FIG. 4; see the relevant content of that embodiment, which is not repeated here. The process of segmenting the to-be-detected event content into the corresponding to-be-detected word sequence is similar to the segmentation of the m first log samples into m corresponding word sequences in S402; refer to the relevant description of S402. The process of obtaining the first row vector of the to-be-detected word sequence is similar to the process in S403 of obtaining the m first row vectors of the m masked word sequences; refer to the relevant description of S403.
It should be noted that, when obtaining the first row vector of the to-be-detected word sequence, no mask processing of the to-be-detected word sequence is needed; the word embedding vector and position embedding vector of each word in the to-be-detected word sequence can be obtained directly, and from them the first row vector of the to-be-detected word sequence.
It should be noted that, although the above description of the log anomaly detection model training method provided by this application takes a computing device as the execution subject, in a specific implementation the execution subject may also be a computing device cluster including at least two computing devices, which cooperate to carry out the method provided by this application. For example, if the cluster includes computing device A and computing device B, step S401 may be executed by computing device A and steps S402 to S406 by computing device B; or steps S401 to S403 may be executed by computing device A and steps S404 and S406 jointly by computing device A and computing device B.
The foregoing describes in detail the log anomaly detection model training method provided by this application. Based on the same inventive concept, the log anomaly detection model training apparatus provided by this application is introduced next.
Refer to FIG. 13, a schematic structural diagram of a log anomaly detection model training apparatus 100 provided by this application. The apparatus 100 includes an acquisition module 110 and a training module 120, where:
the acquisition module 110 is configured to acquire a first log sample set, where the first log sample set is obtained by processing log data of the target object;
the training module 120 is configured to pre-train an initial log anomaly detection model using the first log sample set to obtain a pre-trained log anomaly detection model;
the acquisition module 110 is further configured to acquire a second log sample set, where the second log sample set is obtained by processing log data of a target sub-object, and the target sub-object belongs to the target object; and
the training module 120 is further configured to fine-tune the pre-trained log anomaly detection model using the second log sample set to obtain a trained log anomaly detection model.
In a possible implementation, the target object includes at least one of the following sub-objects: hard disk, memory, flash memory, network device, and processor; the target sub-object is any one type of sub-object within the target object.
在一种可能的实现方式中,所述第一日志样本集包括m个日志样本,m为大于1的自然数,所述训练模块120,具体用于:In a possible implementation manner, the first log sample set includes m log samples, where m is a natural number greater than 1, and the training module 120 is specifically used for:
分别对所述m个日志样本进行分词,得到所述m个日志样本对应的m个词序列;Perform word segmentation on the m log samples respectively to obtain m word sequences corresponding to the m log samples;
通过所述m个词序列,对初始日志异常检测模型进行预训练,得到预训练的日志异常检测模型。Through the m word sequences, the initial log anomaly detection model is pre-trained to obtain a pre-trained log anomaly detection model.
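A hedged sketch of the word segmentation step is given below; the regular-expression tokenizer and the sample log lines are assumptions for illustration, since this application does not prescribe a particular segmentation rule.

```python
# Hypothetical tokenizer: split log lines into lowercase word sequences.
import re

def tokenize(log_line):
    return re.findall(r"[a-z0-9_.]+", log_line.lower())

logs = ["Disk sda1 read timeout", "Memory page fault at 0x7f3a"]
word_sequences = [tokenize(line) for line in logs]  # m word sequences
print(word_sequences)
# [['disk', 'sda1', 'read', 'timeout'], ['memory', 'page', 'fault', 'at', '0x7f3a']]
```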
In a possible implementation, the training module 120 is specifically configured to:
perform mask processing on a preset proportion of the words in each of the m word sequences to obtain m masked word sequences;
pre-train the initial log anomaly detection model using the m masked word sequences to obtain the pre-trained log anomaly detection model.
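The sketch below illustrates masking a preset proportion of words in each sequence; the 15% ratio and the [MASK] token are assumed values rather than ones fixed by this application.

```python
# Hypothetical masking step over the m word sequences.
import random

def mask_sequence(words, ratio=0.15, mask_token="[MASK]", seed=0):
    rnd = random.Random(seed)
    k = max(1, int(len(words) * ratio))             # mask at least one word
    positions = set(rnd.sample(range(len(words)), k))
    return [mask_token if i in positions else w for i, w in enumerate(words)]

sequences = [["disk", "sda1", "read", "timeout"], ["memory", "page", "fault"]]
masked_sequences = [mask_sequence(seq) for seq in sequences]
print(masked_sequences)
```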
In a possible implementation, the training module 120 is specifically configured to:
obtain the word embedding vector and the position embedding vector corresponding to each word in the m masked word sequences, where the word embedding vector corresponding to each word is a multi-dimensional vector used to represent that word, and the position embedding vector corresponding to each word indicates the position of that word in the word sequence to which it belongs;
obtain, according to the word embedding vector and the position embedding vector corresponding to each word in the m masked word sequences, m first row vectors corresponding to the m masked word sequences;
pre-train the initial log anomaly detection model using the m first row vectors to obtain the pre-trained log anomaly detection model.
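This lookup mirrors the detection-time sketch given earlier, except that masked tokens are assumed to map to a dedicated [MASK] embedding row; the tables and dimensions remain illustrative assumptions.

```python
# Hypothetical training-time lookup: [MASK] has its own embedding row.
import numpy as np

rng = np.random.default_rng(0)
vocab = {"[MASK]": 0, "disk": 1, "read": 2, "timeout": 3, "[UNK]": 4}
DIM, MAX_LEN = 8, 16
word_emb = rng.normal(size=(len(vocab), DIM))
pos_emb = rng.normal(size=(MAX_LEN, DIM))

def to_first_row_vector(masked_words):
    ids = [vocab.get(w, vocab["[UNK]"]) for w in masked_words]
    return np.stack([word_emb[t] + pos_emb[p] for p, t in enumerate(ids)])

masked_sequences = [["disk", "[MASK]", "timeout"], ["read", "disk", "[MASK]"]]
first_row_vectors = [to_first_row_vector(s) for s in masked_sequences]  # m items
```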
In a possible implementation, the training module 120 is specifically configured to:
input the m first row vectors respectively into the initial log anomaly detection model for training to obtain m second row vectors, where the m second row vectors are in one-to-one correspondence with the m masked word sequences, and each of the m second row vectors includes the semantic information of its corresponding masked word sequence;
obtain the losses of the m second row vectors with respect to an initial cluster center;
train the initial log anomaly detection model according to the losses of the m second row vectors with respect to the initial cluster center, to obtain the pre-trained log anomaly detection model and a target cluster center.
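A toy training loop in the spirit of this step is sketched below. It assumes a linear model, a squared-distance loss to a center held fixed during the updates, and a re-estimated target center afterwards; these are illustrative assumptions in the style of one-class (Deep SVDD-like) objectives, not the concrete design of this application.

```python
# Hypothetical one-class pre-training: pull second row vectors toward a center.
import numpy as np

rng = np.random.default_rng(0)
m, in_dim, out_dim = 32, 24, 8
W = rng.normal(scale=0.1, size=(in_dim, out_dim))  # toy linear "model"
X = rng.normal(size=(m, in_dim))                   # m flattened first row vectors

c = (X @ W).mean(axis=0)                           # initial cluster center

lr = 0.01
for _ in range(200):
    V = X @ W                                      # m second row vectors
    diff = V - c
    loss = (diff ** 2).sum(axis=1).mean()          # assumed squared-distance loss
    grad_W = 2 * X.T @ diff / m                    # gradient of the mean loss
    W -= lr * grad_W                               # pull representations toward c

target_center = (X @ W).mean(axis=0)               # target cluster center
```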
In a possible implementation, the training module 120 is further configured to:
obtain the percentile corresponding to the losses of the m second row vectors with respect to the target cluster center;
determine a classification threshold according to that percentile, where the classification threshold is used by the trained log anomaly detection model to perform anomaly detection on a log to be detected to obtain a detection result.
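The sketch below derives a classification threshold from a percentile of the training losses to the target cluster center; the 99th percentile and the synthetic loss distribution are assumptions, as the application leaves the exact percentile open.

```python
# Hypothetical percentile-based threshold for anomaly classification.
import numpy as np

rng = np.random.default_rng(1)
train_losses = rng.gamma(2.0, 1.0, size=1000)   # stand-in losses to the center
threshold = np.percentile(train_losses, 99)     # assumed percentile choice

def classify(loss_value):
    return "abnormal" if loss_value > threshold else "normal"

print(classify(12.5), classify(0.7))
```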
In a possible implementation, the formula for obtaining the losses of the m second row vectors with respect to the initial cluster center is:
Figure PCTCN2021120446-appb-000010
where V_i denotes the i-th second row vector among the m second row vectors, c denotes the initial cluster center, loss(c, V_i) denotes the loss of the i-th second row vector with respect to the initial cluster center, and i is a natural number.
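Since the published formula is embedded as an image above, one plausible LaTeX rendering, assuming a squared Euclidean distance to the cluster center, would be:

```latex
% Assumed form only; the exact published formula is in the referenced image.
\mathrm{loss}(c, V_i) = \left\lVert V_i - c \right\rVert_2^{2}, \qquad i = 1, \dots, m
```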
Specifically, for the implementation of the various operations performed by the log anomaly detection model training apparatus 100, reference may be made to the relevant descriptions in the foregoing embodiment of the log anomaly detection model training method; for brevity, details are not repeated here.
It should be understood that the log anomaly detection model training apparatus 100 is merely an example provided by the embodiments of this application; the apparatus 100 may have more or fewer components than those shown in FIG. 13, may combine two or more components, or may be implemented with a different configuration of components.
The log anomaly detection model training apparatus 100 provided by this application can be applied to various computing devices such as cloud servers, personal computers, and terminal devices, and can also be applied to a computing device cluster including at least two computing devices. The following description takes application to a single computing device as an example.
Referring to FIG. 14, FIG. 14 is a schematic structural diagram of a computing device 200 provided by this application. The computing device 200 includes a processor 210, a memory 220, and a communication interface 230, which may be connected to one another through a bus 240. Specifically:
the processor 210 can read and execute the program code (including instructions) stored in the memory 220, so that the computing device 200 performs the steps of the log anomaly detection model training method provided by the foregoing method embodiments, or so that the computing device 200 deploys the log anomaly detection model training apparatus 100.
The processor 210 may take various specific forms: for example, it may be a central processing unit (CPU) or a graphics processing unit (GPU), and it may be a single-core or multi-core processor. The processor 210 may also be a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 210 may also be implemented solely by a logic device with built-in processing logic, such as an FPGA or a DSP.
The memory 220 may store program code and program data. The program code includes the code of the acquisition module 110 and the code of the training module 120, among others; the program data includes the first log sample set, the second log sample set, the word sequences before mask processing, the word sequences after mask processing, and so on.
In practical applications, the memory 220 may be a non-volatile memory, for example, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The memory 220 may also be a volatile memory; the volatile memory may be a random access memory (RAM), which is used as an external cache.
The communication interface 230 may be a wired interface (for example, an Ethernet interface) or a wireless interface (for example, a cellular network interface or a wireless local area network interface) for communicating with other computing nodes or apparatuses. When the communication interface 230 is a wired interface, it may use a protocol family above the transmission control protocol/internet protocol (TCP/IP), for example, the remote function call (RFC) protocol, the simple object access protocol (SOAP), the simple network management protocol (SNMP), the common object request broker architecture (CORBA) protocol, distributed protocols, and so on.
The bus 240 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus 240 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in FIG. 14, but this does not mean that there is only one bus or only one type of bus.
The computing device 200 described above is configured to perform the method described in the foregoing embodiment of the log anomaly detection model training method and belongs to the same concept as that method embodiment; for the specific implementation process, refer to the method embodiment, and details are not repeated here.
For the functional modules of the log anomaly detection model training apparatus 100 deployed on the computing device 200, see the apparatus embodiment shown in FIG. 13.
It should be understood that the computing device 200 is merely an example provided by the embodiments of this application; the computing device 200 may have more or fewer components than those shown in FIG. 14, may combine two or more components, or may be implemented with a different configuration of components.
This application further provides a non-transitory computer-readable storage medium storing instructions which, when run, implement some or all of the steps of the log anomaly detection model training method described in the foregoing embodiments.
This application further provides a computer program product which, when read and executed by a computer, implements some or all of the steps of the log anomaly detection model training method described in the foregoing method embodiments.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For a part not described in detail in one embodiment, reference may be made to the relevant descriptions of other embodiments.
The foregoing embodiments may be implemented in whole or in part by software, hardware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, over infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium, a semiconductor medium, or the like.
The steps in the methods of the embodiments of this application may be reordered, combined, or deleted according to actual needs; the units in the apparatuses of the embodiments of this application may be divided, combined, or deleted according to actual needs.
The embodiments of this application are described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the descriptions of the above embodiments are intended only to help understand the method and core idea of this application. Meanwhile, a person of ordinary skill in the art may, based on the idea of this application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as a limitation on this application.

Claims (18)

1. A log anomaly detection model training method, wherein the method comprises:
acquiring a first log sample set, wherein the first log sample set is obtained by processing log data of a target object;
pre-training an initial log anomaly detection model using the first log sample set to obtain a pre-trained log anomaly detection model;
acquiring a second log sample set, wherein the second log sample set is obtained by processing log data of a target sub-object, and the target sub-object belongs to the target object; and
fine-tuning the pre-trained log anomaly detection model using the second log sample set to obtain a trained log anomaly detection model.
2. The method according to claim 1, wherein
the target object comprises at least one of the following sub-objects: a hard disk, a memory, a flash memory, a network device, and a processor, and the target sub-object is a sub-object of any one of these types in the target object.
3. The method according to claim 1 or 2, wherein the first log sample set comprises m log samples, m being a natural number greater than 1, and the pre-training an initial log anomaly detection model using the first log sample set to obtain a pre-trained log anomaly detection model comprises:
performing word segmentation on the m log samples respectively to obtain m word sequences corresponding to the m log samples; and
pre-training the initial log anomaly detection model using the m word sequences to obtain the pre-trained log anomaly detection model.
4. The method according to claim 3, wherein the pre-training the initial log anomaly detection model using the m word sequences to obtain the pre-trained log anomaly detection model comprises:
performing mask processing on a preset proportion of the words in each of the m word sequences to obtain m masked word sequences; and
pre-training the initial log anomaly detection model using the m masked word sequences to obtain the pre-trained log anomaly detection model.
5. The method according to claim 4, wherein the pre-training the initial log anomaly detection model using the m masked word sequences to obtain the pre-trained log anomaly detection model comprises:
obtaining the word embedding vector and the position embedding vector corresponding to each word in the m masked word sequences, wherein the word embedding vector corresponding to each word is a multi-dimensional vector used to represent that word, and the position embedding vector corresponding to each word indicates the position of that word in the word sequence to which it belongs;
obtaining, according to the word embedding vector and the position embedding vector corresponding to each word in the m masked word sequences, m first row vectors corresponding to the m masked word sequences; and
pre-training the initial log anomaly detection model using the m first row vectors to obtain the pre-trained log anomaly detection model.
6. The method according to claim 5, wherein the pre-training the initial log anomaly detection model using the m first row vectors to obtain the pre-trained log anomaly detection model comprises:
inputting the m first row vectors respectively into the initial log anomaly detection model for training to obtain m second row vectors, wherein the m second row vectors are in one-to-one correspondence with the m masked word sequences, and each of the m second row vectors comprises the semantic information of its corresponding masked word sequence;
obtaining the losses of the m second row vectors with respect to an initial cluster center; and
training the initial log anomaly detection model according to the losses of the m second row vectors with respect to the initial cluster center, to obtain the pre-trained log anomaly detection model and a target cluster center.
7. The method according to claim 6, wherein the method further comprises:
obtaining the percentile corresponding to the losses of the m second row vectors with respect to the target cluster center; and
determining a classification threshold according to that percentile, wherein the classification threshold is used by the trained log anomaly detection model to perform anomaly detection on a log to be detected to obtain a detection result.
8. The method according to claim 6 or 7, wherein the formula for obtaining the losses of the m second row vectors with respect to the initial cluster center is:
Figure PCTCN2021120446-appb-100001
wherein V_i denotes the i-th second row vector among the m second row vectors, c denotes the initial cluster center, loss(c, V_i) denotes the loss of the i-th second row vector with respect to the initial cluster center, and i is a natural number.
9. A log anomaly detection model training apparatus, wherein the apparatus comprises:
an acquisition module, configured to acquire a first log sample set, wherein the first log sample set is obtained by processing log data of a target object; and
a training module, configured to pre-train an initial log anomaly detection model using the first log sample set to obtain a pre-trained log anomaly detection model;
wherein the acquisition module is further configured to acquire a second log sample set, wherein the second log sample set is obtained by processing log data of a target sub-object, and the target sub-object belongs to the target object; and
the training module is further configured to fine-tune the pre-trained log anomaly detection model using the second log sample set to obtain a trained log anomaly detection model.
10. The apparatus according to claim 9, wherein
the target object comprises at least one of the following sub-objects: a hard disk, a memory, a flash memory, a network device, and a processor, and the target sub-object is a sub-object of any one of these types in the target object.
11. The apparatus according to claim 9 or 10, wherein the first log sample set comprises m log samples, m being a natural number greater than 1, and the training module is specifically configured to:
perform word segmentation on the m log samples respectively to obtain m word sequences corresponding to the m log samples; and
pre-train the initial log anomaly detection model using the m word sequences to obtain the pre-trained log anomaly detection model.
12. The apparatus according to claim 11, wherein the training module is specifically configured to:
perform mask processing on a preset proportion of the words in each of the m word sequences to obtain m masked word sequences; and
pre-train the initial log anomaly detection model using the m masked word sequences to obtain the pre-trained log anomaly detection model.
13. The apparatus according to claim 12, wherein the training module is specifically configured to:
obtain the word embedding vector and the position embedding vector corresponding to each word in the m masked word sequences, wherein the word embedding vector corresponding to each word is a multi-dimensional vector used to represent that word, and the position embedding vector corresponding to each word indicates the position of that word in the word sequence to which it belongs;
obtain, according to the word embedding vector and the position embedding vector corresponding to each word in the m masked word sequences, m first row vectors corresponding to the m masked word sequences; and
pre-train the initial log anomaly detection model using the m first row vectors to obtain the pre-trained log anomaly detection model.
14. The apparatus according to claim 13, wherein the training module is specifically configured to:
input the m first row vectors respectively into the initial log anomaly detection model for training to obtain m second row vectors, wherein the m second row vectors are in one-to-one correspondence with the m masked word sequences, and each of the m second row vectors comprises the semantic information of its corresponding masked word sequence;
obtain the losses of the m second row vectors with respect to an initial cluster center; and
train the initial log anomaly detection model according to the losses of the m second row vectors with respect to the initial cluster center, to obtain the pre-trained log anomaly detection model and a target cluster center.
15. The apparatus according to claim 14, wherein the training module is further configured to:
obtain the percentile corresponding to the losses of the m second row vectors with respect to the target cluster center; and
determine a classification threshold according to that percentile, wherein the classification threshold is used by the trained log anomaly detection model to perform anomaly detection on a log to be detected to obtain a detection result.
16. The apparatus according to claim 14 or 15, wherein the formula for obtaining the losses of the m second row vectors with respect to the initial cluster center is:
Figure PCTCN2021120446-appb-100002
wherein V_i denotes the i-th second row vector among the m second row vectors, c denotes the initial cluster center, loss(c, V_i) denotes the loss of the i-th second row vector with respect to the initial cluster center, and i is a natural number.
17. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores instructions, and the instructions are used to implement the method according to any one of claims 1 to 8.
18. A computing device, wherein the computing device comprises a processor and a memory, and the processor is configured to execute instructions stored in the memory, so that the computing device implements the method according to any one of claims 1 to 8.
PCT/CN2021/120446 2021-04-29 2021-09-24 Log anomaly detection model training method, apparatus and device WO2022227388A1 (en)

Applications Claiming Priority (4)

Application Number | Priority Date | Filing Date | Title
CN202110471278 | 2021-04-29
CN202110471278.7 | 2021-04-29
CN202110699643.X | 2021-06-23
CN202110699643.XA (CN115269304A) | 2021-04-29 | 2021-06-23 | Log anomaly detection model training method, device and equipment

Publications (1)

Publication Number
WO2022227388A1 (en)
