CN115982366A - Log classification method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115982366A
CN115982366A
Authority
CN
China
Prior art keywords
log
word
model
classification
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310077768.8A
Other languages
Chinese (zh)
Inventor
李东江
张静
张宪波
杨继成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd filed Critical Jingdong Technology Information Technology Co Ltd
Priority to CN202310077768.8A priority Critical patent/CN115982366A/en
Publication of CN115982366A publication Critical patent/CN115982366A/en

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval; Database Structures and File System Structures Therefor (AREA)

Abstract

The invention discloses a log classification method and device, an electronic device, and a storage medium. The method comprises the following steps: acquiring a log to be classified and extracting the log template corresponding to it; determining a feature vector of the log template based on each word in the template together with its part of speech and position; and classifying the feature vector with a pre-trained log classification model to obtain the classification result of the log to be classified. By taking the influence of part of speech and word position on log classification into account and adding both to the feature vector, the scheme extracts log features as fully as possible and improves classification accuracy; classifying the feature vector with the pre-trained log classification model yields the classification result promptly, improving the timeliness of log classification.

Description

Log classification method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a log classification method and device, electronic equipment and a storage medium.
Background
With the rapid development of the internet and computer technology, personal computers and servers have become indispensable to individuals and enterprises. Servers and computers produce large amounts of log data during operation, through which the root cause of a problem and its specifics can be traced; operation and maintenance logs are therefore classified so that root causes can be located quickly when problems are investigated later.
At present, the prior art performs classification matching on log data either with hand-written regular-expression rules or with artificial intelligence methods. The hand-written rule method generally requires a large number of operation and maintenance workers to design rules for the various logs, so that logs of the same class are matched into the same class of log data; when new logs appear, a large number of workers are again needed to write matching rules for them. Artificial intelligence methods classify logs through machine learning, deep learning, or reinforcement learning, and can greatly reduce the reliance on hand-written rules.
However, the rule-matching approach requires many professionals, and hand-written rules often contain omissions and mistakes, so manual rules have clear limitations when classification must be both timely and accurate. Artificial intelligence methods, while greatly reducing the reliance on manual rules, require time for model training and inference: complex models classify accurately but need a great deal of training and inference time, whereas simple models train and infer quickly but are often inaccurate. Accuracy and speed cannot be achieved simultaneously, which is a clear limitation: training time, inference time, and accuracy cannot all be improved at once.
Disclosure of Invention
The invention provides a log classification method, a log classification device, an electronic device, and a storage medium, and aims to solve the problem that the prior art cannot classify log data both promptly and accurately.
According to an aspect of the present invention, there is provided a log classification method, including:
acquiring a log to be classified, and extracting a log template corresponding to the log to be classified;
determining a feature vector of the log template based on each word in the log template, the part of speech and the position of the word;
and classifying the characteristic vectors based on a pre-trained log classification model to obtain a classification result of the log to be classified.
According to another aspect of the present invention, there is provided a log sorting apparatus, including:
the log template extraction module is used for acquiring logs to be classified and extracting log templates corresponding to the logs to be classified;
the characteristic vector determining module is used for determining the characteristic vector of the log template based on each word in the log template, the part of speech and the word position of each word;
and the log classification module is used for classifying the feature vectors based on a pre-trained log classification model to obtain a classification result of the log to be classified.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein:
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the log classification method according to any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement the log classification method according to any one of the embodiments of the present invention when the computer instructions are executed.
According to the technical scheme of the embodiments of the present invention, the log to be classified is acquired and its corresponding log template is extracted; the feature vector of the log template is determined from each word in the template together with its part of speech and word position; and the feature vector is classified with a pre-trained log classification model to obtain the classification result of the log to be classified. By taking the influence of part of speech and word position on log classification into account and adding both to the feature vector, log features are extracted as fully as possible and classification accuracy is improved; classifying the feature vector with the pre-trained model yields the classification result promptly, improving the timeliness of log classification.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present invention, nor are they intended to limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a log classification method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an example of a log template obtained by parsing an original log according to an embodiment of the present invention;
FIG. 3 is a diagram of a log classification model training process according to a second embodiment of the present invention;
FIG. 4 is a diagram illustrating parameter adjustment of an initialized word vector transformation model and an initialized word position vector transformation model according to a second embodiment of the present invention;
fig. 5 is a schematic structural diagram of a log classification device according to a third embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of a log classification method according to an embodiment of the present invention. This embodiment is applicable to classifying the log data generated by servers and computers during operation. The method may be executed by a log classification device, which may be implemented in hardware and/or software and may be configured in a server or a computer. As shown in fig. 1, the method includes:
s110, obtaining the log to be classified, and extracting a log template corresponding to the log to be classified.
The log to be classified is an original log generated by a server or computer at runtime. An original log comprises a variable part and a fixed-format part: the variable part includes, but is not limited to, the timestamp, the server IP, and the like, while the fixed-format part is the main body of the log, such as the event type and event name, and characterizes the basic situation of the original log. The log template is the template obtained by removing the variable part from the log to be classified; specifically, each variable part can be replaced with a placeholder character or word to obtain the log template. In this embodiment, the log to be classified is acquired and parsed with a log template extraction algorithm to extract its log template. Applicable extraction algorithms include, but are not limited to, FT-Tree; the choice is not limited here.
For example, fig. 2 shows log templates obtained by parsing original logs according to an embodiment of the present invention. As shown in fig. 2, the variable part of each original log is replaced with "x", so that logs of the same type collapse to the same template: original logs Log1 and Log5 both yield log template Template1, and Template1 can serve as the log template for all logs of the same type as Log1 and Log5.
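The template extraction step above can be sketched in a few lines, assuming the variable fields are IP addresses, hex identifiers, and plain numbers. The regexes and the `extract_template` name are illustrative only and are not specified by the patent (the figure marks variables with "x"; a "*" wildcard is used here):

```python
import re

# Hypothetical sketch: replace variable fields with a wildcard so that
# logs of the same type collapse to one template. Pattern order matters:
# IPv4 addresses must be masked before plain numbers.
VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),   # IPv4 addresses
    re.compile(r"\b0x[0-9a-fA-F]+\b"),            # hex identifiers
    re.compile(r"\b\d+\b"),                       # plain numbers
]

def extract_template(log_line: str) -> str:
    for pattern in VARIABLE_PATTERNS:
        log_line = pattern.sub("*", log_line)
    return log_line

log1 = "Accepted connection from 10.0.0.5 port 51234"
log5 = "Accepted connection from 10.0.0.9 port 48001"
assert extract_template(log1) == extract_template(log5)
```

In practice a dedicated algorithm such as FT-Tree learns the templates from the log stream instead of relying on fixed regexes; the sketch only shows the masking idea.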
And S120, determining a feature vector of the log template based on each word in the log template, the part of speech and the word position of each word.
The part of speech refers to the part of speech of each English word in the log template, including but not limited to verb, noun, and adjective; it is not limited here. The word position refers to the position of each English word within the log template.
On the basis of the foregoing embodiment, optionally, the determining the feature vector of the log template based on each word in the log template, the part of speech of each word, and the word position includes: carrying out vector conversion on each word in the log template to obtain a word vector corresponding to the log template; determining a word position vector corresponding to the log template; determining the part of speech of each word in the log template, and determining the weight data corresponding to each word based on the part of speech; determining a feature vector of the log template based on the word vector, the word position vector, and weight data corresponding to each of the words.
Specifically, each word in the log template may be converted with a word vector conversion model to obtain the word vector corresponding to the log template. Applicable methods include, but are not limited to, one-hot encoding, singular value decomposition (SVD), and word2vec; the choice is not limited here.
In some embodiments, the initial construction may be performed using pre-trained word vectors to speed up the training process.
Specifically, the word position vector corresponding to the log template may be obtained by applying a word position vector conversion model to the word positions of the template. The log template may first be padded to a preset character length, with blank positions filled with "0"; the preset length may, for example, be derived statistically from the lengths of historical log templates. Illustratively, the word positions are vector-initialized with the torch.nn.Embedding() function to obtain a position encoding matrix, which is then updated as the word position vector conversion model is trained.
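Padding to a preset length and assigning position indices can be sketched as follows. The preset length, the "0" filler, and the helper name are illustrative; the patent's learned position table would come from torch.nn.Embedding, omitted here to keep the sketch dependency-free:

```python
# Hypothetical preset length, e.g. derived from historical template lengths.
PRESET_LEN = 6

def pad_and_index(words):
    """Pad a template's word list to PRESET_LEN (blanks filled with "0")
    and return the per-slot position indices that would be looked up in
    a learned position-embedding table. Templates longer than PRESET_LEN
    would be truncated in practice; that case is omitted here."""
    padded = words + ["0"] * (PRESET_LEN - len(words))
    positions = list(range(PRESET_LEN))
    return padded, positions

padded, positions = pad_and_index(["disk", "full", "on", "*"])
assert padded == ["disk", "full", "on", "*", "0", "0"]
assert positions == [0, 1, 2, 3, 4, 5]
```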
Specifically, each word in the log template may be part-of-speech tagged with a natural language processing toolkit to obtain its part of speech, and the weight data of each word may then be set according to that part of speech. Applicable toolkits include, but are not limited to, jieba, SnowNLP, THULAC, StanfordCoreNLP, HanLP, and NLTK; the choice is not limited here.
In this embodiment, vector fusion is performed on the word vector, the word position vector, and the weight data of each word to obtain the feature vector of the log template. It can be understood that the importance of each word is characterized by its part-of-speech weight: the weight data of words with important parts of speech can be amplified appropriately, and the weight data of words with relatively unimportant parts of speech can be reduced.
It should be noted that, although the variable part has been removed from the log template, it may still contain non-English characters of little value for log classification, as well as English words with little or no practical meaning such as "a", "the", and "at", which would ordinarily also have to be removed. In the present invention, however, it suffices to set the part-of-speech weight data of such characters and words to 0 or a very small value; no new matching rules need to be constructed, which reduces the workload of operators and improves log classification efficiency.
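The part-of-speech weighting described above can be sketched with a plain lookup table; the tag set and the specific weight values are assumptions for illustration, not taken from the patent:

```python
# Hypothetical part-of-speech weight table: content words get full weight,
# stop words and punctuation are suppressed toward zero, which plays the
# role of stop-word removal without any extra matching rules.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 1.0, "ADJ": 0.8, "DET": 0.0, "PUNCT": 0.0}

def word_weights(tagged_words):
    """tagged_words: list of (word, pos) pairs from any POS tagger;
    unknown tags fall back to a middling default weight."""
    return [POS_WEIGHTS.get(pos, 0.5) for _, pos in tagged_words]

tagged = [("connection", "NOUN"), ("failed", "VERB"), ("the", "DET")]
assert word_weights(tagged) == [1.0, 1.0, 0.0]
```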
On the basis of the foregoing embodiment, optionally, the determining the feature vector of the log template based on the word vector, the word position vector, and the weight data corresponding to each word includes: obtaining an intermediate vector based on the word vector and the weight data corresponding to each word; and obtaining a feature vector of the log template based on the sum of the intermediate vector and the word position vector.
The intermediate vector is a part-of-speech-weighted word vector; specifically, it is obtained by multiplying the word vector by the weight data of each word. In this embodiment, the word vector is multiplied by the weight data of each word to obtain the intermediate vector, and the intermediate vector is added to the word position vector to obtain the feature vector of the log template.
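A minimal sketch of this fusion step, assuming per-word vectors of equal dimension; the function name and the toy numbers are illustrative:

```python
def fuse_features(word_vecs, pos_weights, position_vecs):
    """Feature vector of a template, computed per word:
    (word vector x part-of-speech weight) + position vector.
    Pure-Python sketch; dimensions are illustrative."""
    fused = []
    for wv, w, pv in zip(word_vecs, pos_weights, position_vecs):
        fused.append([x * w + p for x, p in zip(wv, pv)])
    return fused

word_vecs = [[1.0, 2.0], [3.0, 4.0]]
weights = [1.0, 0.0]            # second word is a stop word, zeroed out
positions = [[0.1, 0.1], [0.2, 0.2]]
assert fuse_features(word_vecs, weights, positions) == [[1.1, 2.1], [0.2, 0.2]]
```

Note the design choice the patent describes: a zero weight erases the word's content contribution while its position slot is preserved, so the template's length and layout information survive.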
S130, classifying the feature vectors based on a pre-trained log classification model to obtain a classification result of the log to be classified.
In this embodiment, the feature vector is classified with the pre-trained log classification model to obtain the classification result of the log to be classified. The classification result may distinguish normal logs from abnormal logs, or may distinguish statistical logs, access logs, and diagnostic logs; it is not limited here.
On the basis of the foregoing embodiment, optionally, after the log template corresponding to the log to be classified is extracted, the method further includes: matching the log template with a classified template; and if the matching is successful, determining the classification result of the log to be classified based on the classification result of the classified template which is successfully matched.
A classified template is a log template that has already been through log classification. Since a log template represents a whole type of log, the classification result of a classified template stands for the classification result of any log whose template matches it. In this embodiment, after the log template of the log to be classified is extracted, it may be matched against the classified templates; if the match succeeds, the classified template is the log template of the log to be classified, and its classification result may be used directly as the classification result of the log to be classified. Matching log templates against classified templates avoids repeatedly classifying logs of the same type, which reduces resource consumption and improves both classification efficiency and resource utilization.
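The classified-template cache described above can be sketched as a dictionary keyed by template string; `extract_template` and `run_model` stand in for the patent's template extractor and classification model and are supplied by the caller:

```python
# Sketch: classify a template once, then reuse the stored result for
# every later log that maps to the same template.
classified_templates = {}  # template string -> classification result

def classify_log(log_line, extract_template, run_model):
    template = extract_template(log_line)
    if template in classified_templates:      # matched a classified template
        return classified_templates[template]
    result = run_model(template)              # fall back to the model
    classified_templates[template] = result
    return result

calls = []
def fake_model(template):
    calls.append(template)
    return "access"

assert classify_log("GET /a 200", lambda s: "GET * *", fake_model) == "access"
assert classify_log("GET /b 200", lambda s: "GET * *", fake_model) == "access"
assert len(calls) == 1  # the model ran only once for the shared template
```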
According to the technical scheme of this embodiment, the log to be classified is acquired and its corresponding log template is extracted; the feature vector of the log template is determined from each word in the template together with its part of speech and word position; and the feature vector is classified with a pre-trained log classification model to obtain the classification result. By taking the influence of part of speech and word position on log classification into account and adding both to the feature vector, log features are extracted as fully as possible and classification accuracy is improved; classifying the feature vector with the pre-trained model yields the classification result promptly, improving the timeliness of log classification.
Example two
The present embodiment specifically describes the training process of the log classification model based on the above embodiments. The method comprises the following steps:
s210, obtaining the log to be classified, and extracting a log template corresponding to the log to be classified.
S220, determining a feature vector of the log template based on each word in the log template, the part of speech and the word position of each word.
And S230, classifying the feature vectors based on a pre-trained log classification model to obtain a classification result of the log to be classified.
On the basis of the foregoing embodiment, optionally, the training process of the log classification model includes: iteratively executing the following training process until a training end condition is met, and determining the trained student model as a log classification model:
1) Obtaining a sample log, and extracting a sample log template of the sample log;
2) Respectively determining input information of a teacher model and input information of a student model based on the sample log template;
3) Inputting input information of the student model into the student model to obtain training classification results under different temperature parameters, and inputting input information of the teacher model into the teacher model to obtain a teacher classification result;
4) And determining a loss function based on the teacher classification result, the training classification result and the classification label of the sample log, and adjusting model parameters of the student model based on the loss function.
The training end condition may be that the number of iterations reaches a preset threshold, or that the loss no longer decreases; it is not limited here. Fig. 3 is a schematic diagram of the log classification model training process provided in the second embodiment of the present invention. As shown in fig. 3, the knowledge distillation method trains the log classification model (i.e., the student model) with an already-trained teacher model. Specifically: a sample log is acquired and its sample log template is extracted; the input information of the teacher model and of the student model is determined from the sample log template; the teacher input is fed to the trained teacher model to obtain the teacher classification result under a first temperature parameter; the student input is fed to the student model, which is trained under the first and second temperature parameters simultaneously to obtain a training classification result under each. A loss function is then determined from the teacher classification result under the first temperature parameter, the training classification results under the two temperature parameters, and the classification label of the sample log, and the model parameters of the student model are adjusted based on the loss function.
The teacher model is a trained model; specifically, it may be a Bert model, without limitation. The student model is the model to be trained; specifically, it may be a TextCNN model, without limitation. The first temperature parameter is a high temperature parameter set by a person skilled in the art according to training requirements; the first temperature parameter is greater than the second temperature parameter, and the second temperature parameter may be 1.
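The role of the two temperature parameters can be illustrated with a temperature-scaled softmax, the standard construction in knowledge distillation; the logit values are illustrative:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Soften a logit vector with a temperature parameter. Temperatures
    above 1 spread probability mass over non-top classes, exposing the
    relative similarities the teacher has learned between classes."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.5]
hard = softmax_with_temperature(logits, 1.0)   # second temperature parameter
soft = softmax_with_temperature(logits, 4.0)   # first (high) temperature
assert hard[0] > soft[0]   # the high temperature flattens the distribution
```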
It can be understood that, for the log classification model trained based on the knowledge distillation method, training can be continued based on the training data to perform fine tuning on the log classification model.
On the basis of the above embodiment, optionally, the input information of the student model is determined based on each word in the sample log template, the part of speech of each word, and the position of the word; the input information of the teacher model is determined based on the input requirements of the teacher model.
In this embodiment, the input information of the student model is determined from each word in the sample log template, its part of speech, and its word position; specifically, it may be the feature vector of the sample log template determined from these. The teacher model is a trained model that is not fine-tuned, and its input information depends on its own input requirements. Illustratively, taking the Bert model as an example, its characteristic inputs comprise three parts: token embeddings (the vector representation of each word itself), segment embeddings (distinguishing the two sentences of a pair), and position embeddings (encoding word position information into the feature vector).
It should be noted that if the trained teacher model were fine-tuned, it would have to be trained further, which would slow down the training of the log classification model.
On the basis of the foregoing embodiment, optionally, the determining a loss function based on the teacher classification result, the training classification result, and the classification label of the sample log includes: determining a first loss term based on the teacher classification result and a training classification result at a first temperature parameter; determining a second loss term based on the classification label of the sample log and a training classification result under a second temperature parameter; determining a loss function based on the first loss term and the second loss term.
In this embodiment, as shown in fig. 3, the loss function is obtained by weighting a first loss term and a second loss term, specifically, the first loss term is determined based on a teacher classification result and a training classification result at a first temperature, and the second loss term is determined based on a classification label of the sample log and a training classification result at a second temperature parameter.
Illustratively, the loss function L may be expressed as:

L = α·L_soft + (1 − α)·L_hard

where L denotes the loss function, α is a weight parameter, L_soft denotes the first loss term, and L_hard denotes the second loss term.
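A minimal numeric sketch of this weighted combination, using cross-entropy for both terms as an assumption (the patent fixes the weighting but not the form of the individual loss terms):

```python
import math

def cross_entropy(p_target, q_pred):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(q) for t, q in zip(p_target, q_pred) if t > 0)

def distillation_loss(teacher_soft, student_soft, label_onehot, student_hard, alpha):
    """L = alpha * L_soft + (1 - alpha) * L_hard.
    L_soft compares the student's and teacher's softened outputs;
    L_hard compares the student's temperature-1 output with the true label.
    alpha is a hyperparameter chosen by the practitioner."""
    l_soft = cross_entropy(teacher_soft, student_soft)
    l_hard = cross_entropy(label_onehot, student_hard)
    return alpha * l_soft + (1 - alpha) * l_hard

loss = distillation_loss(
    teacher_soft=[0.6, 0.4], student_soft=[0.5, 0.5],
    label_onehot=[1.0, 0.0], student_hard=[0.8, 0.2],
    alpha=0.7,
)
assert loss > 0.0
```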
Optionally, in the process of training the student model, the method further includes: acquiring an initialized word vector conversion model and an initialized word position vector conversion model, wherein the word vector conversion model performs word vector conversion on an input log template and the word position vector conversion model performs word position vector conversion on an input log template; and adjusting the model parameters of the initialized word vector conversion model and the initialized word position vector conversion model during the training of the student model.
In this embodiment, fig. 4 is a schematic diagram of parameter adjustment for the initialized word vector conversion model and the initialized word position vector conversion model provided in the second embodiment of the present invention. As shown in fig. 4, the two initialized conversion models are obtained first. The sample log template is input into the initialized word vector conversion model, which performs word vector conversion to obtain the sample word vector of the template; the sample log template is likewise input into the initialized word position vector conversion model, which performs word position vector conversion to obtain the sample word position vector of the template. The sample word vector and the sample word position vector are used to determine the input information of the teacher model and of the student model, on which the subsequent training of the log classification model is based. During the training of the student model (i.e., the log classification model), the parameters of the two initialized conversion models are adjusted along with the training, so that the word vector conversion model and the word position vector conversion model are updated.
According to this technical scheme, knowledge distillation training is performed on the log classification model directly with a teacher model that has not been fine-tuned, yielding a lightweight log classification model. This saves training resources and speeds up the training of the log classification model, and the lightweight model is also convenient to store and deploy.
Example Three
Fig. 5 is a schematic structural diagram of a log classification device according to a third embodiment of the present invention. As shown in fig. 5, the apparatus includes:
a log template extracting module 510, configured to obtain a log to be classified, and extract a log template corresponding to the log to be classified;
a feature vector determining module 520, configured to determine a feature vector of the log template based on each word in the log template, a part of speech of each word, and a word position;
and a log classification module 530, configured to perform classification processing on the feature vector based on a pre-trained log classification model to obtain a classification result of the log to be classified.
According to the technical scheme of this embodiment, the log to be classified is acquired and the log template corresponding to it is extracted; the feature vector of the log template is determined based on each word in the log template, the part of speech of each word, and the word position; and the feature vector is classified by a pre-trained log classification model to obtain the classification result of the log to be classified. Because the influence of part of speech and word position on log classification is taken into account and both are incorporated into the feature vector, log features are extracted to the greatest extent and the accuracy of log classification is improved. Classifying the feature vector with the pre-trained log classification model produces the classification result promptly, improving the timeliness of log classification.
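The template-extraction step above strips the variable fields out of a raw log line, leaving only its constant skeleton. A minimal regex-based sketch follows; the masking rules and placeholder tokens are assumptions for illustration (production systems often use dedicated parsers such as Drain instead).

```python
import re

# Hypothetical masking rules: replace IPs, hex values, and numbers
# with placeholder tokens. Order matters: hex before plain numbers,
# so "0x1f" is not partially consumed by the number rule.
PATTERNS = [
    (re.compile(r"\b\d{1,3}(\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

def extract_template(log_line: str) -> str:
    """Reduce a raw log line to its log template."""
    for pat, token in PATTERNS:
        log_line = pat.sub(token, log_line)
    return log_line

print(extract_template("Connection from 10.0.0.7 failed after 3 retries"))
# Connection from <IP> failed after <NUM> retries
```

Logs produced by the same print statement then collapse to the same template, which is what the downstream feature extraction operates on.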
On the basis of the foregoing embodiment, optionally, the feature vector determining module 520 includes a word vector determining unit, a word position vector determining unit, a weight data determining unit, and a feature vector determining unit, where:
the word vector determining unit is used for performing vector conversion on each word in the log template to obtain a word vector corresponding to the log template;
the word position vector determining unit is used for determining a word position vector corresponding to the log template;
the weight data determining unit is used for determining the part of speech of each word in the log template and determining the weight data corresponding to each word based on the part of speech;
the feature vector determining unit is used for determining a feature vector of the log template based on the word vector, the word position vector and the weight data corresponding to each word.
On the basis of the foregoing embodiment, optionally, the feature vector determining unit is specifically configured to obtain an intermediate vector based on the word vector and the weight data corresponding to each word, and to obtain the feature vector of the log template based on the sum of the intermediate vector and the word position vector.
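The feature construction just described can be sketched per token: scale each word vector by its part-of-speech weight to form the intermediate vector, then add the word position vector. The part-of-speech weight values below are purely illustrative assumptions, not taken from the patent.

```python
import numpy as np

# Hypothetical part-of-speech weights: content words (nouns, verbs)
# are assumed to carry more classification signal than function words.
POS_WEIGHT = {"NOUN": 1.0, "VERB": 0.9, "ADP": 0.3, "NUM": 0.5}

def template_feature(word_vecs, pos_vecs, pos_tags):
    """Intermediate vector = POS weight * word vector (per word);
    feature vector = intermediate vector + word position vector."""
    weights = np.array([POS_WEIGHT.get(t, 0.5) for t in pos_tags])[:, None]
    intermediate = weights * word_vecs
    return intermediate + pos_vecs

wv = np.ones((3, 4))            # toy word vectors, one row per word
pv = np.full((3, 4), 0.1)       # toy word position vectors
feats = template_feature(wv, pv, ["NOUN", "VERB", "ADP"])
print(feats[0])  # [1.1 1.1 1.1 1.1]
```

Weighting before the positional addition means the position signal is preserved at full strength even for low-weight function words.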
On the basis of the above embodiment, optionally, after the log template corresponding to the log to be classified is extracted, the apparatus further includes a classified template classification module, configured to match the log template against classified templates; if the matching succeeds, the classification result of the log to be classified is determined based on the classification result of the successfully matched classified template.
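This matching step acts as a cache in front of the model: a template that was classified before is resolved by lookup instead of inference. A minimal sketch, assuming exact-match lookup; the names and stored labels are illustrative only.

```python
# Hypothetical store of already-classified templates and their results.
classified = {
    "Connection from <IP> failed after <NUM> retries": "network_error",
}

def classify_with_cache(template: str):
    """Return the cached classification result for a previously
    classified template, or None when the model must be invoked."""
    return classified.get(template)

assert classify_with_cache(
    "Connection from <IP> failed after <NUM> retries") == "network_error"
print(classify_with_cache("unseen template"))  # None -> run the model
```

On a cache miss the feature vector is built and the log classification model is run, after which the new template and its result can be added to the store.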
On the basis of the foregoing embodiment, optionally, the apparatus further includes a log classification model training module, configured to train the log classification model, where a training process of the log classification model includes: iteratively executing the following training process until a training end condition is met, and determining the trained student model as a log classification model: obtaining a sample log, and extracting a sample log template of the sample log; respectively determining input information of a teacher model and input information of a student model based on the sample log template; inputting input information of the student model into the student model to obtain training classification results under different temperature parameters, and inputting input information of the teacher model into the teacher model to obtain a teacher classification result; and determining a loss function based on the teacher classification result, the training classification result and the classification label of the sample log, and adjusting model parameters of the student model based on the loss function.
On the basis of the above embodiment, optionally, the input information of the student model is determined based on each word in the sample log template, the part of speech of each word, and the position of the word; the input information of the teacher model is determined based on the input requirements of the teacher model.
On the basis of the foregoing embodiment, optionally, the log classification model training module includes a loss function determining unit, configured to determine a first loss term based on the teacher classification result and a training classification result under a first temperature parameter; determining a second loss term based on the classification label of the sample log and a training classification result under a second temperature parameter; determining a loss function based on the first loss term and the second loss term.
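The two-term loss above can be written out numerically: the first term compares the teacher's softened distribution with the student's output at the first (higher) temperature, and the second term compares the student's output at the second temperature with the hard classification label. The temperatures, the mixing weight alpha, and the toy logits below are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, label,
                      T1=4.0, T2=1.0, alpha=0.5):
    """First loss term: cross-entropy between the teacher classification
    result and the student result, both softened at temperature T1.
    Second loss term: cross-entropy between the student result at T2 and
    the hard classification label. alpha balances the two terms."""
    p_teacher = softmax(teacher_logits, T1)
    p_student_soft = softmax(student_logits, T1)
    soft_loss = -np.sum(p_teacher * np.log(p_student_soft + 1e-12))
    p_student_hard = softmax(student_logits, T2)
    hard_loss = -np.log(p_student_hard[label] + 1e-12)
    return alpha * soft_loss + (1 - alpha) * hard_loss

loss = distillation_loss([2.0, 0.5, -1.0], [1.8, 0.6, -0.9], label=0)
print(round(loss, 3))
```

A common refinement (not required by the text here) is to scale the soft term by T1 squared so its gradient magnitude stays comparable across temperatures.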
On the basis of the foregoing embodiment, optionally, the apparatus further includes a model parameter adjusting module, configured to obtain an initialized word vector conversion model and an initialized word position vector conversion model, where the word vector conversion model is configured to perform word vector conversion on an input log template, and the word position vector conversion model is configured to perform word position vector conversion on the input log template; and in the training process of the student model, carrying out model parameter adjustment on the initialized word vector conversion model and the initialized word position vector conversion model.
The log classification device provided by the embodiment of the invention can execute the log classification method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example Four
Fig. 6 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. The electronic device 10 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 6, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the ROM 12 or loaded from a storage unit 18 into the RAM 13. The RAM 13 can also store various programs and data necessary for the operation of the electronic device 10. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as the log classification method.
In some embodiments, the log classification method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the log classification method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the log classification method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may be implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the log classification method of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
Example Five
An embodiment of the present invention further provides a computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a processor to execute a log classification method, the method including:
acquiring a log to be classified, and extracting a log template corresponding to the log to be classified; determining a feature vector of the log template based on each word in the log template, the part of speech of each word, and the word position; and classifying the feature vector based on a pre-trained log classification model to obtain a classification result of the log to be classified.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system; it overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired result of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A log classification method, comprising:
acquiring a log to be classified, and extracting a log template corresponding to the log to be classified;
determining a feature vector of the log template based on each word in the log template, the part of speech of each word, and the word position;
and classifying the feature vector based on a pre-trained log classification model to obtain a classification result of the log to be classified.
2. The method of claim 1, wherein determining the feature vector of the log template based on each word in the log template, the part of speech and the word position of each word comprises:
carrying out vector conversion on each word in the log template to obtain a word vector corresponding to the log template;
determining a word position vector corresponding to the log template;
determining the part of speech of each word in the log template, and determining the weight data corresponding to each word based on the part of speech;
determining a feature vector of the log template based on the word vector, the word position vector and weight data corresponding to each of the words.
3. The method of claim 2, wherein determining the feature vector of the log template based on the word vector, the word position vector, and the weight data corresponding to each of the words comprises:
obtaining an intermediate vector based on the word vector and the weight data corresponding to each word;
and obtaining a feature vector of the log template based on the sum of the intermediate vector and the word position vector.
4. The method according to claim 1, wherein after the extracting of the log template corresponding to the log to be classified, the method further comprises:
matching the log template with a classified template;
and if the matching is successful, determining the classification result of the log to be classified based on the classification result of the classified template which is successfully matched.
5. The method of claim 1, wherein the training process of the log classification model comprises:
iteratively executing the following training process until a training end condition is met, and determining the trained student model as a log classification model:
obtaining a sample log, and extracting a sample log template of the sample log;
respectively determining input information of a teacher model and input information of a student model based on the sample log template;
inputting input information of the student model into the student model to obtain training classification results under different temperature parameters, and inputting input information of the teacher model into the teacher model to obtain a teacher classification result;
and determining a loss function based on the teacher classification result, the training classification result and the classification label of the sample log, and adjusting model parameters of the student model based on the loss function.
6. The method of claim 5, wherein the input information for the student model is determined based on words in the sample log template, parts of speech and word positions of the words;
the input information of the teacher model is determined based on the input requirements of the teacher model.
7. The method of claim 5, wherein determining a loss function based on the teacher classification result, the training classification results, and the classification labels of the sample logs comprises:
determining a first loss term based on the teacher classification result and a training classification result at a first temperature parameter;
determining a second loss term based on the classification label of the sample log and a training classification result under a second temperature parameter;
determining a loss function based on the first loss term and the second loss term.
8. The method of claim 5, further comprising:
acquiring an initialized word vector conversion model and an initialized word position vector conversion model, wherein the word vector conversion model is used for performing word vector conversion on an input log template, and the word position vector conversion model is used for performing word position vector conversion on the input log template;
and in the training process of the student model, carrying out model parameter adjustment on the initialized word vector conversion model and the initialized word position vector conversion model.
9. A log classification apparatus, comprising:
the log template extraction module is used for acquiring logs to be classified and extracting log templates corresponding to the logs to be classified;
the characteristic vector determining module is used for determining a characteristic vector of the log template based on each word in the log template, the part of speech and the word position of each word;
and the log classification module is used for classifying the feature vectors based on a pre-trained log classification model to obtain a classification result of the log to be classified.
10. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, enables the at least one processor to perform the log classification method of any one of claims 1-8.
11. A computer-readable storage medium storing computer instructions for causing a processor to perform the log classification method of any one of claims 1-8 when executed.
CN202310077768.8A 2023-01-16 2023-01-16 Log classification method and device, electronic equipment and storage medium Pending CN115982366A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310077768.8A CN115982366A (en) 2023-01-16 2023-01-16 Log classification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310077768.8A CN115982366A (en) 2023-01-16 2023-01-16 Log classification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115982366A true CN115982366A (en) 2023-04-18

Family

ID=85962483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310077768.8A Pending CN115982366A (en) 2023-01-16 2023-01-16 Log classification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115982366A (en)

Similar Documents

Publication Publication Date Title
US20220358292A1 (en) Method and apparatus for recognizing entity, electronic device and storage medium
CN114970522A (en) Language model pre-training method, device, equipment and storage medium
CN112926306A (en) Text error correction method, device, equipment and storage medium
CN113722493A (en) Data processing method, device, storage medium and program product for text classification
US20230342667A1 (en) Classification model training method, semantic classification method, device and medium
CN112541070A (en) Method and device for excavating slot position updating corpus, electronic equipment and storage medium
CN113836925A (en) Training method and device for pre-training language model, electronic equipment and storage medium
CN115293149A (en) Entity relationship identification method, device, equipment and storage medium
CN115454706A (en) System abnormity determining method and device, electronic equipment and storage medium
JP2023025126A (en) Training method and apparatus for deep learning model, text data processing method and apparatus, electronic device, storage medium, and computer program
CN115168562A (en) Method, device, equipment and medium for constructing intelligent question-answering system
CN112906368B (en) Industry text increment method, related device and computer program product
CN117708300A (en) Knowledge base question-answering method, device, equipment and medium
CN116955075A (en) Method, device, equipment and medium for generating analytic statement based on log
CN116340777A (en) Training method of log classification model, log classification method and device
CN115577705A (en) Method, device and equipment for generating text processing model and storage medium
CN112395873B (en) Method and device for generating white character labeling model and electronic equipment
CN115309867A (en) Text processing method, device, equipment and medium
CN115982366A (en) Log classification method and device, electronic equipment and storage medium
CN114491030A (en) Skill label extraction and candidate phrase classification model training method and device
CN114841172A (en) Knowledge distillation method, apparatus and program product for text matching double tower model
CN114119972A (en) Model acquisition and object processing method and device, electronic equipment and storage medium
CN113886543A (en) Method, apparatus, medium, and program product for generating an intent recognition model
CN114201953A (en) Keyword extraction and model training method, device, equipment and storage medium
CN115131709B (en) Video category prediction method, training method and device for video category prediction model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination