CN117112791A - Unknown log classification decision system, method and device and readable storage medium - Google Patents
- Publication number: CN117112791A
- Application number: CN202311346153.7A
- Authority: CN (China)
- Prior art keywords: classification, log, unknown, named, named entity
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides an unknown log classification decision system, method, device and readable storage medium, and belongs to the technical field of log classification. The system comprises: a large model summarization module configured to summarize, generalize and deduce, through a large language model, the existing log classification rule strategies and the named entities in different classifications, to generate a classification model based on the named entities, and to recommend new classifications by combining unknown logs with the existing classifications so as to update the classification model; a named entity recognition module configured to extract named entity information from logs of unknown types; and a classification decision module configured to classify log entries of unknown types into appropriate categories by using the classification model based on the named entities, and to feed the logs that cannot be classified back to the large model summarization module. By using a large language model and named entity recognition technology, the invention effectively improves the accuracy and intelligence of unknown log classification.
Description
Technical Field
The invention relates to the technical field of log classification, in particular to an unknown log classification decision system, an unknown log classification decision method, an unknown log classification decision device and a readable storage medium.
Background
In the big data era, the processing and classification of log files is becoming increasingly important. Conventional log classification methods typically rely on manually written rules or conventional machine learning classification algorithms, which perform poorly when dealing with large numbers of logs of complex and unknown types.
Rule-based log classification is one such conventional method. It relies on a predefined rule set to identify and classify logs, so it is limited to the defined rules and cannot adapt to logs of unknown types or to complex log structures. When a new log type appears, the rules must be rewritten or the algorithm updated, which may cause processing delays and inaccuracy. Schemes based on machine learning classification algorithms, such as decision trees, support vector machines or neural networks, learn a log classification model automatically from labeled training data. They generally require a large amount of labeled data to train the model and must be updated periodically to accommodate new log types. Moreover, because they adopt fixed classification decisions, they lack flexibility and adaptability and cannot cope with continuously evolving log data.
Furthermore, these methods often rely on regular expression matching to recognize special named entities, and therefore cannot accurately and efficiently process logs containing named entities, even though such entities are critical to understanding the meaning of a log.
Therefore, traditional log classification schemes suffer from defects such as dependence on rules, poor adaptability and an inability to cope with unknown types.
Disclosure of Invention
In view of the above problems, the invention aims to provide an unknown log classification decision system, an unknown log classification decision method, an unknown log classification decision device and a readable storage medium.
To achieve this aim, the invention adopts the following technical solution: an unknown log classification decision system comprising: a large model summarization module, a named entity recognition module and a classification decision module;
the large model summarizing module is configured to summarize, generalize and deduce the existing log classification rule strategies and named entities in different classifications through a large language model, generate a classification model based on the named entities, and perform new classification recommendation by combining the unknown logs with the existing classifications to update the classification model;
the named entity identification module is configured to extract named entity information from logs of unknown types;
and the classification decision module is configured to classify the log entries of the unknown type into proper categories by using a classification model based on the named entity, and generate unclassified logs to be fed back to the large model summarization module.
Further, the large model summarization module includes:
a rule extraction unit configured to extract key features and patterns from existing log classification rules and logs of corresponding rules to capture commonalities and differences of different types of logs;
an entity analysis unit configured to analyze named entities occurring in different classifications to determine their importance and relevance in the log classification;
the classification model generation unit is configured to generate a classification model based on named entities based on the extracted key features and modes and named entities appearing in different classifications;
and the classification model updating unit is configured to combine the existing classification rules based on the named entities, perform newly added classification recommendation on the unclassified logs, and update the classification model based on the named entities.
Further, the named entity recognition module includes:
a text preprocessing unit configured to perform text preprocessing on logs of unknown types;
the entity identification unit is configured to apply a named entity identification technology and automatically identify named entities in the log;
and the entity extraction unit is configured to extract the identified named entity information and perform structured storage.
Further, the classification decision module comprises:
the entity and rule comprehensive analysis unit is configured to analyze the entity and rule by using a classification model based on the named entity, stored named entity information and the content of an unknown type log, determine the frequency of the named entity and the association relationship of the named entity, and generate a comprehensive analysis result;
the classification decision unit is configured to determine the category or sub-category of the log entry classification according to the comprehensive analysis result, and generate a classification decision;
the feedback recommendation unit is configured to feed back the logs of the sub-categories which cannot be classified to the large model summarization module according to the comprehensive analysis result;
and the classification result output unit is configured to output the result of the classification decision for subsequent log management, monitoring or alarm systems.
Correspondingly, the invention also discloses an unknown log classification decision method, which comprises the following steps:
summarizing, generalizing and deducing the existing log classification rule strategies and the named entities in different classifications through a large language model to generate a classification model based on the named entities;
extracting named entity information from logs of unknown types;
determining the category of the log entry according to the named entity information by using a classification model based on the named entity;
and obtaining an unknown type log of which the log entry category cannot be determined, and updating a classification model based on the named entity by adjusting parameters and the entity.
Further, the step of summarizing, generalizing and deducing the existing log classification rule strategies and the named entities in different classifications through the large language model to generate a classification model based on the named entities comprises the following steps:
analyzing the existing log classification rules and history log data by using a large language model, automatically learning and understanding the log rules, and extracting key features and modes in the log rules to identify the commonalities and differences of different types of logs;
analyzing named entities appearing in different classifications using a large language model to determine their importance and relevance in log classifications;
based on the learning result, the extracted key features and modes and the analysis result of the large language model, a classification model based on the named entity is generated.
Further, the extracting named entity information from the log of unknown type includes:
performing text word segmentation, denoising and punctuation processing on the logs of unknown types;
applying a named entity recognition technology to automatically recognize named entities in the log;
and extracting the identified named entity information and carrying out structured storage.
Further, the updating the classification model based on the named entity by adjusting parameters and entities comprises the following steps:
and updating the classification model based on the named entity through parameter adjustment, entity updating and entity relation changing.
Correspondingly, the invention discloses an unknown log classification decision device, which comprises:
the memory is used for storing an unknown log classification decision program;
a processor for implementing the steps of the unknown log classification decision method as described in any of the preceding claims when executing the unknown log classification decision program.
Correspondingly, the invention discloses a readable storage medium, wherein the readable storage medium is stored with an unknown log classification decision program, and the unknown log classification decision program realizes the steps of the unknown log classification decision method according to any one of the above when being executed by a processor.
Compared with the prior art, the invention has the beneficial effects that:
1. By summarizing the log rules and the named entity information with a deep learning model, the invention can classify logs of unknown types more accurately and reduce misclassification.
2. The invention utilizes a large-scale deep learning model and a named entity recognition technology to realize intelligent classification decision, and can automatically adapt to new log types and changes.
3. Compared with the traditional method based on rules or heuristic algorithms, the method reduces the dependence on manual rule writing and characteristic engineering, and reduces the workload of operators and managers.
4. The invention performs well on complex, diverse and unknown log data, does not need frequent updates to rules or models, and has higher adaptability.
5. The invention has wide application prospects, especially in fields such as network security, system monitoring and fault diagnosis, and can markedly improve the value and utilization of log data.
It can be seen that the present invention has outstanding substantial features and significant advances over the prior art, as well as the benefits of its implementation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a structural diagram of a system according to an embodiment of the present invention.
Fig. 2 is a flow chart of a method of an embodiment of the present invention.
In the figures: 1. large model summarization module; 2. named entity recognition module; 3. classification decision module; 11. rule extraction unit; 12. entity analysis unit; 13. classification model generation unit; 14. classification model updating unit; 21. text preprocessing unit; 22. entity recognition unit; 23. entity extraction unit; 31. entity and rule comprehensive analysis unit; 32. classification decision unit; 33. feedback recommendation unit; 34. classification result output unit.
Detailed Description
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the present invention provides an unknown log classification decision system, comprising: a large model summarization module 1, a named entity recognition module 2 and a classification decision module 3.
The large model summarization module 1 is configured to summarize, generalize and deduce the existing log classification rule strategies and the named entities in different classifications through a large language model, generate a classification model based on the named entities, and recommend new classifications by combining unknown logs with the existing classifications so as to update the classification model.
The named entity recognition module 2 is configured to extract named entity information from logs of unknown type.
The classification decision module 3 is configured to classify log entries of unknown types into appropriate categories using the classification model based on the named entities, and to feed the logs that cannot be classified back to the large model summarization module 1.
In a specific embodiment, as the core of the present system, the large model summarization module 1 specifically includes: a rule extraction unit 11, an entity analysis unit 12, a classification model generation unit 13, and a classification model update unit 14.
The rule extraction unit 11 is configured to extract key features and patterns from the existing log classification rules and logs of the corresponding rules to capture commonalities and differences of the different category logs.
The entity analysis unit 12 is configured to analyze the named entities that appear in different classifications to determine their importance and relevance in log classification. The named entities include, but are not limited to: application names, error codes, dates, etc.
The classification model generation unit 13 is configured to generate a classification model based on named entities based on the extracted key features and patterns, named entities occurring in different classifications. In the actual operation process, the classification model generating unit 13 generates a classification method based on named entities based on rule extraction and entity analysis, so that the subsequent classification decision module can use the classification method.
The classification model updating unit 14 is configured to combine existing classification rules based on named entities, and perform newly added classification recommendation on the unclassified logs, and update the classification model based on named entities.
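The output of the entity analysis unit and the classification model generation unit can be pictured as a table of per-category entity weights. The following is a minimal sketch under stated assumptions: the patent performs this analysis with a large language model, whereas the sketch substitutes a simple frequency-times-distinctiveness weighting purely for illustration. The function name `build_entity_model` and the `type:value` entity strings are hypothetical and do not come from the patent.

```python
import math
from collections import Counter, defaultdict


def build_entity_model(labeled_entities):
    """Build {category: {entity: weight}} from (category, [entities]) pairs.

    Statistical stand-in for the LLM-based analysis described in the text.
    """
    per_category = defaultdict(Counter)
    for category, entities in labeled_entities:
        per_category[category].update(entities)

    n_categories = len(per_category)
    # Number of categories in which each entity appears at least once.
    spread = Counter()
    for counts in per_category.values():
        spread.update(counts.keys())

    model = {}
    for category, counts in per_category.items():
        total = sum(counts.values())
        weights = {}
        for entity, count in counts.items():
            frequency = count / total  # importance within the category
            # Relevance: an entity present in every category scores 0.
            distinctiveness = math.log((1 + n_categories) / (1 + spread[entity]))
            weights[entity] = frequency * distinctiveness
        model[category] = weights
    return model
```

With this weighting, an entity that occurs in every category (for example a shared date) contributes nothing to a decision, while an entity concentrated in one category dominates that category's weights.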
In a specific embodiment, the named entity recognition module 2 includes: a text preprocessing unit 21, an entity recognition unit 22, and an entity extraction unit 23.
The text preprocessing unit 21 is configured to perform text preprocessing on logs of unknown types, including word segmentation, denoising, punctuation processing, and the like.
The entity recognition unit 22 is configured to apply a named entity recognition technique to automatically recognize named entities in the log, such as a person name, a place name, an IP address, a file path, and the like.
The entity extraction unit 23 is configured to extract the identified named entity information, and perform structural storage for use by a subsequent classification decision module.
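The three units of the named entity recognition module can be sketched as a small pipeline: preprocess, recognize, store. This is an illustrative sketch only. The patent does not name a recognition technique and in fact criticizes plain regular expression matching, so the regex patterns below are merely a runnable stand-in for whatever recognizer is actually used. All names (`ENTITY_PATTERNS`, `preprocess`, `extract_entities`, `EntityRecord`) and the error code format are assumptions.

```python
import re
from dataclasses import dataclass, field

# Stand-in recognizers; a real system would plug in a trained NER model here.
ENTITY_PATTERNS = {
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "file_path": re.compile(r"(?<![\w.])/(?:[\w.-]+/)*[\w.-]+"),
    "error_code": re.compile(r"\b(?:ERR|E)[-_]?\d{3,5}\b"),  # assumed format
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

# Characters that typically wrap log fields and carry no meaning themselves.
_WRAPPERS = str.maketrans({ch: " " for ch in "[](){}<>\"',;"})


def preprocess(raw_line: str) -> str:
    """Text preprocessing unit: denoising, punctuation handling, segmentation."""
    cleaned = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", " ", raw_line)  # denoise
    cleaned = cleaned.translate(_WRAPPERS)  # punctuation processing
    return " ".join(cleaned.split())  # whitespace-delimited tokens


@dataclass
class EntityRecord:
    """Structured storage for the entities found in one log line."""
    log_text: str
    entities: dict = field(default_factory=dict)  # entity type -> list of values


def extract_entities(raw_line: str) -> EntityRecord:
    """Entity recognition and extraction units combined."""
    text = preprocess(raw_line)
    record = EntityRecord(log_text=text)
    for entity_type, pattern in ENTITY_PATTERNS.items():
        values = pattern.findall(text)
        if values:
            record.entities[entity_type] = values
    return record
```

Keeping the recognizers in a single mapping means a model-based recognizer could replace an individual pattern without changing the structure that the classification decision module consumes.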
In a specific embodiment, the classification decision module 3 comprises: the entity and rule comprehensive analysis unit 31, the classification decision unit 32, the feedback recommendation unit 33 and the classification result output unit 34.
The entity and rule comprehensive analysis unit 31 is configured to analyze entities and rules by using the classification model based on the named entities, the stored named entity information and the content of the unknown-type log, determine the frequency of the named entities and the association relationships between them, and generate a comprehensive analysis result.
The classification decision unit 32 is configured to decide, based on the comprehensive analysis result, into which category or subcategory each log entry is classified.
The feedback recommendation unit 33 is configured to feed the logs that cannot be classified into any subcategory back to the large model summarization module according to the comprehensive analysis result.
The classification result output unit 34 is configured to output the result of the classification decision for use by a subsequent log management, monitoring or alarm system.
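One way to picture how the decision, feedback and output units interact is shown below. This is a hedged sketch, not the patented decision procedure: the additive scoring, the `threshold` parameter and the names `Decision` and `ClassificationDecisionModule` are all assumptions introduced for illustration. The model is assumed to be a mapping of category to entity weights.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Decision:
    """Result handed to downstream log management, monitoring or alarm systems."""
    category: Optional[str]  # None means the log could not be classified
    scores: dict = field(default_factory=dict)


class ClassificationDecisionModule:
    def __init__(self, model, threshold=0.5):
        self.model = model            # {category: {entity: weight}}
        self.threshold = threshold    # assumed minimum score for a decision
        self.feedback_queue = []      # logs to return to the summarization module

    def classify(self, log_text, entities):
        # Comprehensive analysis: score each category by its matching entities.
        scores = {
            category: sum(weights.get(entity, 0.0) for entity in entities)
            for category, weights in self.model.items()
        }
        best = max(scores, key=scores.get, default=None)
        if best is None or scores[best] < self.threshold:
            # Feedback recommendation: keep the log for a later model update.
            self.feedback_queue.append(log_text)
            return Decision(None, scores)
        return Decision(best, scores)
```

A log whose best score stays below the threshold is returned with `category=None` and queued, which mirrors the feedback path from unit 33 to the large model summarization module.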
Therefore, through the cooperative work of the core modules, the system realizes the intelligent classification of logs of unknown types, and fully utilizes a large-scale deep learning model and a named entity recognition technology to improve the accuracy and the intelligence of classification. The large model summarization module is responsible for summarizing rules and entities, the named entity recognition module is used for extracting key information, and the classification decision module integrates the information to make a final log classification decision. The system is excellent in facing complex and diversified log data, and has extremely high adaptability and intelligence.
As shown in fig. 2, the invention also discloses an unknown log classification decision method, which comprises the following steps:
S1: Summarize, generalize and deduce the existing log classification rule strategies and the named entities in different classifications through a large language model to generate a classification model based on the named entities.
In a specific embodiment, the step uses a large language model to summarize and generalize existing log classification rule policies, log records and named entities in different classifications to form a classification method based on named entities, and the specific process includes:
Rule summarization: The large language model analyzes the existing log classification rules and historical log data, automatically learns and understands the log rules, and extracts the key features and patterns in them to identify the commonalities and differences of different types of logs.
Entity induction: The model analyzes the named entities that appear in different classifications, such as application names, error codes and dates, to determine their importance and relevance in log classification.
Classification method generation: Based on the learning results of the model, the rule extraction and the entity induction, a classification method based on named entities is generated, which can be used for subsequent log classification.
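The three sub-steps of S1 could be driven by a single prompt to a large language model. The sketch below is only an assumption about how that might look: the patent does not specify a prompt, a model or a reply format. Here `llm` is any caller-supplied function that takes a prompt string and returns a string, and the JSON reply shape is invented for illustration; no real LLM API is implied.

```python
import json


def build_summary_prompt(rules, labeled_logs):
    """Assemble existing rules and historical logs into one prompt."""
    lines = [
        "Summarize the log classification rules and example logs below.",
        "Reply with JSON only: {category: {named_entity: weight between 0 and 1}}.",
        "",
        "Rules:",
    ]
    lines += [f"- {category}: {rule}" for category, rule in rules.items()]
    lines.append("")
    lines.append("Example logs:")
    lines += [f"- [{category}] {text}" for category, text in labeled_logs]
    return "\n".join(lines)


def generate_classification_model(rules, labeled_logs, llm):
    """Ask the (hypothetical) llm callable for an entity-based model."""
    reply = llm(build_summary_prompt(rules, labeled_logs))
    parsed = json.loads(reply)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object keyed by category")
    # Normalize types so downstream code can rely on str keys and float weights.
    return {
        str(category): {str(entity): float(weight) for entity, weight in weights.items()}
        for category, weights in parsed.items()
    }
```

Passing the model as a plain callable keeps the sketch independent of any vendor and makes it easy to substitute a canned reply when testing.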
S2: Extract named entity information from logs of unknown types.
In particular embodiments, this step is used to process logs of unknown type, from which named entity information is extracted, these entities being critical to the meaning and classification of the logs. The method comprises the following specific steps:
Text preprocessing: Text preprocessing, including word segmentation, denoising and punctuation processing, is performed on logs of unknown types in preparation for entity recognition.
Entity recognition: Named entity recognition technology is applied to automatically recognize named entities in the log, such as person names, place names, IP addresses and file paths.
Entity extraction: The recognized named entity information is extracted and stored in a structured form for use in later steps.
S3: Determine the category of each log entry according to the named entity information by using the classification model based on the named entities.
In particular embodiments, this step is used in the final log categorization decision to categorize the log entries into the appropriate categories. The method comprises the following specific steps:
Comprehensive analysis of entities and rules: The classification method generated in the first step, the named entity information extracted in the second step and the content of the unknown log are considered together, and aspects such as named entity frequency and association relationships are analyzed.
Classification decision: Based on the results of the comprehensive analysis, a decision is made as to which category or subcategory each log entry is classified into.
Classification result output: The result of the classification decision is output for use by subsequent log management, monitoring or alarm systems.
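The frequency and association analysis named in the comprehensive analysis step can be illustrated with simple counting. The patent does not define how either quantity is computed, so the sketch below makes an assumption: frequency is the number of logs containing an entity, and association is the number of logs in which two entities co-occur. The function name `analyze_entities` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def analyze_entities(entity_lists):
    """Return (frequency, association) counters over a batch of logs.

    entity_lists holds one list of entity strings per log entry.
    """
    frequency = Counter()
    association = Counter()
    for entities in entity_lists:
        unique = sorted(set(entities))  # count each entity once per log
        frequency.update(unique)
        # Each unordered pair is keyed in sorted order, so (a, b) == (b, a).
        association.update(combinations(unique, 2))
    return frequency, association
```

Pairs that co-occur often, such as an application name and a specific error code, are natural candidates for the association relationships that a decision step could weigh alongside individual entity frequency.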
S4: Obtain the unknown-type logs whose log entry category cannot be determined, and update the classification model based on the named entities by adjusting parameters and entities.
In a specific embodiment, the step is used for updating the classification method for the unknown log, so that the classification method is continuously adapted to various types of log data. The method comprises the following specific steps:
Feedback update: The unknown logs that could not be classified by the classification decision in step S3 are accumulated and fed back to the large model summarization module, and the classification rules are updated through approaches such as parameter adjustment, entity updating and changing entity relationships.
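The feedback loop of S4 can be sketched as a batch update. This is a minimal illustration under assumptions not found in the patent: unclassified logs are represented by their entity lists, an update is attempted only once `min_batch` logs have accumulated, and `recommend` is a caller-supplied stand-in for the large model summarization module that returns a category name together with entity weights.

```python
def update_model(model, unclassified, recommend, min_batch=3):
    """Merge a recommended category into the model; return True if updated.

    model:        {category: {entity: weight}}, modified in place
    unclassified: list of entity lists from logs that could not be classified
    recommend:    hypothetical callable (entity_lists, existing_categories)
                  -> (category_name, {entity: weight})
    """
    if len(unclassified) < min_batch:
        return False  # keep accumulating feedback

    name, weights = recommend(unclassified, sorted(model))
    # Entity update and parameter adjustment: new entities are added and
    # existing weights for the same category are overwritten.
    merged = dict(model.get(name, {}))
    merged.update(weights)
    model[name] = merged

    unclassified.clear()  # the batch has been absorbed into the model
    return True
```

If the recommended name matches an existing category, the call adjusts that category's weights; otherwise it adds a new category, which corresponds to the new classification recommendation described for the large model summarization module.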
Thus, through the cooperative work of the four steps, the method realizes intelligent classification of the logs of unknown types. The first step summarizes rules and entity information, the second step extracts important named entities, the third step combines the information to make accurate classification decisions, and the fourth step realizes system feedback so that the system can adaptively update the classification model. The method has high adaptability and intelligence when facing complex and diversified log data, and can improve the accuracy and efficiency of log classification.
The invention also discloses an unknown log classification decision device, which comprises a processor and a memory. When executing the unknown log classification decision program stored in the memory, the processor performs the following steps:
1. Summarizing, generalizing and deducing the existing log classification rule strategies and the named entities in different classifications through a large language model to generate a classification model based on the named entities.
2. Extracting named entity information from logs of unknown types.
3. Determining the category of each log entry according to the named entity information by using the classification model based on the named entities.
4. Obtaining the unknown-type logs whose log entry category cannot be determined, and updating the classification model based on the named entities by adjusting parameters and entities.
Further, the unknown log classification decision device in this embodiment may further include:
The input interface is used for acquiring an externally imported unknown log classification decision program and storing it in the memory. It can also be used for acquiring the instructions and parameters transmitted by external terminal equipment and passing them to the processor, so that the processor can carry out the corresponding processing with them. In this embodiment, the input interface may specifically include, but is not limited to, a USB interface, a serial interface, a voice input interface, a fingerprint input interface, a hard disk reading interface, and the like.
The output interface is used for outputting the data generated by the processor to the terminal equipment connected to it, so that other terminal equipment connected to the output interface can acquire that data. In this embodiment, the output interface may specifically include, but is not limited to, a USB interface, a serial interface, and the like.
The communication unit is used for establishing remote communication connection between the unknown log classification decision device and the external server so that the unknown log classification decision device can mount the image file to the external server. In this embodiment, the communication unit may specifically include, but is not limited to, a remote communication unit based on a wireless communication technology or a wired communication technology.
The keyboard is used for acquiring, in real time, the parameter data or instructions that a user enters by pressing keys.
The display is used for displaying, in real time, information related to the running unknown log classification decision process.
The mouse may be used to assist a user in inputting data and to simplify user operations.
The invention also discloses a readable storage medium, which includes random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. The readable storage medium stores an unknown log classification decision program which, when executed by a processor, performs the following steps:
1. Summarizing, generalizing and deducing the existing log classification rule strategies and the named entities in different classifications through a large language model to generate a classification model based on the named entities.
2. Extracting named entity information from logs of unknown types.
3. Determining the category of each log entry according to the named entity information by using the classification model based on the named entities.
4. Obtaining the unknown-type logs whose log entry category cannot be determined, and updating the classification model based on the named entities by adjusting parameters and entities.
In summary, the invention uses a large language model and named entity technology to improve the accuracy and intelligence of unknown log classification.
In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the method disclosed in the embodiment corresponds to the system disclosed in the embodiment, its description is relatively brief; for relevant details, refer to the corresponding description.
Those skilled in the art will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division of the units is merely a logical functional division, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, system, or unit, and may be in electrical, mechanical, or other form.
The units described as separate units may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each module may exist alone physically, or two or more modules may be integrated in one unit.
Similarly, each processing unit in the embodiments of the present invention may be integrated in one functional module, or each processing unit may exist alone physically, or two or more processing units may be integrated in one functional module.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The unknown log classification decision system, the method, the device and the readable storage medium provided by the invention are described in detail above. The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the method of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.
Claims (10)
1. An unknown log classification decision system, comprising: a large model summarization module, a named entity recognition module, and a classification decision module;
the large model summarization module is configured to summarize, generalize, and deduce the existing log classification rule strategies and the named entities in different classifications through a large language model, to generate a classification model based on the named entities, and to recommend new classifications by combining the unknown logs with the existing classifications so as to update the classification model;
the named entity recognition module is configured to extract named entity information from logs of unknown types;
and the classification decision module is configured to classify the log entries of unknown type into appropriate categories by using the classification model based on the named entities, and to feed the logs that remain unclassified back to the large model summarization module.
2. The unknown log classification decision system of claim 1, wherein the large model summarization module comprises:
a rule extraction unit configured to extract key features and patterns from the existing log classification rules and the logs corresponding to those rules, so as to capture the commonalities and differences of different types of logs;
an entity analysis unit configured to analyze the named entities occurring in different classifications to determine their importance and relevance in the log classification;
a classification model generation unit configured to generate the classification model based on the named entities from the extracted key features and patterns and the named entities occurring in different classifications;
and a classification model updating unit configured to recommend new classifications for the unclassified logs in combination with the existing classification rules based on the named entities, and to update the classification model based on the named entities.
3. The unknown log classification decision system of claim 2, wherein the named entity recognition module comprises:
a text preprocessing unit configured to perform text preprocessing on logs of unknown types;
an entity recognition unit configured to apply a named entity recognition technology to automatically recognize the named entities in the logs;
and an entity extraction unit configured to extract the recognized named entity information and store it in structured form.
4. The unknown log classification decision system of claim 3, wherein the classification decision module comprises:
an entity and rule comprehensive analysis unit configured to analyze the entities and rules by using the classification model based on the named entities, the stored named entity information, and the content of the unknown-type logs, to determine the frequency of the named entities and the association relationships among them, and to generate a comprehensive analysis result;
a classification decision unit configured to determine the category or sub-category of each log entry according to the comprehensive analysis result, and to generate a classification decision;
a feedback recommendation unit configured to feed back, according to the comprehensive analysis result, the logs that cannot be classified into any sub-category to the large model summarization module;
and a classification result output unit configured to output the result of the classification decision for use by subsequent log management, monitoring, or alarm systems.
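For illustration only, the frequency and association analysis of claim 4 and the resulting decision or feedback could be sketched as below. The sketch assumes a hypothetical model layout (per-category entity weights plus a bonus for expected co-occurring entity pairs) and a hypothetical `min_score` threshold; none of these names or values come from the claims.

```python
from collections import Counter
from itertools import combinations

# Hypothetical model layout: weights per entity label and expected label pairs.
MODEL = {
    "network": {"weights": {"IP": 0.4, "PORT": 0.3},
                "pairs": [("IP", "PORT")], "pair_bonus": 0.5},
    "auth": {"weights": {"USER": 0.6},
             "pairs": [("IP", "USER")], "pair_bonus": 0.5},
}

def analyze(entities):
    """Comprehensive analysis: label frequencies and co-occurring label pairs.

    `entities` is a list of (text, label) tuples for one log entry.
    """
    frequency = Counter(label for _, label in entities)
    pairs = set(combinations(sorted(frequency), 2))
    return {"frequency": frequency, "pairs": pairs}

def decide(analysis, model, min_score=1.0):
    """Return (category, score), or (None, best score) when feedback is needed."""
    best_category, best_score = None, 0.0
    for category, spec in model.items():
        score = sum(spec["weights"].get(label, 0.0) * count
                    for label, count in analysis["frequency"].items())
        score += sum(spec["pair_bonus"]
                     for pair in spec["pairs"] if pair in analysis["pairs"])
        if score > best_score:
            best_category, best_score = category, score
    if best_score < min_score:
        return None, best_score  # candidate for the feedback recommendation unit
    return best_category, best_score

if __name__ == "__main__":
    entry = [("10.0.0.1", "IP"), ("443", "PORT"), ("10.0.0.2", "IP")]
    print(decide(analyze(entry), MODEL))
```

A `None` category in this sketch corresponds to a log that would be fed back to the large model summarization module rather than passed to the output unit.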
5. An unknown log classification decision method, comprising:
summarizing, generalizing, and deducing the existing log classification rule strategies and the named entities in different classifications through a large language model to generate a classification model based on the named entities;
extracting named entity information from logs of unknown types;
determining the category of each log entry according to the named entity information by using the classification model based on the named entities;
and obtaining the unknown-type logs whose log entry category cannot be determined, and updating the classification model based on the named entities by adjusting parameters and entities.
6. The method according to claim 5, wherein the summarizing, generalizing, and deducing of the existing log classification rule strategies and the named entities in different classifications through a large language model to generate a classification model based on the named entities comprises:
analyzing the existing log classification rules and historical log data by using the large language model, automatically learning and understanding the log rules, and extracting the key features and patterns in the log rules to identify the commonalities and differences of different types of logs;
analyzing the named entities appearing in different classifications by using the large language model to determine their importance and relevance in the log classification;
and generating the classification model based on the named entities from the learning result, the extracted key features and patterns, and the analysis result of the large language model.
7. The unknown log classification decision method of claim 6, wherein the extracting of named entity information from the logs of unknown types comprises:
performing text word segmentation, denoising, and punctuation processing on the logs of unknown types;
applying a named entity recognition technology to automatically recognize the named entities in the logs;
and extracting the recognized named entity information and storing it in structured form.
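By way of example, the three operations recited in claim 7 (word segmentation with denoising and punctuation processing, named entity recognition, and structured storage) could be sketched as follows. The timestamp pattern, the two rule-based recognizers, and the record layout are assumptions made for the example; the claim does not specify any of them, and a trained named entity recognition model could take the place of `ENTITY_PATTERNS`.

```python
import json
import re

# Hypothetical noise pattern: a leading ISO-like timestamp.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}\S*\s*")

# Hypothetical rule-based recognizers standing in for an NER model.
ENTITY_PATTERNS = {
    "IP": re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$"),
    "ERROR_CODE": re.compile(r"^E\d{3,}$"),
}

def preprocess(line):
    """Denoise (drop the timestamp), split on whitespace, strip punctuation."""
    line = TIMESTAMP.sub("", line)
    tokens = [tok.strip(",;:()[]\"'") for tok in line.split()]
    return [tok for tok in tokens if tok]

def recognize(tokens):
    """Label every token that matches one of the recognizer patterns."""
    found = []
    for position, tok in enumerate(tokens):
        for label, pattern in ENTITY_PATTERNS.items():
            if pattern.match(tok):
                found.append({"text": tok, "label": label, "position": position})
    return found

def to_record(line):
    """Structured storage: one JSON-serializable record per log line."""
    tokens = preprocess(line)
    return {"tokens": tokens, "entities": recognize(tokens)}

if __name__ == "__main__":
    raw = "2023-10-18 09:15:02 [db] connection from 192.168.1.20: failed (E1042)"
    print(json.dumps(to_record(raw), indent=2))
```

Keeping the token position alongside each entity is one possible choice that lets a later stage reason about which entities occur together in a log entry.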
8. The method of claim 5, wherein the updating of the classification model based on the named entities by adjusting parameters and entities comprises:
updating the classification model based on the named entities through parameter adjustment, entity updating, and entity relation changing.
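The three update operations named in claim 8 could, for instance, be sketched as a single function over a dictionary-based model. The model layout, the default weight of 0.5 for newly added entities, and the clamping of weights to the range 0 to 1 are assumptions of this sketch rather than features of the claim.

```python
from copy import deepcopy

def update_model(model, category, new_entities=(), weight_deltas=None, new_relations=()):
    """Return an updated copy of the model; the input model is left unchanged."""
    updated = deepcopy(model)
    # A category that does not exist yet is created (newly recommended classification).
    spec = updated.setdefault(category, {"weights": {}, "relations": set()})
    # Entity updating: register entity labels the category did not know about.
    for label in new_entities:
        spec["weights"].setdefault(label, 0.5)  # assumed default weight
    # Parameter adjustment: shift weights, clamped to the range [0, 1].
    for label, delta in (weight_deltas or {}).items():
        current = spec["weights"].get(label, 0.0)
        spec["weights"][label] = min(1.0, max(0.0, current + delta))
    # Entity relation changing: record relations as order-independent pairs.
    for a, b in new_relations:
        spec["relations"].add(tuple(sorted((a, b))))
    return updated

if __name__ == "__main__":
    base = {"auth": {"weights": {"USER": 0.6}, "relations": set()}}
    print(update_model(base, "auth", new_entities=["TOKEN"],
                       weight_deltas={"USER": 0.7},
                       new_relations=[("USER", "TOKEN")]))
```

Returning a copy instead of mutating the model in place is a design choice of the sketch; it makes it easy to compare classification results before and after an update.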
9. An unknown log classification decision device, comprising:
a memory for storing an unknown log classification decision program;
and a processor for implementing the steps of the unknown log classification decision method of any of claims 5 to 8 when executing the unknown log classification decision program.
10. A readable storage medium, characterized by: the readable storage medium has stored thereon an unknown log classification decision program which when executed by a processor implements the steps of the unknown log classification decision method of any of claims 5 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311346153.7A CN117112791B (en) | 2023-10-18 | 2023-10-18 | Unknown log classification decision system, method and device and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311346153.7A CN117112791B (en) | 2023-10-18 | 2023-10-18 | Unknown log classification decision system, method and device and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117112791A (en) | 2023-11-24 |
CN117112791B (en) | 2024-02-20 |
Family
ID=88809352
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311346153.7A Active CN117112791B (en) | 2023-10-18 | 2023-10-18 | Unknown log classification decision system, method and device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117112791B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170178026A1 (en) * | 2015-12-22 | 2017-06-22 | Sap Se | Log normalization in enterprise threat detection |
CN113407505A (en) * | 2021-07-01 | 2021-09-17 | 中孚安全技术有限公司 | Method and system for processing security log elements |
CN113590556A (en) * | 2021-07-30 | 2021-11-02 | 中国工商银行股份有限公司 | Database-based log processing method, device and equipment |
Non-Patent Citations (2)
Title |
---|
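SHEKAR RAMACHANDRAN et al.: "Automated Log Classification Using Deep Learning", PROCEDIA COMPUTER SCIENCE * |
Zhai Haijun; Guo Jiafeng; Wang Xiaolei; Xu Hongbo: "Named entity mining based on user query logs", Journal of Chinese Information Processing, no. 01 * |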
Also Published As
Publication number | Publication date |
---|---|
CN117112791B (en) | 2024-02-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108256074B (en) | Verification processing method and device, electronic equipment and storage medium | |
US11170179B2 (en) | Systems and methods for natural language processing of structured documents | |
CN109829629B (en) | Risk analysis report generation method, apparatus, computer device and storage medium | |
US20010011259A1 (en) | Method and apparatus for interpreting information | |
CN111860981A (en) | Enterprise national industry category prediction method and system based on LSTM deep learning | |
CN111462752A (en) | Client intention identification method based on attention mechanism, feature embedding and BI-LSTM | |
CN114818643A (en) | Log template extraction method for reserving specific service information | |
CN113468317A (en) | Resume screening method, system, equipment and storage medium | |
CN112487186A (en) | Human-human conversation log analysis method, system, equipment and storage medium | |
CN117112791B (en) | Unknown log classification decision system, method and device and readable storage medium | |
CN111104422A (en) | Training method, device, equipment and storage medium of data recommendation model | |
CN110807082A (en) | Quality spot check item determination method, system, electronic device and readable storage medium | |
CN115544250A (en) | Data processing method and system | |
Korzeniowski et al. | Discovering interactions between applications with log analysis | |
CN115482075A (en) | Financial data anomaly analysis method and device, electronic equipment and storage medium | |
US11822578B2 (en) | Matching machine generated data entries to pattern clusters | |
CN117501275A (en) | Method, computer program product and computer system for analyzing data consisting of a large number of individual messages | |
CN111027296A (en) | Report generation method and system based on knowledge base | |
CN112991131A (en) | Government affair data processing method suitable for electronic government affair platform | |
CN113239126A (en) | Business activity information standardization scheme based on BOR method | |
CN112380321A (en) | Primary and secondary database distribution method based on bill knowledge graph and related equipment | |
CN113095073A (en) | Corpus tag generation method and device, computer equipment and storage medium | |
CN117540004B (en) | Industrial domain intelligent question-answering method and system based on knowledge graph and user behavior | |
CN113343051B (en) | Abnormal SQL detection model construction method and detection method | |
CN114546706B (en) | Application program defect analysis method applied to deep learning and server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||