CN117112791A - Unknown log classification decision system, method and device and readable storage medium - Google Patents


Info

Publication number
CN117112791A
Authority
CN
China
Prior art keywords
classification
log
unknown
named
named entity
Legal status
Granted
Application number
CN202311346153.7A
Other languages
Chinese (zh)
Other versions
CN117112791B (en)
Inventor
罗圣美
路冰
卢延科
Current Assignee
Zhongfu Safety Technology Co Ltd
Original Assignee
Zhongfu Safety Technology Co Ltd
Application filed by Zhongfu Safety Technology Co Ltd
Priority to CN202311346153.7A
Publication of CN117112791A
Application granted
Publication of CN117112791B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a system, method, device and readable storage medium for making classification decisions on unknown logs, and belongs to the technical field of log classification. The system comprises a large model summarization module, a named entity recognition module and a classification decision module. The large model summarization module is configured to summarize, generalize and deduce, through a large language model, the existing log classification rule policies and the named entities in different categories, generate a named-entity-based classification model, and recommend new categories by combining unknown logs with the existing categories so as to update the classification model. The named entity recognition module is configured to extract named entity information from logs of unknown type. The classification decision module is configured to classify log entries of unknown type into appropriate categories using the named-entity-based classification model, and to feed logs that cannot be classified back to the large model summarization module. By combining a large language model with named entity recognition, the invention improves the accuracy and intelligence of unknown log classification.

Description

Unknown log classification decision system, method and device and readable storage medium
Technical Field
The invention relates to the technical field of log classification, and in particular to a system, method, device and readable storage medium for making classification decisions on unknown logs.
Background
In the big data era, processing and classifying log files is increasingly important. Conventional log classification methods typically rely on manually written rules or conventional machine learning classification algorithms, which perform poorly on large volumes of complex logs of unknown type.
Rule-based log classification is one such conventional method. It identifies and classifies logs using a predefined rule set, so it is limited to the rules already defined and cannot adapt to logs of unknown type or to complex log structures. When a new log type appears, the rules must be rewritten or the algorithm updated, which can introduce processing delays and inaccuracies. Schemes based on machine learning classification algorithms, such as decision trees, support vector machines or neural networks, learn a log classification model automatically from labeled training data. They usually need a large amount of labeled data for training and periodic retraining to accommodate new log types; moreover, because their classification decisions are fixed once trained, they lack flexibility and adaptability and cannot keep pace with continuously evolving log data.
In addition, these methods usually rely on regular expression matching to recognize special named entities. Regular expressions cannot process such entities accurately and efficiently, even though the named entities in a log are critical to understanding its meaning.
Conventional log classification schemes therefore suffer from dependence on rules, poor adaptability and an inability to handle unknown log types.
Disclosure of Invention
In view of these problems, the invention aims to provide an unknown log classification decision system, method, device and readable storage medium.
To achieve this aim, the invention adopts the following technical solution: an unknown log classification decision system comprising a large model summarization module, a named entity recognition module and a classification decision module;
the large model summarization module is configured to summarize, generalize and deduce, through a large language model, the existing log classification rule policies and the named entities in different categories, generate a named-entity-based classification model, and recommend new categories by combining unknown logs with the existing categories so as to update the classification model;
the named entity recognition module is configured to extract named entity information from logs of unknown type;
and the classification decision module is configured to classify log entries of unknown type into appropriate categories using the named-entity-based classification model, and to feed logs that cannot be classified back to the large model summarization module.
Further, the large model summarization module includes:
a rule extraction unit configured to extract key features and patterns from the existing log classification rules and the logs matched by each rule, to capture the commonalities and differences between log categories;
an entity analysis unit configured to analyze the named entities occurring in different categories to determine their importance and relevance to log classification;
a classification model generation unit configured to generate a named-entity-based classification model from the extracted key features and patterns and the named entities occurring in different categories;
and a classification model updating unit configured to recommend new categories for unclassified logs in combination with the existing named-entity-based classification rules, and to update the named-entity-based classification model.
Further, the named entity recognition module includes:
a text preprocessing unit configured to preprocess the text of logs of unknown type;
an entity recognition unit configured to apply named entity recognition to automatically identify the named entities in a log;
and an entity extraction unit configured to extract the recognized named entity information and store it in structured form.
Further, the classification decision module comprises:
an entity and rule comprehensive analysis unit configured to analyze entities and rules using the named-entity-based classification model, the stored named entity information and the content of the unknown-type log, determine the frequency of the named entities and the associations between them, and generate a comprehensive analysis result;
a classification decision unit configured to determine, from the comprehensive analysis result, the category or subcategory into which the log entry is classified, and generate a classification decision;
a feedback recommendation unit configured to feed logs that cannot be classified into any category or subcategory back to the large model summarization module according to the comprehensive analysis result;
and a classification result output unit configured to output the classification decision for use by subsequent log management, monitoring or alarm systems.
Correspondingly, the invention also discloses an unknown log classification decision method, which comprises the following steps:
summarizing, generalizing and deducing, through a large language model, the existing log classification rule policies and the named entities in different categories to generate a named-entity-based classification model;
extracting named entity information from logs of unknown type;
determining the category of each log entry from the named entity information using the named-entity-based classification model;
and collecting the unknown-type logs whose category cannot be determined, and updating the named-entity-based classification model by adjusting parameters and entities.
Further, generating the named-entity-based classification model by summarizing, generalizing and deducing the existing log classification rule policies and the named entities in different categories through the large language model comprises:
analyzing the existing log classification rules and historical log data with the large language model, automatically learning and understanding the log rules, and extracting their key features and patterns to identify the commonalities and differences between log types;
analyzing, with the large language model, the named entities appearing in different categories to determine their importance and relevance to log classification;
and generating the named-entity-based classification model from the learning results, the extracted key features and patterns, and the analysis results of the large language model.
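In the patent, the importance and relevance of named entities is judged by the large language model. As a rough, non-LLM illustration of what "relevance of an entity to a category" could mean, the sketch below estimates P(category | entity type) from historically labeled logs. The function name `entity_relevance`, the entity type labels and the sample history are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def entity_relevance(
    labeled_logs: Iterable[Tuple[str, List[str]]],
) -> Dict[str, Dict[str, float]]:
    """Estimate P(category | entity type) from (category, entity_types) pairs.

    Hypothetical helper: a value near 1.0 means the entity type strongly
    indicates that category; values spread across categories mean it is generic.
    """
    per_entity: Dict[str, Counter] = defaultdict(Counter)
    for category, entity_types in labeled_logs:
        # Count each entity type once per log so repeated mentions do not dominate
        for ent in set(entity_types):
            per_entity[ent][category] += 1
    relevance = {}
    for ent, counts in per_entity.items():
        total = sum(counts.values())
        relevance[ent] = {cat: n / total for cat, n in counts.items()}
    return relevance


# Illustrative labeled history (hypothetical categories and entity types)
history = [
    ("auth", ["APP_NAME", "IP_ADDRESS"]),
    ("auth", ["IP_ADDRESS", "DATE"]),
    ("disk", ["FILE_PATH", "ERROR_CODE", "DATE"]),
    ("disk", ["FILE_PATH", "DATE"]),
]
rel = entity_relevance(history)
print(rel["IP_ADDRESS"])  # {'auth': 1.0}
```

Here `DATE` is split between the two categories (1/3 and 2/3), marking it as a generic entity, whereas `IP_ADDRESS` and `FILE_PATH` each point to a single category.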
Further, extracting named entity information from logs of unknown type comprises:
performing word segmentation, denoising and punctuation processing on the logs of unknown type;
applying named entity recognition to automatically identify the named entities in the log;
and extracting the identified named entity information and storing it in structured form.
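The patent does not say what form the structured storage takes. A minimal sketch, assuming a relational table in SQLite (Python's standard `sqlite3` module) with one row per recognized entity; the table name, column names and helper functions are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple

Span = Tuple[str, str, int, int]  # (entity_type, value, start, end)


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical entity table; ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entities ("
        " log_id INTEGER, entity_type TEXT, value TEXT,"
        " start_pos INTEGER, end_pos INTEGER)"
    )
    return conn


def store_entities(conn: sqlite3.Connection, log_id: int, spans: Iterable[Span]) -> None:
    """Insert one row per recognized entity of a log line."""
    conn.executemany(
        "INSERT INTO entities VALUES (?, ?, ?, ?, ?)",
        [(log_id, t, v, s, e) for t, v, s, e in spans],
    )
    conn.commit()


def entities_for_log(conn: sqlite3.Connection, log_id: int) -> List[Tuple[str, str]]:
    """Read back (entity_type, value) pairs in the order they appear in the log."""
    rows = conn.execute(
        "SELECT entity_type, value FROM entities WHERE log_id = ? ORDER BY start_pos",
        (log_id,),
    )
    return rows.fetchall()


conn = create_store()
store_entities(conn, 1, [("IP_ADDRESS", "10.0.0.5", 36, 44), ("ERROR_CODE", "E1023", 0, 5)])
print(entities_for_log(conn, 1))  # [('ERROR_CODE', 'E1023'), ('IP_ADDRESS', '10.0.0.5')]
```

Keeping character offsets alongside each value lets a later step restore the order of entities or highlight them in the original log line.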
Further, updating the named-entity-based classification model by adjusting parameters and entities comprises:
updating the named-entity-based classification model through parameter adjustment, entity updates and changes to entity relationships.
Correspondingly, the invention discloses an unknown log classification decision device, which comprises:
the memory is used for storing an unknown log classification decision program;
a processor configured to implement the steps of the unknown log classification decision method described above when executing the unknown log classification decision program.
Correspondingly, the invention discloses a readable storage medium storing an unknown log classification decision program which, when executed by a processor, implements the steps of the unknown log classification decision method described above.
Compared with the prior art, the invention has the beneficial effects that:
1. By using a deep learning model to summarize log rules and named entity information, the invention classifies logs of unknown type more accurately and reduces misclassification.
2. The invention uses a large-scale deep learning model and named entity recognition to make intelligent classification decisions, and can adapt automatically to new log types and changes.
3. Compared with traditional methods based on rules or heuristic algorithms, the invention reduces dependence on manually written rules and feature engineering, lowering the workload of operators and administrators.
4. The invention performs well on complex, diverse and unknown log data, does not require frequent rule or model updates, and is more adaptable.
5. The invention has broad application prospects, particularly in network security, system monitoring and fault diagnosis, and can significantly increase the value and utilization of log data.
It can be seen that the present invention has outstanding substantial features and significant advances over the prior art, as well as the benefits of its implementation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a structural diagram of a system according to an embodiment of the present invention.
FIG. 2 is a flow chart of a method according to an embodiment of the present invention.
In the figures: 1, large model summarization module; 2, named entity recognition module; 3, classification decision module; 11, rule extraction unit; 12, entity analysis unit; 13, classification model generation unit; 14, classification model updating unit; 21, text preprocessing unit; 22, entity recognition unit; 23, entity extraction unit; 31, entity and rule comprehensive analysis unit; 32, classification decision unit; 33, feedback recommendation unit; 34, classification result output unit.
Detailed Description
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the present invention provides an unknown log classification decision system, comprising: a large model summarization module 1, a named entity recognition module 2 and a classification decision module 3.
The large model summarization module 1 is configured to summarize, generalize and deduce, through a large language model, the existing log classification rule policies and the named entities in different categories, generate a named-entity-based classification model, and recommend new categories by combining unknown logs with the existing categories so as to update the classification model.
The named entity recognition module 2 is configured to extract named entity information from logs of unknown type.
The classification decision module 3 is configured to classify log entries of unknown type into appropriate categories using the named-entity-based classification model, and to feed logs that cannot be classified back to the large model summarization module 1.
In a specific embodiment, as the core of the present system, the large model summarization module 1 specifically includes: a rule extraction unit 11, an entity analysis unit 12, a classification model generation unit 13, and a classification model update unit 14.
The rule extraction unit 11 is configured to extract key features and patterns from the existing log classification rules and the logs matched by each rule, to capture the commonalities and differences between log categories.
The entity analysis unit 12 is configured to analyze the named entities that appear in different categories to determine their importance and relevance to log classification. Named entities include, but are not limited to, application names, error codes and dates.
The classification model generation unit 13 is configured to generate a named-entity-based classification model from the extracted key features and patterns and the named entities occurring in different categories. In practice, the classification model generation unit 13 generates a named-entity-based classification method from the results of rule extraction and entity analysis, for use by the subsequent classification decision module.
The classification model updating unit 14 is configured to recommend new categories for unclassified logs in combination with the existing named-entity-based classification rules, and to update the named-entity-based classification model.
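In the patent, new categories are recommended by the large language model. As a simple non-LLM stand-in, one could group unclassified logs by the set of entity types they contain and propose each sufficiently large group as a candidate category for review. The function `recommend_new_categories`, the `min_support` threshold and the sample logs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Signature = Tuple[str, ...]


def recommend_new_categories(
    unclassified: List[Tuple[str, List[str]]], min_support: int = 2
) -> List[Tuple[Signature, List[str]]]:
    """Group (log_text, entity_types) pairs by entity-type signature.

    Returns groups with at least `min_support` logs, largest first; each group
    is a hypothetical candidate for a new category.
    """
    groups: Dict[Signature, List[str]] = defaultdict(list)
    for text, entity_types in unclassified:
        # Order-independent signature: the sorted set of entity types in the log
        signature = tuple(sorted(set(entity_types)))
        groups[signature].append(text)
    candidates = [(sig, logs) for sig, logs in groups.items() if len(logs) >= min_support]
    candidates.sort(key=lambda item: len(item[1]), reverse=True)
    return candidates


pending = [
    ("svc-a timeout contacting 10.0.0.5", ["APP_NAME", "IP_ADDRESS"]),
    ("svc-b timeout contacting 10.0.0.9", ["IP_ADDRESS", "APP_NAME"]),
    ("checksum mismatch /var/data/x.bin", ["FILE_PATH"]),
]
for signature, logs in recommend_new_categories(pending):
    print(signature, len(logs))  # ('APP_NAME', 'IP_ADDRESS') 2
```

Grouping by entity signature alone is coarse; keyword patterns or model-generated summaries would normally refine the groups before a category is created.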
In a specific embodiment, the named entity recognition module 2 includes: a text preprocessing unit 21, an entity recognition unit 22, and an entity extraction unit 23.
The text preprocessing unit 21 is configured to perform text preprocessing on logs of unknown types, including word segmentation, denoising, punctuation processing, and the like.
The entity recognition unit 22 is configured to apply a named entity recognition technique to automatically recognize named entities in the log, such as a person name, a place name, an IP address, a file path, and the like.
The entity extraction unit 23 is configured to extract the recognized named entity information and store it in structured form for use by the subsequent classification decision module.
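The preprocessing performed by the text preprocessing unit 21 could look roughly as follows. This sketch assumes whitespace-delimited, English-style log lines; Chinese logs would need a dedicated word segmenter instead of the regular-expression tokenizer used here. Treating terminal colour codes as noise is also an assumption, and the name `preprocess` is hypothetical. The tokenizer keeps ".", "/", ":" and "-" inside tokens so that IP addresses and file paths survive punctuation handling.

```python
import re
from typing import List

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")  # terminal colour codes, treated as noise
TOKEN = re.compile(r"[\w./:\-]+")  # keep '.', '/', ':' and '-' inside tokens


def preprocess(line: str) -> List[str]:
    """Hypothetical preprocessing: denoise, tokenize, then trim edge punctuation."""
    # Denoising: remove colour codes and collapse runs of whitespace
    text = ANSI_ESCAPE.sub("", line)
    text = " ".join(text.split())
    # Tokenization: brackets, commas and similar characters act as separators
    tokens = TOKEN.findall(text)
    # Punctuation handling: drop sentence punctuation left at token edges
    return [t.strip(".:-") for t in tokens if t.strip(".:-")]


raw = "\x1b[31mERROR\x1b[0m [svc-a] failed to open /var/log/app.log: code E1023."
print(preprocess(raw))
# ['ERROR', 'svc-a', 'failed', 'to', 'open', '/var/log/app.log', 'code', 'E1023']
```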
In a specific embodiment, the classification decision module 3 comprises: the entity and rule comprehensive analysis unit 31, the classification decision unit 32, the feedback recommendation unit 33 and the classification result output unit 34.
The entity and rule comprehensive analysis unit 31 is configured to analyze entities and rules using the named-entity-based classification model, the stored named entity information and the content of the unknown-type log, determine the frequency of the named entities and the associations between them, and generate a comprehensive analysis result.
The classification decision unit 32 is configured to decide, based on the comprehensive analysis result, into which category or subcategory each log entry is classified.
The feedback recommendation unit 33 is configured to feed logs that cannot be classified into any category or subcategory back to the large model summarization module according to the comprehensive analysis result.
The classification result output unit 34 is configured to output the result of the classification decision for use by a subsequent log management, monitoring or alarm system.
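The comprehensive analysis by unit 31 looks at the frequency of named entities and the associations between them. The patent does not define "association"; the sketch below assumes it means co-occurrence of two entity types in the same log line. The function name `analyze_entities` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def analyze_entities(entity_lists: Iterable[List[str]]) -> Tuple[Counter, Counter]:
    """Return (entity-type frequency, co-occurrence counts of sorted pairs)."""
    freq: Counter = Counter()
    pairs: Counter = Counter()
    for entity_types in entity_lists:
        # Frequency counts every mention, including repeats within one log
        freq.update(entity_types)
        # Association: each unordered pair of distinct types counts once per log
        for a, b in combinations(sorted(set(entity_types)), 2):
            pairs[(a, b)] += 1
    return freq, pairs


freq, pairs = analyze_entities([
    ["APP_NAME", "IP_ADDRESS", "IP_ADDRESS"],
    ["APP_NAME", "IP_ADDRESS"],
    ["FILE_PATH", "ERROR_CODE"],
])
print(freq.most_common(1))                # [('IP_ADDRESS', 3)]
print(pairs[("APP_NAME", "IP_ADDRESS")])  # 2
```

Pairs are stored with their members sorted, so ("APP_NAME", "IP_ADDRESS") and ("IP_ADDRESS", "APP_NAME") are counted as the same association.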
Therefore, through the cooperative work of the core modules, the system realizes the intelligent classification of logs of unknown types, and fully utilizes a large-scale deep learning model and a named entity recognition technology to improve the accuracy and the intelligence of classification. The large model summarization module is responsible for summarizing rules and entities, the named entity recognition module is used for extracting key information, and the classification decision module integrates the information to make a final log classification decision. The system is excellent in facing complex and diversified log data, and has extremely high adaptability and intelligence.
As shown in fig. 2, the invention also discloses an unknown log classification decision method, which comprises the following steps:
S1: Summarize, generalize and deduce, through a large language model, the existing log classification rule policies and the named entities in different categories to generate a named-entity-based classification model.
In a specific embodiment, the step uses a large language model to summarize and generalize existing log classification rule policies, log records and named entities in different classifications to form a classification method based on named entities, and the specific process includes:
Rule summarization: the large language model analyzes the existing log classification rules and historical log data, automatically learns and understands the log rules, and extracts their key features and patterns to identify the commonalities and differences between log types.
Entity induction: the model analyzes the named entities that appear in different categories, such as application names, error codes and dates, to determine their importance and relevance to log classification.
Classification method generation: based on the model's learning results, the rule extraction and the entity induction, a named-entity-based classification method is generated for use in subsequent log classification.
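The patent does not specify which large language model is used or how it is prompted. The sketch below only shows one possible shape for the rule summarization step: building a prompt from existing rules and sample logs, and parsing a JSON reply into per-category key features and entity types. The prompt wording, the JSON schema and both function names are hypothetical, and the call to an actual model is omitted because it depends on the deployment.

```python
import json
from typing import Dict, List


def build_summarization_prompt(
    rules: Dict[str, str], samples: Dict[str, List[str]], max_examples: int = 5
) -> str:
    """Hypothetical prompt: existing rule text plus a few example logs per category."""
    lines = [
        "You are given existing log classification rules and example logs.",
        "For each category, list its key features and the named entity types it relies on.",
        'Reply only with JSON: {"<category>": {"key_features": [...], "entities": [...]}}',
    ]
    for category, rule in rules.items():
        lines.append(f"Category: {category}")
        lines.append(f"Rule: {rule}")
        # Cap the examples per category to keep the prompt within the context window
        for example in samples.get(category, [])[:max_examples]:
            lines.append(f"Example: {example}")
    return "\n".join(lines)


def parse_summary(reply: str) -> Dict[str, Dict[str, List[str]]]:
    """Parse and minimally validate a JSON reply in the hypothetical schema above."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object keyed by category")
    summary = {}
    for category, info in data.items():
        summary[category] = {
            "key_features": [str(x) for x in info.get("key_features", [])],
            "entities": [str(x) for x in info.get("entities", [])],
        }
    return summary


prompt = build_summarization_prompt(
    {"auth": "message contains 'login failed'"},
    {"auth": ["login failed for admin from 10.0.0.5"]},
)
# Sending `prompt` to a model is deployment-specific and omitted here;
# the reply below is a hand-written stand-in, not real model output.
reply = '{"auth": {"key_features": ["login failed"], "entities": ["IP_ADDRESS"]}}'
print(parse_summary(reply)["auth"]["entities"])  # ['IP_ADDRESS']
```

Asking for a fixed JSON schema and validating it on receipt keeps the model's free-form output from flowing unchecked into the classification model.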
S2: Extract named entity information from logs of unknown type.
In particular embodiments, this step is used to process logs of unknown type, from which named entity information is extracted, these entities being critical to the meaning and classification of the logs. The method comprises the following specific steps:
Text preprocessing: the logs of unknown type are preprocessed, including word segmentation, denoising and punctuation processing, in preparation for entity recognition.
Entity recognition: named entity recognition is applied to automatically identify the named entities in the log, such as person names, place names, IP addresses and file paths.
Entity extraction: the recognized named entity information is extracted and stored in structured form for use in later steps.
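The patent itself argues that regular-expression matching alone is not adequate for named entity recognition, and person and place names in particular require a trained NER model. The sketch below is therefore only a placeholder for the machine-formatted entity types (IP addresses, file paths, error codes), assuming the patterns shown; the pattern list and the function `recognize_entities` are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for machine-formatted entity types only; person and
# place names would need a trained NER model rather than regular expressions.
PATTERNS = [
    ("IP_ADDRESS", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),       # IPv4 shape; octet range not validated
    ("FILE_PATH", re.compile(r"(?<![\w/])/(?:[\w.\-]+/)*[\w.\-]+")),  # absolute Unix-style paths
    ("ERROR_CODE", re.compile(r"\b[A-Z]{1,4}\d{3,5}\b")),            # codes such as E1023
]


def recognize_entities(text: str) -> List[Tuple[str, str, int, int]]:
    """Return (entity_type, value, start, end) spans ordered by position."""
    spans = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            spans.append((label, match.group(), match.start(), match.end()))
    return sorted(spans, key=lambda span: span[2])


line = "E1023 reading /var/log/app.log from 10.0.0.5 failed"
for label, value, start, end in recognize_entities(line):
    print(label, value)
# ERROR_CODE E1023
# FILE_PATH /var/log/app.log
# IP_ADDRESS 10.0.0.5
```

In a fuller implementation, spans from a statistical NER model and from such patterns could be merged, with the model handling free-text entities.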
S3: Determine the category of each log entry from the named entity information using the named-entity-based classification model.
In a specific embodiment, this step makes the final log classification decision, assigning log entries to appropriate categories. The specific steps are as follows:
Comprehensive analysis of entities and rules: the classification method generated in S1, the named entity information extracted in S2 and the content of the unknown log are considered together, and aspects such as named entity frequency and association relationships are analyzed.
Classification decision: based on the comprehensive analysis result, decide into which category or subcategory each log entry is classified.
Classification result output: output the classification decision for use by subsequent log management, monitoring or alarm systems.
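The patent leaves the scoring used in the classification decision unspecified. A minimal sketch, assuming the named-entity-based model is a mapping from category to entity-type weights: each category is scored by the fraction of its weight whose entity types appear in the log, and a best score below a threshold is treated as "cannot be classified", so the log is routed to the feedback list for S4. The model layout, the `threshold` value and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model layout: category -> {entity_type: weight}
Model = Dict[str, Dict[str, float]]


def decide(
    model: Model, entity_types: List[str], threshold: float = 0.5
) -> Tuple[Optional[str], Dict[str, float]]:
    """Return (best category or None, per-category scores in [0, 1])."""
    present = set(entity_types)
    scores = {}
    for category, weights in model.items():
        total = sum(weights.values())
        matched = sum(w for ent, w in weights.items() if ent in present)
        scores[category] = matched / total if total else 0.0
    if not scores:
        return None, scores
    best = max(scores, key=scores.get)
    # A low best score means no confident category: leave it for feedback
    return (best if scores[best] >= threshold else None), scores


def route(model: Model, logs: List[Tuple[str, List[str]]], threshold: float = 0.5):
    """Split (text, entity_types) pairs into classified results and feedback."""
    classified, feedback = [], []
    for text, entity_types in logs:
        category, _ = decide(model, entity_types, threshold)
        if category is None:
            feedback.append(text)
        else:
            classified.append((text, category))
    return classified, feedback


model = {
    "auth": {"APP_NAME": 1.0, "IP_ADDRESS": 2.0},
    "disk": {"FILE_PATH": 2.0, "ERROR_CODE": 1.0},
}
classified, feedback = route(model, [
    ("login failed from 10.0.0.5", ["IP_ADDRESS"]),
    ("unexpected payload", ["DATE"]),
])
print(classified)  # [('login failed from 10.0.0.5', 'auth')]
print(feedback)    # ['unexpected payload']
```

Normalizing by each category's total weight keeps categories with many entity types from winning simply because they have more weights to sum.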
S4: Collect the unknown-type logs whose category cannot be determined, and update the named-entity-based classification model by adjusting parameters and entities.
In a specific embodiment, this step updates the classification method for unknown logs so that it continuously adapts to various types of log data. The specific step is as follows:
Feedback updating: unknown logs that could not be classified in S3 are accumulated and fed back to the large model summarization module, which updates the classification rules through parameter adjustment, entity updates, changes to entity relationships and similar means.
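The feedback updating step can be pictured as a batch loop. This sketch covers only a simple additive weight update (parameter adjustment) and the addition of unseen categories and entity types (entity updates); changes to entity relationships are not modeled. It assumes each fed-back log has already been given a label, for example by a new-category recommendation or by an analyst. The class `FeedbackUpdater`, `batch_size` and `lr` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class FeedbackUpdater:
    """Hypothetical sketch of S4: buffer labeled feedback and adjust entity weights in batches."""

    def __init__(self, model: Dict[str, Dict[str, float]], batch_size: int = 3, lr: float = 0.5):
        self.model = model            # category -> {entity_type: weight}, updated in place
        self.batch_size = batch_size  # number of fed-back logs to accumulate before updating
        self.lr = lr                  # step size for each weight adjustment
        self.buffer: List[Tuple[str, List[str]]] = []

    def add(self, label: str, entity_types: List[str]) -> bool:
        """Queue one fed-back log with its assigned label; return True if an update ran."""
        self.buffer.append((label, entity_types))
        if len(self.buffer) < self.batch_size:
            return False
        self._update()
        return True

    def _update(self) -> None:
        counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        for label, entity_types in self.buffer:
            for ent in set(entity_types):
                counts[label][ent] += 1
        for label, ent_counts in counts.items():
            # Unseen label: add a new category; unseen entity: start from weight 0
            weights = self.model.setdefault(label, {})
            for ent, n in ent_counts.items():
                weights[ent] = weights.get(ent, 0.0) + self.lr * n
        self.buffer.clear()


model = {"auth": {"IP_ADDRESS": 2.0}}
updater = FeedbackUpdater(model, batch_size=2)
updater.add("auth", ["APP_NAME"])       # buffered, returns False
updater.add("network", ["IP_ADDRESS"])  # second log triggers the batch update
print(model)
# {'auth': {'IP_ADDRESS': 2.0, 'APP_NAME': 0.5}, 'network': {'IP_ADDRESS': 0.5}}
```

Batching the updates means a single noisy log cannot change the model on its own; the batch size trades responsiveness against stability.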
Thus, through the cooperative work of the four steps, the method realizes intelligent classification of the logs of unknown types. The first step summarizes rules and entity information, the second step extracts important named entities, the third step combines the information to make accurate classification decisions, and the fourth step realizes system feedback so that the system can adaptively update the classification model. The method has high adaptability and intelligence when facing complex and diversified log data, and can improve the accuracy and efficiency of log classification.
The invention also discloses an unknown log classification decision device, which comprises a processor and a memory; the processor performs the following steps when executing the unknown log classification decision program stored in the memory:
1. Summarizing, generalizing and deducing, through a large language model, the existing log classification rule policies and the named entities in different categories to generate a named-entity-based classification model.
2. Extracting named entity information from logs of unknown type.
3. Determining the category of each log entry from the named entity information using the named-entity-based classification model.
4. Collecting the unknown-type logs whose category cannot be determined, and updating the named-entity-based classification model by adjusting parameters and entities.
Further, the unknown log classification decision device in this embodiment may further include:
the input interface is used for acquiring an unknown log classification decision program imported from the outside, storing the acquired unknown log classification decision program into the memory, and also can be used for acquiring various instructions and parameters transmitted by the external terminal equipment and transmitting the various instructions and parameters into the processor so that the processor can develop corresponding processing by utilizing the various instructions and parameters. In this embodiment, the input interface may specifically include, but is not limited to, a USB interface, a serial interface, a voice input interface, a fingerprint input interface, a hard disk reading interface, and the like.
And the output interface is used for outputting various data generated by the processor to the terminal equipment connected with the output interface so that other terminal equipment connected with the output interface can acquire various data generated by the processor. In this embodiment, the output interface may specifically include, but is not limited to, a USB interface, a serial interface, and the like.
The communication unit is used for establishing remote communication connection between the unknown log classification decision device and the external server so that the unknown log classification decision device can mount the image file to the external server. In this embodiment, the communication unit may specifically include, but is not limited to, a remote communication unit based on a wireless communication technology or a wired communication technology.
The keyboard is used to acquire, in real time, parameter data or instructions entered by the user through key presses.
The display is used to show, in real time, information related to the unknown log classification decision process.
A mouse may be used to assist a user in inputting data and to simplify user operations.
The invention also discloses a readable storage medium, which includes random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. The readable storage medium stores an unknown log classification decision program which, when executed by a processor, performs the following steps:
1. Summarizing, generalizing and deducing, through a large language model, the existing log classification rule policies and the named entities in different categories to generate a named-entity-based classification model.
2. Extracting named entity information from logs of unknown type.
3. Determining the category of each log entry from the named entity information using the named-entity-based classification model.
4. Collecting the unknown-type logs whose category cannot be determined, and updating the named-entity-based classification model by adjusting parameters and entities.
In conclusion, the invention uses a large language model and named entity recognition to improve the accuracy and intelligence of unknown log classification.
In this specification, the embodiments are described progressively, each focusing on its differences from the others; for the same or similar parts, the embodiments may be cross-referenced. Because the method disclosed in the embodiments corresponds to the system disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the system.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, systems, or units, and may be electrical, mechanical, or of other forms.
The units described as separate units may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each module may exist physically alone, or two or more modules may be integrated into one unit.
Similarly, the processing units in the embodiments of the present invention may be integrated into one functional module, each processing unit may exist physically alone, or two or more processing units may be integrated into one functional module.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that relational terms such as first and second are used only to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprise," "include," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The unknown log classification decision system, method, device, and readable storage medium provided by the invention have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the invention; they are intended only to help in understanding the method and core ideas of the invention. It should be noted that those skilled in the art may make various improvements and modifications to the invention without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims of the invention.

Claims (10)

1. An unknown log classification decision system, comprising a large model summarization module, a named entity recognition module, and a classification decision module;
the large model summarization module is configured to summarize, generalize, and deduce, through a large language model, the existing log classification rule strategies and the named entities in different classifications, to generate a named-entity-based classification model, and to recommend new classifications by combining the unknown logs with the existing classifications so as to update the classification model;
the named entity recognition module is configured to extract named entity information from logs of unknown type;
and the classification decision module is configured to classify log entries of unknown type into appropriate categories using the named-entity-based classification model, and to feed logs that cannot be classified back to the large model summarization module.
2. The unknown log classification decision system of claim 1, wherein the large model summarization module comprises:
a rule extraction unit configured to extract key features and patterns from the existing log classification rules and the logs corresponding to those rules, so as to capture the commonalities and differences between different types of logs;
an entity analysis unit configured to analyze the named entities occurring in different classifications to determine their importance and relevance in log classification;
a classification model generation unit configured to generate the named-entity-based classification model from the extracted key features and patterns and the named entities occurring in different classifications;
and a classification model updating unit configured to recommend new classifications for unclassified logs in combination with the existing named-entity-based classification rules, and to update the named-entity-based classification model.
3. The unknown log classification decision system of claim 2, wherein the named entity recognition module comprises:
a text preprocessing unit configured to preprocess the text of logs of unknown type;
an entity recognition unit configured to apply a named entity recognition technique to automatically recognize named entities in the logs;
and an entity extraction unit configured to extract the recognized named entity information and store it in structured form.
4. The unknown log classification decision system of claim 3, wherein the classification decision module comprises:
an entity and rule comprehensive analysis unit configured to analyze entities and rules using the named-entity-based classification model, the stored named entity information, and the content of the unknown-type log, to determine the frequency of the named entities and the association relationships among them, and to generate a comprehensive analysis result;
a classification decision unit configured to determine, according to the comprehensive analysis result, the category or subcategory into which the log entry is classified, and to generate a classification decision;
a feedback recommendation unit configured to feed logs that cannot be classified into any category or subcategory back to the large model summarization module according to the comprehensive analysis result;
and a classification result output unit configured to output the result of the classification decision for use by subsequent log management, monitoring, or alarm systems.
5. An unknown log classification decision method, comprising:
summarizing, generalizing, and deducing, through a large language model, the existing log classification rule strategies and the named entities in different classifications to generate a named-entity-based classification model;
extracting named entity information from logs of unknown type;
determining the category of each log entry from the named entity information using the named-entity-based classification model;
and obtaining unknown-type logs whose log entry category cannot be determined, and updating the named-entity-based classification model by adjusting parameters and entities.
6. The unknown log classification decision method of claim 5, wherein summarizing, generalizing, and deducing, through a large language model, the existing log classification rule strategies and the named entities in different classifications to generate a named-entity-based classification model comprises:
analyzing the existing log classification rules and historical log data with the large language model to automatically learn and understand the log rules, and extracting key features and patterns from them to identify the commonalities and differences between different types of logs;
analyzing the named entities occurring in different classifications with the large language model to determine their importance and relevance in log classification;
and generating the named-entity-based classification model based on the learning results, the extracted key features and patterns, and the analysis results of the large language model.
7. The unknown log classification decision method of claim 6, wherein extracting named entity information from logs of unknown type comprises:
performing word segmentation, denoising, and punctuation processing on the text of the unknown-type logs;
applying a named entity recognition technique to automatically recognize named entities in the logs;
and extracting the recognized named entity information and storing it in structured form.
8. The unknown log classification decision method of claim 5, wherein updating the named-entity-based classification model by adjusting parameters and entities comprises:
updating the named-entity-based classification model through parameter adjustment, entity updates, and changes to entity relationships.
9. An unknown log classification decision device, comprising:
a memory for storing an unknown log classification decision program;
and a processor for implementing the steps of the unknown log classification decision method of any one of claims 5 to 8 when executing the unknown log classification decision program.
10. A readable storage medium, characterized in that an unknown log classification decision program is stored on the readable storage medium, and when executed by a processor, the program implements the steps of the unknown log classification decision method of any one of claims 5 to 8.
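Claims 3, 4, and 7 name the processing stages (word segmentation, denoising, punctuation processing, entity recognition, structured storage, and analysis of entity frequency and association relationships) without fixing an algorithm for any of them. The Python sketch below shows one plausible reading under stated assumptions and is not the claimed implementation: denoising is taken to mean dropping a leading timestamp, punctuation processing strips punctuation from token edges, word segmentation is whitespace splitting (Chinese-language logs would need a real segmenter), regular expressions stand in for named entity recognition, structured storage is one JSON string per log, and the "association relationship" is counted as co-occurrence of entity types within a log. All patterns, field names, and sample logs are hypothetical.

```python
import json
import re
import string
from collections import Counter
from itertools import combinations

# Denoising (assumed): drop a leading ISO-style timestamp.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:\.\d+)?\s*")

# Hypothetical entity rules, tried in order; the first full match wins.
ENTITY_RULES = [
    ("IP", re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")),
    ("PATH", re.compile(r"[\w.-]*/[\w./-]+")),
    ("ACTION", re.compile(r"login|logout|delete|upload|export", re.IGNORECASE)),
    ("USER", re.compile(r"user=(\w+)")),
]

def preprocess(raw):
    """Denoise, segment into words, and strip punctuation from token edges."""
    text = TIMESTAMP.sub("", raw.strip())
    tokens = (t.strip(string.punctuation) for t in text.split())
    return [t for t in tokens if t]

def recognize(tokens):
    """Label each token with the first entity rule that fully matches it."""
    entities = []
    for tok in tokens:
        for etype, rule in ENTITY_RULES:
            m = rule.fullmatch(tok)
            if m:
                value = m.group(1) if rule.groups else tok
                entities.append({"type": etype, "value": value.lower()})
                break
    return entities

def analyze(raw):
    """Build a structured record: tokens, entities, and co-occurring entity types."""
    tokens = preprocess(raw)
    entities = recognize(tokens)
    types = sorted({e["type"] for e in entities})
    return {
        "tokens": tokens,
        "entities": entities,
        "associations": [list(pair) for pair in combinations(types, 2)],
    }

def batch_statistics(records):
    """Count entity frequencies and entity-type co-occurrences across many logs."""
    frequency, association = Counter(), Counter()
    for rec in records:
        frequency.update((e["type"], e["value"]) for e in rec["entities"])
        association.update(tuple(pair) for pair in rec["associations"])
    return frequency, association

raw_logs = [
    "2023-10-18 10:00:01 user=alice login from 10.0.0.5, ok",
    "2023-10-18 10:09:30 user=alice login from 10.0.0.7",
]
records = [analyze(line) for line in raw_logs]
stored = [json.dumps(rec) for rec in records]  # structured storage as JSON lines
frequency, association = batch_statistics(records)
print(frequency[("USER", "alice")], association[("ACTION", "USER")])  # 2 2
```

Stripping punctuation only at token edges keeps dotted IP addresses and file paths intact for the recognizer. The frequency and co-occurrence counters are one possible form of the comprehensive analysis result that a classification decision unit could then weigh against its rules.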
CN202311346153.7A 2023-10-18 2023-10-18 Unknown log classification decision system, method and device and readable storage medium Active CN117112791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311346153.7A CN117112791B (en) 2023-10-18 2023-10-18 Unknown log classification decision system, method and device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311346153.7A CN117112791B (en) 2023-10-18 2023-10-18 Unknown log classification decision system, method and device and readable storage medium

Publications (2)

Publication Number Publication Date
CN117112791A true CN117112791A (en) 2023-11-24
CN117112791B CN117112791B (en) 2024-02-20

Family

ID=88809352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311346153.7A Active CN117112791B (en) 2023-10-18 2023-10-18 Unknown log classification decision system, method and device and readable storage medium

Country Status (1)

Country Link
CN (1) CN117112791B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170178026A1 (en) * 2015-12-22 2017-06-22 Sap Se Log normalization in enterprise threat detection
CN113407505A (en) * 2021-07-01 2021-09-17 Zhongfu Safety Technology Co Ltd Method and system for processing security log elements
CN113590556A (en) * 2021-07-30 2021-11-02 Industrial and Commercial Bank of China Ltd Database-based log processing method, device and equipment

Non-Patent Citations (2)

Title
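SHEKAR RAMACHANDRAN et al.: "Automated Log Classification Using Deep Learning", Procedia Computer Science *
Zhai Haijun; Guo Jiafeng; Wang Xiaolei; Xu Hongbo: "Named Entity Mining Based on User Query Logs" (基于用户查询日志的命名实体挖掘), Journal of Chinese Information Processing (中文信息学报), no. 01 *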

Also Published As

Publication number Publication date
CN117112791B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
CN108256074B (en) Verification processing method and device, electronic equipment and storage medium
US11170179B2 (en) Systems and methods for natural language processing of structured documents
CN109829629B (en) Risk analysis report generation method, apparatus, computer device and storage medium
US20010011259A1 (en) Method and apparatus for interpreting information
CN111860981A (en) Enterprise national industry category prediction method and system based on LSTM deep learning
CN111462752A Client intention identification method based on attention mechanism, feature embedding and BI-LSTM
CN114818643A (en) Log template extraction method for reserving specific service information
CN113468317A (en) Resume screening method, system, equipment and storage medium
CN112487186A (en) Human-human conversation log analysis method, system, equipment and storage medium
CN117112791B (en) Unknown log classification decision system, method and device and readable storage medium
CN111104422A (en) Training method, device, equipment and storage medium of data recommendation model
CN110807082A (en) Quality spot check item determination method, system, electronic device and readable storage medium
CN115544250A (en) Data processing method and system
Korzeniowski et al. Discovering interactions between applications with log analysis
CN115482075A (en) Financial data anomaly analysis method and device, electronic equipment and storage medium
US11822578B2 (en) Matching machine generated data entries to pattern clusters
CN117501275A (en) Method, computer program product and computer system for analyzing data consisting of a large number of individual messages
CN111027296A (en) Report generation method and system based on knowledge base
CN112991131A (en) Government affair data processing method suitable for electronic government affair platform
CN113239126A (en) Business activity information standardization scheme based on BOR method
CN112380321A (en) Primary and secondary database distribution method based on bill knowledge graph and related equipment
CN113095073A (en) Corpus tag generation method and device, computer equipment and storage medium
CN117540004B (en) Industrial domain intelligent question-answering method and system based on knowledge graph and user behavior
CN113343051B (en) Abnormal SQL detection model construction method and detection method
CN114546706B (en) Application program defect analysis method applied to deep learning and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant