CN116578534A - Log message data format identification method and system - Google Patents

Info

Publication number
CN116578534A
Authority
CN
China
Prior art keywords
log message
analysis rule
rule
log
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310385045.4A
Other languages
Chinese (zh)
Other versions
CN116578534B (en)
Inventor
韩硕
戚红建
王宇飞
徐蕾
秦绪帅
朱梦迪
袁阳
潘中英
李亚楠
师凤瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Bidding Branch Of China Huaneng Group Co ltd
Huaneng Information Technology Co Ltd
Original Assignee
Beijing Bidding Branch Of China Huaneng Group Co ltd
Huaneng Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Bidding Branch Of China Huaneng Group Co ltd, Huaneng Information Technology Co Ltd filed Critical Beijing Bidding Branch Of China Huaneng Group Co ltd
Priority to CN202310385045.4A priority Critical patent/CN116578534B/en
Publication of CN116578534A publication Critical patent/CN116578534A/en
Application granted granted Critical
Publication of CN116578534B publication Critical patent/CN116578534B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Algebra (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a method and a system for identifying the data format of a log message, and relates to the technical field of log message format identification. The method comprises five steps. Step 1 performs statistical analysis on the log message parsing rules and matches first identification information based on the characteristics of the log message formats that each rule can parse. Step 2 determines the mapping relation between the first identification information and a parsing rule call library. Step 3 generates first call information based on the characteristics of past log message reception and analyses it jointly with the log message requirements to obtain a parsing rule judgment model. Step 4 generates an initial parsing rule policy based on the parsing rule judgment model, evaluates the requirement satisfaction capability of the first identification information corresponding to the log message parsing rules in the initial parsing rule policy, and allocates resources among the log parsing rules that run simultaneously according to the evaluation result, giving a final parsing rule policy. Step 5 calls the log message parsing rules according to the final parsing rule policy. Parsing congestion of log messages is thereby avoided, and the efficiency with which the system parses log messages is improved.

Description

Log message data format identification method and system
Technical Field
The application relates to the technical field of log message format recognition, in particular to a method and a system for recognizing a log message data format.
Background
To improve working efficiency, production units today use a variety of auxiliary working systems. Some working tasks require several devices to operate in coordination, and to coordinate, the devices send and receive log messages so that each end knows the operating condition of the others. Because different devices use different log message formats, each device stores a number of log message parsing rules in advance so that it can read and use the log messages of other devices. The conventional way to read a log message is first to determine its characteristics and then to call the corresponding log message parsing rule. As the volume of log messages exchanged between devices grows, however, identifying the characteristics of log messages and calling the parsing rules consume excessive system resources and cause log message congestion. To avoid this problem, the log message format identification method needs to be improved.
Disclosure of Invention
The application aims to provide a method for identifying a data format of a log message, which comprises the following steps:
step 1, carrying out statistical analysis on log message analysis rules of all devices, and matching first identification information based on characteristics of log messages which can be analyzed by the log message analysis rules;
step 2, based on the log message parsing rules of all the devices, a parsing rule calling library is established, and based on the first identification information, the mapping relation between the first identification information and the parsing rule calling library is determined;
step 3, generating first call information based on the time characteristics and source equipment of the past log message reception, and carrying out joint analysis by combining with the log message demand to obtain an analysis rule judgment model for determining an initial analysis rule strategy;
step 4, determining an initial analysis rule strategy based on the analysis rule judgment model, evaluating first identification information corresponding to the analysis rule of the log message in the initial analysis rule strategy according to the requirement meeting capability, and carrying out resource allocation on the log analysis rule running simultaneously according to the evaluation result to obtain a final analysis rule strategy;
and step 5, based on the final analysis rule policy, determining the calling of the log message analysis rule at different time points, the time length for calling the log message analysis rule and the resource allocation of each log message analysis rule.
The beneficial effects of this technical scheme are as follows. Step 1 performs statistical analysis on the log message parsing rules and matches first identification information based on the characteristics of the log message formats that each rule can parse. Step 2 determines the mapping relation between the first identification information and the parsing rule call library. Step 3 generates first call information based on the characteristics of past log message reception and analyses it jointly with the log message requirements to obtain the parsing rule judgment model. Step 4 generates an initial parsing rule policy based on the parsing rule judgment model, evaluates the requirement satisfaction capability of the first identification information corresponding to the log message parsing rules in the initial parsing rule policy, and allocates resources among the log parsing rules that run simultaneously according to the evaluation result. Parsing congestion of log messages is thereby avoided, and the efficiency with which the system parses log messages is improved.
In some embodiments of the present application, in order to distinguish log messages, the method is refined as follows. Matching first identification information based on the characteristics of the log messages that a log message parsing rule can parse includes:
based on a message header of a log message, matching a corresponding log message parsing rule with a first identification element;
matching the corresponding log message analysis rule with a second identification element based on the priority of the log message;
and matching the corresponding log message analysis rule with a third identification element based on the interval where the character number of the message content of the log message is.
In some embodiments of the present application, to determine a corresponding log message parsing rule in the parsing rule call library, determining a mapping relationship between the first identification information and the parsing rule call library includes:
establishing a storage position array {a1, a2, a3, …, an} based on the storage positions of the log message parsing rules in the parsing rule call library, wherein a1 is the first storage position, a2 is the second storage position, a3 is the third storage position, and an is the n-th storage position;
establishing a first identification information-storage position mapping set {a1-b1, a2-b2, a3-b3, …, an-bn} based on the matching relation between the first identification information and the log message parsing rules, wherein a1-b1 is the mapping relation group of the first storage position and the first identification information corresponding to the parsing rule at the first storage position, a2-b2 is the mapping relation group of the second storage position and the first identification information corresponding to the parsing rule at the second storage position, a3-b3 is the mapping relation group of the third storage position and the first identification information corresponding to the parsing rule at the third storage position, and an-bn is the mapping relation group of the n-th storage position and the first identification information corresponding to the parsing rule at the n-th storage position.
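The storage position array and the mapping set can be sketched as follows. This is a minimal illustration only: it assumes the call library is an ordered list and that the first identification information is a hashable tuple, and the names `RuleLibrary`, `register` and `lookup` are hypothetical rather than taken from the application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumed shape of the first identification information:
# (header element, priority element, length-interval element).
IdInfo = Tuple[str, int, int]
Rule = Callable[[str], dict]


@dataclass
class RuleLibrary:
    """Parsing rule call library: rules live at storage positions a1..an."""
    rules: List[Rule] = field(default_factory=list)           # index k holds position a(k+1)
    mapping: Dict[IdInfo, int] = field(default_factory=dict)  # b_k -> a_k

    def register(self, id_info: IdInfo, rule: Rule) -> int:
        """Store a rule and record the pair (a_k, b_k) in the mapping set."""
        self.rules.append(rule)
        position = len(self.rules)  # 1-based, like a1, a2, ...
        self.mapping[id_info] = position
        return position

    def lookup(self, id_info: IdInfo) -> Rule:
        """Resolve identification information to the rule at its storage position."""
        return self.rules[self.mapping[id_info] - 1]


# Hypothetical rules and identification tuples, for illustration only.
library = RuleLibrary()
a1 = library.register(("router", 3, 0), lambda raw: {"source": "router", "raw": raw})
a2 = library.register(("switch", 5, 1), lambda raw: {"source": "switch", "raw": raw})
parsed = library.lookup(("switch", 5, 1))("link up")
```

Keeping the mapping separate from the rule list means a rule can be found from the identification information of an incoming message without scanning the whole library.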
In some embodiments of the present application, in order to obtain an initial parsing rule policy, a method for generating a parsing rule judgment model is disclosed, to obtain a parsing rule judgment model for determining the initial parsing rule policy, including:
monitoring a receiving time point, a receiving time length and source equipment of a log message, and generating a first log message receiving record line, wherein the first log message receiving record line gradually evolves along with time;
setting a first label at a corresponding time point of the first log message receiving record line based on the receiving time point of the log message;
configuring corresponding receiving time length and source equipment of the log message for the first label;
aiming at the requirement of the log message, a second label is marked on the first log message receiving record line, and the second label is configured with a device identifier for sending the log message and a required log message identifier;
monitoring an application time point and an application time length of log message analysis rule application, and generating a first log message analysis rule application record line, wherein the first log message analysis rule application record line gradually evolves along with time;
for the log message analysis rule, marking a third label on the application record line of the first log message analysis rule, wherein the third label is configured with the application time length of the log message analysis rule;
aligning the first log message receiving record line and the first log message analyzing rule application record line, and determining to obtain an analyzing rule judging model according to the relation characteristic of the first log message receiving record line and the first log message analyzing rule application record line.
In some embodiments of the present application, a method for obtaining an analysis rule judgment model is further disclosed, and determining, according to a relationship feature between the first log message receiving record line and the first log message analysis rule application record line, to obtain the analysis rule judgment model includes:
performing feature parameterization on the existence positions and matching contents of the first label and the second label of the first log message receiving record line to obtain a first input parameter set;
performing characteristic parameterization on the existence position of the third label of the first log message analysis rule application record line and the matched content to obtain a first output parameter set;
and training the first input parameter set as input parameters and the first output parameter set as output parameters by utilizing a neural network learning algorithm to obtain the analysis rule judgment model.
In some embodiments of the present application, it is mentioned that the evaluating the first identification information of the log message parsing rule in the initial parsing rule policy according to the requirement satisfaction capability includes:
determining the message characteristics of the log messages analyzed by different log message analysis rules based on the first identification information;
configuring a first requirement weight factor for the corresponding log message based on different log message requirements;
configuring a first message length factor for the corresponding log message based on the message characteristics of the log message;
determining a satisfaction capability value of a log message parsing rule corresponding to a single log message based on the first demand weight factor and the first message length factor;
the satisfaction capability value of the log message parsing rule is determined by an expression [formula not recoverable from this text] that relates the following quantities:
the satisfaction capability value of the i-th log message parsing rule;
the n-th first demand weight factor of the log message parsed by the i-th log message parsing rule;
the adjustment coefficient of the first message length factor of the log message parsed by the i-th log message parsing rule; and
the first message length factor of the log message parsed by the i-th log message parsing rule.
In some embodiments of the present application, in order to obtain the final parsing rule policy, performing resource allocation on the log parsing rules running simultaneously to obtain the final parsing rule policy includes:
allocating resources to the log message parsing rules according to their satisfaction capability values, and associating the resource allocation result with the log parsing rules applied in the initial parsing rule policy to obtain the final parsing rule policy.
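A minimal sketch of this allocation step follows. The application does not fix how resources are divided, so the proportional split below is an assumption, as are the function name `allocate_resources`, the rule names and the policy fields.

```python
from typing import Dict


def allocate_resources(capability: Dict[str, float], total: float) -> Dict[str, float]:
    """Split `total` resource units among concurrently running rules.

    Assumption: shares are proportional to each rule's satisfaction
    capability value; the application does not state the allocation formula.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    if any(v < 0 for v in capability.values()):
        raise ValueError("capability values must be non-negative")
    weight_sum = sum(capability.values())
    if weight_sum == 0:
        # No rule stands out, so fall back to an even split.
        return {name: total / len(capability) for name in capability} if capability else {}
    return {name: total * value / weight_sum for name, value in capability.items()}


# Hypothetical rule names, capability values and policy entries.
shares = allocate_resources({"router_rule": 3.0, "switch_rule": 1.0}, total=8.0)

initial_policy = [
    {"rule": "router_rule", "start": "08.30", "duration_s": 60},
    {"rule": "switch_rule", "start": "08.30", "duration_s": 30},
]
# Associate each allocation result with its rule to form the final policy.
final_policy = [dict(entry, share=shares[entry["rule"]]) for entry in initial_policy]
```

Under these inputs the rule with the higher capability value receives three quarters of the resource units, which matches the stated intent that a higher evaluation value earns more system resources.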
The application also discloses a log message data format recognition system, which comprises:
the parsing rule calling module is used for storing the log message parsing rules of all the devices and constructing the parsing rule call library for them, wherein the log messages that each log message parsing rule can parse are matched with first identification information, and a mapping relation is established between the first identification information and the log message parsing rules in the parsing rule call library;
the analysis rule judgment model generation module is used for analyzing the time characteristics and source equipment received by the past log message to generate first call information, carrying out joint analysis on the first call information and the log message requirement, and training to generate an analysis rule judgment model;
the initial analysis rule strategy generation module is used for operating the analysis rule judgment model and inputting a current time point and a current log message requirement to the analysis rule judgment model so that the analysis rule judgment model generates an initial analysis rule strategy;
and the final parsing rule policy generation module is used for evaluating the first identification information corresponding to the log message parsing rules in the initial parsing rule policy according to the requirement satisfaction capability, and carrying out resource allocation on the log parsing rules running simultaneously according to the evaluation result to obtain the final parsing rule policy.
The technical scheme of the application is further described in detail through the drawings and the embodiments.
Drawings
Fig. 1 is a method step diagram of a log message data format recognition method in an embodiment of the application.
Detailed Description
The technical scheme of the application is further described below through the attached drawings and the embodiments.
The technical solution of the present application is described clearly and completely below with reference to the accompanying drawings and specific embodiments. It should be understood that the preferred embodiments described herein only illustrate and explain the present application and do not limit its scope, and that those skilled in the art can make insubstantial modifications and adaptations in light of the following disclosure. Unless explicitly specified and defined otherwise, technical terms used in the present application should be construed in the general sense understood by those skilled in the art to which the present application pertains. Unless explicitly defined otherwise, the terms "connected," "fixed," "disposed" and the like are to be construed broadly: a connection may be fixed, detachable or integral; direct, or indirect through an intermediate medium; and mechanical or electrical. The specific meaning of these terms in the present application can be understood by those of ordinary skill in the art according to the specific circumstances. Unless expressly stated or limited otherwise, a first feature being "on" or "under" a second feature may mean that the two features are in direct contact, or in indirect contact through an intervening medium. Moreover, a first feature being "above" or "over" a second feature may mean that the first feature is directly above or diagonally above the second feature, or simply that the first feature is at a higher level than the second feature. A first feature being "under" or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or simply that the first feature is at a lower level than the second feature.
Relational terms such as first, second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
Examples:
The application aims to provide a method for identifying the data format of a log message which, referring to fig. 1, comprises the following steps:
Step 1: perform statistical analysis on the log message parsing rules of all the devices, and match first identification information based on the characteristics of the log messages that each log message parsing rule can parse.
It should be understood that, because devices are diverse, the log message parsing rules used to parse the log messages transmitted between them also differ. For example, routers, switches and computers use different log message parsing rules; if a computer is to read or use the log messages of a router and a switch, it must store the corresponding log message parsing rules. The first identification information may include the priority (PRI) feature, the header (HEADER) feature and the message (MSG) feature of the log message.
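To make the PRI, HEADER and MSG features concrete, the sketch below splits one message into those parts. It assumes the messages follow the BSD syslog layout, which uses the same three part names; the application does not name a specific protocol, and `split_syslog` and `SyslogParts` are hypothetical names. In that layout the PRI value equals the facility multiplied by 8 plus the severity.

```python
import re
from typing import NamedTuple, Optional

# Assumed BSD-syslog style layout: "<PRI>Mmm dd hh:mm:ss HOSTNAME MSG"
_PATTERN = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<msg>.*)$"
)


class SyslogParts(NamedTuple):
    facility: int   # PRI // 8
    severity: int   # PRI % 8
    timestamp: str  # HEADER, first field
    hostname: str   # HEADER, second field
    msg: str        # MSG part


def split_syslog(line: str) -> Optional[SyslogParts]:
    """Return the PRI, HEADER and MSG features, or None if the layout differs."""
    m = _PATTERN.match(line)
    if m is None:
        return None
    facility, severity = divmod(int(m.group("pri")), 8)
    return SyslogParts(facility, severity, m.group("timestamp"),
                       m.group("hostname"), m.group("msg"))


parts = split_syslog(
    "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
)
```

A message that does not match the assumed layout yields `None`, which is the case where a different parsing rule would have to be tried.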
Step 2: establish a parsing rule call library based on the log message parsing rules of all the devices, and determine the mapping relation between the first identification information and the parsing rule call library.
It should be understood that the log message parsing rules of all the devices can be entered into the corresponding device when the devices are connected, either manually through a peripheral device or by network transmission once a network connection is established.
Step 3: generate first call information based on the time characteristics and source devices of past log message reception, and analyse it jointly with the log message requirements to obtain a parsing rule judgment model for determining an initial parsing rule policy.
It should be understood that the time characteristics and source devices of past log messages can be obtained by monitoring and recording log message reception over a long period. A log message requirement is the need of a specific device to collect and parse the log messages of other devices, either to monitor those devices or to realise a function that depends on their log messages. The parsing rule judgment model is trained with the first call information and the log message requirement as input parameters and the log message parsing rule called at the same past time point as the output parameter.
The first call information includes the type of the log message, the time point at which the log message was received, the length of time over which it was received, and the source device that sent it. More specifically, the first call information is expressed as {OP, AB.CD, EF, GT}, where OP is the type of the log message, AB.CD is the time point at which the log message was received, EF is the length of time over which the log message was received, and GT is the source device of the log message. The log message requirement is expressed as {A01, B01.C01, D01}, where A01 is the specific device whose log message is required, B01.C01 is the time point at which the log message is required, and D01 is the per-unit-time requirement for the required log message. The log message parsing rule applied at the corresponding receiving time point is expressed as {E01, F01.H01, G01}, where E01 is the corresponding log message parsing rule, F01.H01 is the time point at which the log message parsing rule is used, and G01 is the length of time for which the log message parsing rule is used.
The initial parsing rule policy specifies which log message parsing rule is to be called at which time point, and for how long that rule is applied.
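The three records described above, and the pairing of inputs with outputs at the same past time point, can be sketched as follows. The class and field names are hypothetical, and encoding a time point as an "hour.minute" string is an assumption made only so that records can be compared.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FirstCallInfo:          # {OP, AB.CD, EF, GT}
    message_type: str         # OP
    received_at: str          # AB.CD, assumed "hour.minute" encoding
    duration_s: int           # EF
    source_device: str        # GT


@dataclass(frozen=True)
class MessageRequirement:     # {A01, B01.C01, D01}
    device: str               # A01
    needed_at: str            # B01.C01
    per_unit_time: int        # D01


@dataclass(frozen=True)
class RuleApplication:        # {E01, F01.H01, G01}
    rule: str                 # E01
    applied_at: str           # F01.H01
    duration_s: int           # G01


Sample = Tuple[Tuple[FirstCallInfo, MessageRequirement], RuleApplication]


def training_pairs(calls: List[FirstCallInfo],
                   requirements: List[MessageRequirement],
                   applications: List[RuleApplication]) -> List[Sample]:
    """Pair (call info, requirement) inputs with the rule applied at the same time point."""
    req_by_time = {r.needed_at: r for r in requirements}
    app_by_time = {a.applied_at: a for a in applications}
    pairs: List[Sample] = []
    for call in calls:
        req = req_by_time.get(call.received_at)
        app = app_by_time.get(call.received_at)
        if req is not None and app is not None:
            pairs.append(((call, req), app))
    return pairs


# Hypothetical records, for illustration only.
pairs = training_pairs(
    [FirstCallInfo("syslog", "08.30", 120, "router-01")],
    [MessageRequirement("router-01", "08.30", 50)],
    [RuleApplication("router_rule", "08.30", 120)],
)
```

Each resulting pair is one training sample: the call information and requirement form the input, and the rule applied at that time point forms the output.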
Step 4: determine an initial parsing rule policy based on the parsing rule judgment model, evaluate the first identification information corresponding to the log message parsing rules in the initial parsing rule policy according to the requirement satisfaction capability, and allocate resources among the log parsing rules running simultaneously according to the evaluation result to obtain a final parsing rule policy.
Step 5: based on the final parsing rule policy, determine which log message parsing rules are called at different time points, the length of time for which each rule is called, and the resource allocation of each log message parsing rule.
It should be understood that the requirement satisfaction capability evaluation assesses how well each log message parsing rule in the initial parsing rule policy can meet the log message requirements; a higher evaluation value means that more system resources should be allocated to that rule so that it is processed with priority.
In some embodiments of the present application, in order to distinguish log messages, the method is refined as follows. Matching first identification information based on the characteristics of the log messages that a log message parsing rule can parse includes:
First, matching a first identification element to the corresponding log message parsing rule based on the message header of the log message.
Second, matching a second identification element to the corresponding log message parsing rule based on the priority of the log message.
Third, matching a third identification element to the corresponding log message parsing rule based on the interval in which the number of characters of the message content of the log message falls.
It may be understood that the first identification information identifies a log message parsing rule and is used to determine which log messages that rule can parse; it consists of {first identification element, second identification element, third identification element}.
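The three identification elements can be derived along the following lines. The interval bounds, the header normalisation and the function name `identification_info` are assumptions chosen for illustration; the application does not specify them.

```python
from bisect import bisect_right
from typing import Tuple

# Hypothetical interval bounds on the character count of the message content:
# [0, 64), [64, 256), [256, 1024), [1024, inf)
LENGTH_BOUNDS = (64, 256, 1024)


def identification_info(header: str, priority: int, content: str) -> Tuple[str, int, int]:
    """Build {first, second, third identification element} for one log message."""
    first = header.strip().lower()                          # from the message header
    second = priority                                       # from the message priority
    third = bisect_right(LENGTH_BOUNDS, len(content))       # index of the length interval
    return (first, second, third)


# Hypothetical message fields.
info = identification_info("Router-01", 34, "link down on port 3")
```

Because the result is a plain tuple, it can serve directly as a lookup key into a mapping from identification information to storage positions.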
In some embodiments of the present application, to determine a corresponding log message parsing rule in the parsing rule call library, determining a mapping relationship between the first identification information and the parsing rule call library includes:
First, establishing a storage position array {a1, a2, a3, …, an} based on the storage positions of the log message parsing rules in the parsing rule call library, wherein a1 is the first storage position, a2 is the second storage position, a3 is the third storage position, and an is the n-th storage position.
Second, establishing a first identification information-storage position mapping set {a1-b1, a2-b2, a3-b3, …, an-bn} based on the matching relation between the first identification information and the log message parsing rules, wherein a1-b1 is the mapping relation group of the first storage position and the first identification information corresponding to the parsing rule at the first storage position, a2-b2 is the mapping relation group of the second storage position and the first identification information corresponding to the parsing rule at the second storage position, a3-b3 is the mapping relation group of the third storage position and the first identification information corresponding to the parsing rule at the third storage position, and an-bn is the mapping relation group of the n-th storage position and the first identification information corresponding to the parsing rule at the n-th storage position.
In some embodiments of the present application, in order to obtain an initial parsing rule policy, a method for generating a parsing rule judgment model is disclosed. Obtaining a parsing rule judgment model for determining the initial parsing rule policy includes:
First, monitoring the receiving time point, receiving duration and source device of each log message, and generating a first log message receiving record line, wherein the first log message receiving record line extends gradually over time.
Second, setting a first label at the corresponding time point of the first log message receiving record line based on the receiving time point of the log message.
Third, configuring the first label with the receiving duration and source device of the corresponding log message.
Fourth, marking a second label on the first log message receiving record line according to the log message requirement, wherein the second label is configured with the identifier of the device sending the log message and the identifier of the required log message.
Fifth, monitoring and recording the application time point and application duration of each log message parsing rule application, and generating a first log message parsing rule application record line, wherein the first log message parsing rule application record line extends gradually over time.
Sixth, marking a third label on the first log message parsing rule application record line for each log message parsing rule, wherein the third label is configured with the application duration of the log message parsing rule.
Seventh, aligning the first log message receiving record line with the first log message parsing rule application record line, and obtaining the parsing rule judgment model according to the relation characteristics of the two record lines.
Specifically, the parsing rule judgment model is an application program. When it runs, it scans the current state parameters of all devices and compares them with the stored first log message receiving record line to find the section they conform to, and it takes the log parsing rule policy shown in the corresponding section of the first log message parsing rule application record line as the initial parsing rule policy.
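The scan-and-compare behaviour described above can be sketched as a lookup over aligned sections. Several assumptions are made here: both record lines are cut into sections with shared indices, the current state is reduced to the set of devices currently sending messages, and conformity is measured by Jaccard similarity. None of these choices come from the application, and `initial_policy` is a hypothetical name.

```python
from typing import Dict, FrozenSet, List, Tuple

# Section index -> devices seen sending log messages in that section.
ReceiveLine = Dict[int, FrozenSet[str]]
# Section index -> (rule name, application duration in seconds).
RuleLine = Dict[int, List[Tuple[str, int]]]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Similarity of two device sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def initial_policy(current_devices: FrozenSet[str],
                   receive_line: ReceiveLine,
                   rule_line: RuleLine) -> List[Tuple[str, int]]:
    """Return the rule applications recorded for the best-matching section."""
    if not receive_line:
        return []
    # Ties resolve to the earliest section because sections are scanned in order.
    best = max(sorted(receive_line),
               key=lambda s: jaccard(current_devices, receive_line[s]))
    return rule_line.get(best, [])


# Hypothetical aligned record lines with three sections.
receive_line = {
    0: frozenset({"router"}),
    1: frozenset({"router", "switch"}),
    2: frozenset({"switch"}),
}
rule_line = {
    0: [("router_rule", 60)],
    1: [("router_rule", 60), ("switch_rule", 30)],
    2: [("switch_rule", 30)],
}
policy = initial_policy(frozenset({"router", "switch"}), receive_line, rule_line)
```

This lookup stands in for the trained model only to show the alignment idea; a trained network would generalise to states that match no stored section exactly.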
In some embodiments of the present application, a method for obtaining the parsing rule judgment model is further disclosed. Obtaining the parsing rule judgment model from the relationship features between the first log message receiving record line and the first log message parsing rule application record line includes:
First, the positions and matched contents of the first label and the second label on the first log message receiving record line are parameterized as features to obtain a first input parameter set.
Second, the position and matched content of the third label on the first log message parsing rule application record line are parameterized as features to obtain a first output parameter set.
Third, the parsing rule judgment model is obtained by training with a neural network learning algorithm, using the first input parameter set as the input parameters and the first output parameter set as the output parameters.
For example, the first input parameter set may be {K01, K02, K03, P01, P02, P03}, where K01 is a log message identifier, K02 is the position point of the log message, and K03 is the receiving duration of the log message. The first output parameter set may be {T01, T02, T03}, where T01 is the identifier of a log message parsing rule, T02 is the position point of the log message parsing rule, and T03 is the duration for which the log message parsing rule is applied.
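The feature parameterization step can be sketched as turning each label into a numeric vector. This is an illustrative assumption rather than the disclosed encoding: identifiers are mapped to integer indices, only K01 to K03 and T01 to T03 are encoded (P01 to P03 are not defined in the text and are left out), and the neural network training itself is omitted.

```python
from typing import Dict, List, Sequence, Tuple

# (identifier, position point on the record line, duration)
Record = Tuple[str, float, float]


def encode_id(identifier: str, vocab: Dict[str, int]) -> int:
    """Map a string identifier to a stable integer index, growing vocab as needed."""
    return vocab.setdefault(identifier, len(vocab))


def parameterize(
    receive_records: Sequence[Record],
    rule_records: Sequence[Record],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Build paired input vectors [K01, K02, K03] and output vectors [T01, T02, T03].

    The i-th receive record is assumed to be aligned with the i-th rule record.
    """
    msg_vocab: Dict[str, int] = {}
    rule_vocab: Dict[str, int] = {}
    inputs: List[List[float]] = []
    outputs: List[List[float]] = []
    for (msg_id, msg_pos, recv_dur), (rule_id, rule_pos, apply_dur) in zip(
        receive_records, rule_records
    ):
        inputs.append([float(encode_id(msg_id, msg_vocab)), msg_pos, recv_dur])
        outputs.append([float(encode_id(rule_id, rule_vocab)), rule_pos, apply_dur])
    return inputs, outputs


receive = [("msg-A", 10.0, 2.0), ("msg-B", 50.0, 1.0), ("msg-A", 70.0, 2.5)]
rules = [("rule-1", 9.0, 5.0), ("rule-2", 49.5, 2.0), ("rule-1", 69.0, 4.0)]
X, Y = parameterize(receive, rules)
print(X)
print(Y)
```

The resulting `X` and `Y` lists are the kind of input and output pairs a neural network trainer would consume; which network architecture and loss to use is not stated in the text.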
In some embodiments of the present application, evaluating the first identification information of the log message parsing rules in the initial parsing rule policy according to the demand satisfaction capability includes:
First, the message characteristics of the log messages parsed by the different log message parsing rules are determined based on the first identification information.
Second, a first demand weight factor is configured for the corresponding log message based on the different log message demands.
Third, a first message length factor is configured for the corresponding log message based on the message characteristics of the log message.
Fourth, the satisfaction capability value of the log message parsing rule corresponding to a single log message is determined based on the first demand weight factor and the first message length factor.
The satisfaction capability value of the log message parsing rule is determined by an expression (the expression and its symbols are not reproduced in this text) over the following quantities: the satisfaction capability value of the i-th log message parsing rule; the n-th first demand weight factor of the log message parsed by the i-th log message parsing rule; the adjustment coefficient of the first message length factor of the log message parsed by the i-th log message parsing rule; and the first message length factor of the log message parsed by the i-th log message parsing rule.
It should be understood that the system may place several demands on a specific log message. Because these different log message demands call for the corresponding log message with different strengths, first demand weight factors need to be configured for that log message.
In some embodiments of the present application, performing resource allocation on the log message parsing rules that run simultaneously to obtain the final parsing rule policy includes: allocating resources to the log message parsing rules according to their satisfaction capability values, and associating the resource allocation result with the log message parsing rules applied in the initial parsing rule policy to obtain the final parsing rule policy.
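One way to picture this allocation step is a proportional split. The sketch below assumes, purely for illustration, that a fixed number of resource units (for example worker threads) is divided among the simultaneously running rules in proportion to their satisfaction capability values; the text only says that allocation is performed "according to" those values, so the proportional scheme, the function names, and the unit of resource are all assumptions. Satisfaction values are taken as given, non-negative inputs.

```python
from typing import Dict, List, Tuple


def allocate_resources(satisfaction: Dict[str, float], total_units: int) -> Dict[str, int]:
    """Split total_units among rules in proportion to their satisfaction
    capability values, using the largest-remainder method for rounding."""
    total = sum(satisfaction.values())
    if total <= 0:
        raise ValueError("satisfaction values must sum to a positive number")
    exact = {rule: total_units * value / total for rule, value in satisfaction.items()}
    shares = {rule: int(amount) for rule, amount in exact.items()}
    leftover = total_units - sum(shares.values())
    # Hand the remaining units to the rules with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda r: exact[r] - shares[r], reverse=True)
    for rule in by_remainder[:leftover]:
        shares[rule] += 1
    return shares


def final_policy(
    initial_policy: List[str], satisfaction: Dict[str, float], total_units: int
) -> List[Tuple[str, int]]:
    """Associate each rule in the initial policy with its resource share."""
    shares = allocate_resources({r: satisfaction[r] for r in initial_policy}, total_units)
    return [(rule, shares[rule]) for rule in initial_policy]


print(final_policy(["rule-1", "rule-2"], {"rule-1": 3.0, "rule-2": 1.0}, 8))
```

The largest-remainder rounding keeps the shares integral while ensuring that they always add up to `total_units`.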
The application also discloses a log message data format identification system, which comprises: a parsing rule calling module, a parsing rule judgment model generation module, an initial parsing rule policy generation module, and a final parsing rule policy generation module.
The parsing rule calling module is used for storing the log message parsing rules of all devices and building them into a parsing rule call library, wherein the log messages that the log message parsing rules can parse are matched with first identification information, and a mapping relationship is established between the first identification information and the log message parsing rules in the parsing rule call library.
The parsing rule judgment model generation module is used for analyzing the time characteristics and source devices of past log message reception to generate first call information, and for performing joint analysis on the first call information and the log message demands to train and generate the parsing rule judgment model.
The initial parsing rule policy generation module is used for running the parsing rule judgment model and inputting the current time point and the current log message demand into it, so that the parsing rule judgment model generates the initial parsing rule policy.
The final parsing rule policy generation module is used for evaluating the first identification information corresponding to the log message parsing rules in the initial parsing rule policy according to the demand satisfaction capability, and for performing resource allocation on the log message parsing rules that run simultaneously according to the evaluation result, to obtain the final parsing rule policy.
The technical solution of the present application is described in further detail through the drawings and the embodiments.
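The parsing rule calling module described above can be sketched as a small lookup structure. The sketch is hypothetical: it assumes the first identification information is a tuple of the three elements named in the claims (message header, priority, and the interval in which the character count falls), that storage positions are list indices, and that a parsing rule is any callable from text to a dictionary. The bucket size of 256 characters is an arbitrary illustrative choice.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (message header, priority, character-count interval index)
IdInfo = Tuple[str, int, int]
ParseRule = Callable[[str], dict]


def identification_info(header: str, priority: int, content: str,
                        bucket_size: int = 256) -> IdInfo:
    """Build first identification information from three identification elements."""
    return (header, priority, len(content) // bucket_size)


class RuleCallLibrary:
    """Stores parsing rules at storage positions and maps identification
    information to those positions."""

    def __init__(self) -> None:
        self._rules: List[ParseRule] = []              # storage positions a1..an
        self._position_by_id: Dict[IdInfo, int] = {}   # mapping bk -> ak

    def register(self, id_info: IdInfo, rule: ParseRule) -> int:
        """Store a rule and record which identification information maps to it."""
        self._rules.append(rule)
        position = len(self._rules) - 1
        self._position_by_id[id_info] = position
        return position

    def lookup(self, id_info: IdInfo) -> Optional[ParseRule]:
        """Return the rule mapped to id_info, or None if no rule is registered."""
        position = self._position_by_id.get(id_info)
        return None if position is None else self._rules[position]


library = RuleCallLibrary()
key = identification_info("<134>", 6, "x" * 300)
position = library.register(key, lambda text: {"raw": text})
rule = library.lookup(key)
print(position, rule("abc") if rule else None)
```

Keeping the mapping separate from the rule storage mirrors the description, where the identification information points at a storage position rather than at the rule itself.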
Finally, it should be noted that the above embodiments only illustrate the technical solution of the present application and do not limit it. Although the present application has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical solution of the present application may be modified or equivalently replaced, provided that the modified technical solution does not depart from the spirit and scope of the technical solution of the present application.

Claims (8)

1. A log message data format identification method, characterized by comprising the following steps:
step 1, performing statistical analysis on the log message parsing rules of all devices, and matching first identification information based on the characteristics of the log messages that the log message parsing rules can parse;
step 2, establishing a parsing rule call library based on the log message parsing rules of all devices, and determining the mapping relationship between the first identification information and the parsing rule call library based on the first identification information;
step 3, generating first call information based on the time characteristics and source devices of past log message reception, and performing joint analysis in combination with the log message demands to obtain a parsing rule judgment model for determining an initial parsing rule policy;
step 4, determining the initial parsing rule policy based on the parsing rule judgment model, evaluating the first identification information corresponding to the log message parsing rules in the initial parsing rule policy according to the demand satisfaction capability, and performing resource allocation on the log message parsing rules that run simultaneously according to the evaluation result to obtain a final parsing rule policy;
step 5, based on the final parsing rule policy, determining the calling of the log message parsing rules at different time points, the duration for which each log message parsing rule is called, and the resource allocation of each log message parsing rule.
2. The log message data format identification method according to claim 1, wherein matching the first identification information based on the characteristics of the log messages that the log message parsing rules can parse comprises:
matching a first identification element to the corresponding log message parsing rule based on the message header of the log message;
matching a second identification element to the corresponding log message parsing rule based on the priority of the log message;
and matching a third identification element to the corresponding log message parsing rule based on the interval in which the number of characters of the message content of the log message falls.
3. The log message data format identification method according to claim 2, wherein determining the mapping relationship between the first identification information and the parsing rule call library comprises:
establishing a storage position array {a1, a2, a3, …, an} based on the storage positions of the log message parsing rules in the parsing rule call library, wherein a1 is the first storage position, a2 is the second storage position, a3 is the third storage position, and an is the n-th storage position;
establishing a first identification information-storage position mapping set {a1-b1, a2-b2, a3-b3, …, an-bn} based on the matching relationship between the first identification information and the log message parsing rules, wherein a1-b1 is the mapping pair between the first storage position and the first identification information corresponding to the log message parsing rule at the first storage position, a2-b2 is the mapping pair between the second storage position and the first identification information corresponding to the log message parsing rule at the second storage position, a3-b3 is the mapping pair between the third storage position and the first identification information corresponding to the log message parsing rule at the third storage position, and an-bn is the mapping pair between the n-th storage position and the first identification information corresponding to the log message parsing rule at the n-th storage position.
4. The log message data format identification method according to claim 1, wherein obtaining the parsing rule judgment model for determining the initial parsing rule policy comprises:
monitoring the receiving time point, receiving duration, and source device of each log message, and generating a first log message receiving record line, wherein the first log message receiving record line extends continuously as time passes;
setting a first label at the corresponding time point on the first log message receiving record line based on the receiving time point of the log message;
configuring the first label with the receiving duration and source device of the corresponding log message;
marking a second label on the first log message receiving record line for each log message demand, wherein the second label is configured with the identifier of the device that sends the log message and the identifier of the required log message;
monitoring the application time point and application duration of each log message parsing rule, and generating a first log message parsing rule application record line, wherein the first log message parsing rule application record line extends continuously as time passes;
marking, for each log message parsing rule, a third label on the first log message parsing rule application record line, wherein the third label is configured with the application duration of the log message parsing rule;
aligning the first log message receiving record line and the first log message parsing rule application record line, and obtaining the parsing rule judgment model from the relationship features between the first log message receiving record line and the first log message parsing rule application record line.
5. The log message data format identification method according to claim 4, wherein obtaining the parsing rule judgment model from the relationship features between the first log message receiving record line and the first log message parsing rule application record line comprises:
parameterizing, as features, the positions and matched contents of the first label and the second label on the first log message receiving record line to obtain a first input parameter set;
parameterizing, as features, the position and matched content of the third label on the first log message parsing rule application record line to obtain a first output parameter set;
and training with a neural network learning algorithm, using the first input parameter set as the input parameters and the first output parameter set as the output parameters, to obtain the parsing rule judgment model.
6. The log message data format identification method according to claim 1, wherein evaluating the first identification information of the log message parsing rules in the initial parsing rule policy according to the demand satisfaction capability comprises:
determining the message characteristics of the log messages parsed by the different log message parsing rules based on the first identification information;
configuring a first demand weight factor for the corresponding log message based on the different log message demands;
configuring a first message length factor for the corresponding log message based on the message characteristics of the log message;
determining the satisfaction capability value of the log message parsing rule corresponding to a single log message based on the first demand weight factor and the first message length factor;
wherein the satisfaction capability value of the log message parsing rule is determined by an expression (the expression and its symbols are not reproduced in this text) over the following quantities: the satisfaction capability value of the i-th log message parsing rule; the n-th first demand weight factor of the log message parsed by the i-th log message parsing rule; the adjustment coefficient of the first message length factor of the log message parsed by the i-th log message parsing rule; and the first message length factor of the log message parsed by the i-th log message parsing rule.
7. The log message data format identification method according to claim 6, wherein performing resource allocation on the log message parsing rules that run simultaneously to obtain the final parsing rule policy comprises:
allocating resources to the log message parsing rules according to their satisfaction capability values, and associating the resource allocation result with the log message parsing rules applied in the initial parsing rule policy to obtain the final parsing rule policy.
8. A log message data format identification system, comprising:
a parsing rule calling module, used for storing the log message parsing rules of all devices and building them into a parsing rule call library, wherein the log messages that the log message parsing rules can parse are matched with first identification information, and a mapping relationship is established between the first identification information and the log message parsing rules in the parsing rule call library;
a parsing rule judgment model generation module, used for analyzing the time characteristics and source devices of past log message reception to generate first call information, performing joint analysis on the first call information and the log message demands, and training to generate a parsing rule judgment model;
an initial parsing rule policy generation module, used for running the parsing rule judgment model and inputting the current time point and the current log message demand into the parsing rule judgment model, so that the parsing rule judgment model generates an initial parsing rule policy;
and a final parsing rule policy generation module, used for evaluating the first identification information corresponding to the log message parsing rules in the initial parsing rule policy according to the demand satisfaction capability, and performing resource allocation on the log message parsing rules that run simultaneously according to the evaluation result to obtain a final parsing rule policy.
CN202310385045.4A 2023-04-11 2023-04-11 Log message data format identification method and system Active CN116578534B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310385045.4A CN116578534B (en) 2023-04-11 2023-04-11 Log message data format identification method and system

Publications (2)

Publication Number Publication Date
CN116578534A (en) 2023-08-11
CN116578534B (en) 2024-06-04

Family

ID=87540289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310385045.4A Active CN116578534B (en) 2023-04-11 2023-04-11 Log message data format identification method and system

Country Status (1)

Country Link
CN (1) CN116578534B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070283194A1 (en) * 2005-11-12 2007-12-06 Phillip Villella Log collection, structuring and processing
US20160292263A1 (en) * 2015-04-03 2016-10-06 Oracle International Corporation Method and system for implementing a log parser in a log analytics system
CN107332824A (en) * 2017-06-07 2017-11-07 北京奇安信科技有限公司 A kind of recognition methods of cloud application and device
CN111061696A (en) * 2019-12-17 2020-04-24 中国银行股份有限公司 Method and device for analyzing transaction message log
CN111368534A (en) * 2018-12-25 2020-07-03 中国移动通信集团浙江有限公司 Application log noise reduction method and device
CN111796997A (en) * 2020-07-02 2020-10-20 北京字节跳动网络技术有限公司 Log information processing method and device and electronic equipment
CN112468472A (en) * 2020-11-18 2021-03-09 中通服咨询设计研究院有限公司 Security policy self-feedback method based on security log association analysis
CN114706839A (en) * 2022-04-07 2022-07-05 京东科技信息技术有限公司 Log data processing method and device, electronic equipment and storage medium
CN115185964A (en) * 2022-06-25 2022-10-14 平安银行股份有限公司 Data synchronization method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
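Fu Jianping et al.: "Control-flow anomaly detection algorithms for business processes based on event logs: state of the art and evaluation", Computer Integrated Manufacturing Systems (《计算机集成制造系统》), 28 February 2023 (2023-02-28), pages 18 *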

Also Published As

Publication number Publication date
CN116578534B (en) 2024-06-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant