CN110990350A - Log analysis method and device - Google Patents

Info

Publication number
CN110990350A
Authority
CN
China
Prior art keywords
regular expression
field
field identifier
identifier
requirement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911190459.1A
Other languages
Chinese (zh)
Other versions
CN110990350B (en)
Inventor
韩佩利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taikang Insurance Group Co Ltd
Original Assignee
Taikang Insurance Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taikang Insurance Group Co Ltd
Priority to CN201911190459.1A
Publication of CN110990350A
Application granted
Publication of CN110990350B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a log analysis method and device. The method comprises the steps of obtaining current field requirements of target application services, wherein the current field requirements comprise at least one target field identifier and the sequence of the at least one target field identifier; searching a stored regular expression library according to a stored regular expression of the target application service and the current field requirement, and acquiring a field identifier to be updated and an updating operation aiming at the stored regular expression; updating the stored regular expression according to the field identifier to be updated and the updating operation to generate a regular expression meeting the requirements of the current field; and analyzing the log of the target application service by adopting a regular expression meeting the current field requirement to obtain log analysis content. Compared with the prior art, the method avoids manual modification of the regular expression, and improves the modification efficiency and the accuracy of the regular expression modification.

Description

Log analysis method and device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for analyzing a log.
Background
Currently, a company typically stores the log information of its many application services, such as application programs (APPs), in a log system that uses a single shared log template. The log information generated by an APP often needs to be connected to a log analysis platform; once an application service is connected, the fields parsed by the extract-transform-load (ETL) processing can be viewed in Elasticsearch, a distributed full-text retrieval system.
The method currently adopted is as follows: the Logstash log collection tool is installed on the terminal device where the APP runs, the log information under a specified directory is collected according to the Logstash configuration file, a regular expression is used to parse the target fields of each log entry, and the parsed fields are then written into Elasticsearch for business viewing, statistics, and the like.
With this existing solution, many APPs have log access requirements. Although the log formats of the APPs are substantially the same, the fields that each application service expects to parse may differ. For example, APP1 may expect field A, field B, and field C to be parsed, while APP2 expects only field A and field B. In actual development, regular expressions therefore have to be written manually, one by one, for the specific situation of each APP. When the parsing requirement (or "field requirement") of an application service changes, for example when fields are added or removed, the regular expression also has to be modified manually.
Because regular expressions for log parsing are long, modifying them manually is complex and inefficient. Manual modification is also error-prone, so its accuracy is low, and the resulting mistakes cause log parsing errors.
Disclosure of Invention
The embodiment of the application provides a log analysis method and device, solves the problems in the prior art, avoids manual modification of regular expressions, and improves modification efficiency and accuracy of the regular expressions.
In a first aspect, a method for parsing a log is provided, where the method may include:
acquiring a current field requirement of the target application service, wherein the current field requirement comprises at least one target field identifier and the sequence of the at least one target field identifier;
searching a stored regular expression library according to the stored regular expression of the target application service and the current field requirement, and acquiring a field identifier to be updated and an updating operation aiming at the stored regular expression, wherein the regular expression library is used for storing the regular expression of the field identifier and the regular expression of the field identifier combination;
updating the stored regular expression according to the field identifier to be updated and the updating operation to generate a regular expression meeting the requirement of the current field;
and analyzing the log of the target application service by adopting a regular expression meeting the requirements of the current field to obtain log analysis content.
In an optional implementation, before obtaining the current field requirement of the target application service, the method further includes:
acquiring an initial field requirement of a target application service, wherein the initial field requirement comprises at least one field identifier and the sequence of the at least one field identifier;
searching a stored regular expression library, and acquiring a regular expression of each field identifier in the at least one field identifier;
and combining, by means of an expression combination algorithm, the regular expressions of the at least one field identifier according to the order of the at least one field identifier, to generate a regular expression meeting the initial field requirement and regular expressions of field identifier combinations.
In an optional implementation, combining the regular expressions of the at least one field identifier according to the order of the at least one field identifier by means of an expression combination algorithm, to generate a regular expression meeting the initial field requirement and regular expressions of field identifier combinations, includes:
sorting the regular expressions of the at least one field identifier according to the order of the at least one field identifier;
and combining adjacent regular expressions in the sorted regular expressions by inserting a preset regular expression between them, to generate the regular expression meeting the initial field requirement and the regular expressions of field identifier combinations, wherein the preset regular expression is a regular expression that matches arbitrary characters.
In an optional implementation, according to the field identifier to be updated and the update operation, performing update processing on the stored regular expression to generate a regular expression meeting the requirement of the current field, including:
acquiring a regular expression of the field identifier to be updated;
if the updating operation is an adding operation, adding the regular expression of the field identifier to be updated in the stored regular expression to generate a regular expression meeting the requirement of the current field;
and if the updating operation is a deleting operation, deleting the regular expression of the field identifier to be updated in the stored regular expression, and generating the regular expression meeting the requirement of the current field.
In an optional implementation, adding the regular expression of the field identifier to be updated to the stored regular expression, to generate the regular expression meeting the current field requirement, includes:
searching the regular expression library according to the sequence of the at least one target field identifier, and acquiring a regular expression of the field identifier corresponding to the stored regular expression or a regular expression of the field identifier combination;
and combining the regular expression of the field identifier to be updated with the obtained regular expression of the field identifier or the regular expression of the field identifier combination by adopting an expression combination algorithm to generate the regular expression meeting the requirement of the current field.
In an alternative implementation, the field identifiers and corresponding regular expressions in the regular expression library are stored in the form of key-value pairs.
In an optional implementation, the regular expression library includes a general regular expression library and a customized regular expression library;
before obtaining the current field requirement of the target application service, the method further comprises:
acquiring an input field identifier and a corresponding regular expression, and a field identifier combination and a corresponding regular expression;
counting the number of application services using the input field identifications or field identification combinations;
if the usage number is not less than a preset number threshold, adding the input field identifier or field identifier combination and the corresponding regular expression to the general regular expression library;
and if the using number is smaller than the preset number threshold, adding the input field identification or the field identification combination and the corresponding regular expression into the customized regular expression library.
In an optional implementation, the method further comprises:
counting, for each field identifier or field identifier combination in the customized regular expression library, the number of application services that use it;
and if the usage number counted in real time is not less than the preset number threshold, removing the counted field identifications or field identification combinations and the corresponding regular expressions from the customized regular expression library, and adding the field identifications or field identification combinations and the corresponding regular expressions into the general regular expression library.
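The threshold rule for the two libraries can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `THRESHOLD`, `register`, and `promote_if_popular`, the threshold value, and the sample patterns are all hypothetical, and the usage counts are passed in directly instead of being gathered from real application services.

```python
THRESHOLD = 3  # assumed value of the preset number threshold

general_library = {}  # field identifier (or combination) -> regular expression
custom_library = {}


def register(field_id, expression, usage_count):
    """Store a new entry in the general or customized library by usage count."""
    target = general_library if usage_count >= THRESHOLD else custom_library
    target[field_id] = expression


def promote_if_popular(field_id, usage_count):
    """Move a customized entry to the general library once its usage count
    reaches the threshold."""
    if field_id in custom_library and usage_count >= THRESHOLD:
        general_library[field_id] = custom_library.pop(field_id)


register("APPName", r"\w+", usage_count=5)      # widely used: general library
register("StaffNumber", r"\d+", usage_count=1)  # rarely used: customized library
promote_if_popular("StaffNumber", usage_count=3)  # usage grew: promoted
```

After these calls both identifiers sit in `general_library` and `custom_library` is empty.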
In a second aspect, an apparatus for parsing a log is provided, and the apparatus may include an obtaining unit, a generating unit, and an analysis unit;
the obtaining unit is configured to obtain a current field requirement of the target application service, where the current field requirement includes at least one target field identifier and an order of the at least one target field identifier;
and to search a stored regular expression library according to the stored regular expression of the target application service and the current field requirement, and obtain a field identifier to be updated and an update operation on the stored regular expression, wherein the regular expression library is used for storing regular expressions of field identifiers and regular expressions of field identifier combinations;
the generating unit is configured to update the stored regular expression according to the field identifier to be updated and the update operation, and generate a regular expression meeting the requirement of the current field;
and the analysis unit is used for analyzing the log of the target application service by adopting the regular expression meeting the requirement of the current field to obtain log analysis content.
In an optional implementation, the obtaining unit is further configured to obtain an initial field requirement of the target application service, where the initial field requirement includes at least one field identifier and an order of the at least one field identifier;
searching a stored regular expression library, and acquiring the regular expression of each field identifier in the at least one field identifier;
the generating unit is further configured to combine the regular expressions of each field identifier in the at least one field identifier according to the sequence of the at least one field identifier by using an expression combination algorithm, so as to generate a regular expression meeting the requirement of the initial field and a regular expression combined by the field identifiers.
In an optional implementation, the generating unit is specifically configured to sort the regular expressions of the at least one field identifier according to the order of the at least one field identifier;
and to combine adjacent regular expressions in the sorted regular expressions by inserting a preset regular expression between them, to generate the regular expression meeting the initial field requirement and the regular expressions of field identifier combinations, wherein the preset regular expression is a regular expression that matches arbitrary characters.
In an optional implementation, the generating unit is further configured to obtain a regular expression of the field identifier to be updated;
if the updating operation is an adding operation, adding the regular expression of the field identifier to be updated in the stored regular expression to generate a regular expression meeting the requirement of the current field;
and if the updating operation is a deleting operation, deleting the regular expression of the field identifier to be updated in the stored regular expression, and generating the regular expression meeting the requirement of the current field.
In an optional implementation, the generating unit is further specifically configured to search the regular expression library according to the sequence of the at least one target field identifier, and obtain a regular expression of a field identifier corresponding to the stored regular expression or a regular expression of a combination of field identifiers;
and combining the regular expression of the field identifier to be updated with the obtained regular expression of the field identifier or the regular expression of the field identifier combination by adopting an expression combination algorithm to generate the regular expression meeting the requirement of the current field.
In an alternative implementation, the field identifiers and corresponding regular expressions in the regular expression library are stored in the form of key-value pairs.
In an optional implementation, the regular expression library includes a general regular expression library and a customized regular expression library; the device also comprises a statistical unit and an adding unit;
the acquisition unit is further used for acquiring the input field identifier and the corresponding regular expression, and the field identifier combination and the corresponding regular expression;
the statistical unit is configured to count the number of application services that use the input field identifiers or field identifier combinations;
the adding unit is used for adding the input field identification or field identification combination and the corresponding regular expression into the general regular expression library if the using number is not less than a preset number threshold;
and if the using number is smaller than the preset number threshold, adding the input field identification or the field identification combination and the corresponding regular expression into the customized regular expression library.
In an optional implementation, the statistical unit is further configured to count a usage number of the application service using each field identifier or a combination of field identifiers in the customized regular expression library;
the adding unit is further configured to remove the counted field identifier or field identifier combination and the corresponding regular expression from the customized regular expression library and add the field identifier or field identifier combination and the corresponding regular expression into the general regular expression library if the usage number counted in real time is not less than the preset number threshold.
In a third aspect, an electronic device is provided, which includes a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor adapted to perform the method steps of any of the above first aspects when executing a program stored in the memory.
In a fourth aspect, a computer-readable storage medium is provided, having stored therein a computer program which, when executed by a processor, performs the method steps of any of the above first aspects.
According to the log analysis method provided by the embodiment of the invention, the current field requirement of the target application service is obtained, the current field requirement comprising at least one target field identifier and the order of the at least one target field identifier; a stored regular expression library is searched according to the stored regular expression of the target application service and the current field requirement, to obtain a field identifier to be updated and an update operation on the stored regular expression, the regular expression library storing regular expressions of field identifiers and regular expressions of field identifier combinations; the stored regular expression is updated according to the field identifier to be updated and the update operation, to generate a regular expression meeting the current field requirement; and the log of the target application service is parsed with the regular expression meeting the current field requirement, to obtain the parsed log content. Compared with the prior art, the method avoids manual modification of the regular expression and improves both the efficiency and the accuracy of regular expression modification.
Drawings
Fig. 1 is an architecture diagram of a system to which a method for parsing a log according to an embodiment of the present invention is applied;
Fig. 2 is a schematic flowchart of a method for parsing a log according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a log parsing apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without any creative effort belong to the protection scope of the present application.
The method for analyzing the log provided by the embodiment of the invention can be applied to the system shown in fig. 1, and the system can comprise a terminal and a server.
The server may be an application server or a cloud server; a terminal may be a User Equipment (UE) such as a Mobile phone, a smart phone, a laptop, a digital broadcast receiver, a Personal Digital Assistant (PDA), a tablet computer (PAD), a computing device or other processing device connected to a wireless modem, a Mobile Station (MS), etc.
The terminal may include a series of application services, such as a series of different APPs, and the logs generated by the APPs are uploaded to the server by the terminal. The log format of each APP may comprise at least one field and logs of a series of different APPs are stored in a log system of the server having the same log template.
Because the log templates are the same, the log formats of different APPs are similar: many fields are shared, while some fields are unique to an APP. For example, the log format of APP1 comprises field 1, field 2, and field 3, while the log format of APP2 comprises field 1 and field 3.
In the embodiment of the invention, when the field requirement of an application service changes, for example when fields are added or removed, the log is parsed and split into fields, the field identifiers that meet the field requirement of the application service are extracted, the stored field identifiers are traversed to match the corresponding regular expressions, and a regular expression meeting the field requirement is finally output. This regular expression can be loaded directly into the Logstash configuration file, which avoids editing the regular expression in the configuration file by hand and improves the efficiency and accuracy of regular expression modification.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are merely for illustrating and explaining the present invention and are not intended to limit the present invention, and that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
Fig. 2 is a schematic flowchart of a method for parsing a log according to an embodiment of the present invention. As shown in fig. 2, the method may include:
step 210, obtaining the current field requirement of the target application service.
Prior to performing this step, a regular expression of the target application service may be obtained, including:
firstly, obtaining an initial field requirement of a target application service, wherein the initial field requirement can comprise at least one field identifier and the sequence of the at least one field identifier;
wherein the initial field requirement is the initial parsing requirement of the target application service, i.e. the log content that the target application service initially wants to obtain from the log. For example, the initial field requirement may include at least one of an application service name (APPName), a user IP address (usersip), a device identification (DeviceId), a staff identification (StaffNumber), and the like, together with the sequence number of each corresponding field identifier.
And secondly, searching a stored regular expression library, and acquiring the regular expression of each field identifier in at least one field identifier.
The regular expression library is used for storing the regular expressions of field identifiers and the regular expressions of field identifier combinations. Field identifiers and their corresponding regular expressions are stored as key-value pairs, where the key is the field identifier and the value is the corresponding regular expression. Depending on the requirements of the actual project, the regular expression library can be stored in a MySQL database, a Redis database, or a configuration file.
Optionally, if a field identifier fails to match, that is, the regular expression library contains no entry for that field identifier, notification information is generated to notify a technician to update the regular expression library.
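The key-value lookup and the notification for unmatched identifiers can be sketched as follows. This is only an illustration: an in-memory dictionary stands in for the MySQL, Redis, or configuration-file store, the patterns are invented, `UserIp` is a hypothetical identifier chosen to be absent from the library, and printing stands in for the notification to a technician.

```python
# Hypothetical regular expression library: key = field identifier, value = regex.
# The patterns are illustrative assumptions, not taken from the patent.
REGEX_LIBRARY = {
    "APPName": r"(?P<APPName>\w+)",
    "DeviceId": r"(?P<DeviceId>[0-9a-f]{8})",
    "StaffNumber": r"(?P<StaffNumber>\d+)",
}


def lookup_expressions(field_ids, library):
    """Return (found, missing): (identifier, regex) pairs for known identifiers,
    in the requested order, and the identifiers with no library entry."""
    found, missing = [], []
    for field_id in field_ids:
        if field_id in library:
            found.append((field_id, library[field_id]))
        else:
            missing.append(field_id)
    return found, missing


found, missing = lookup_expressions(
    ["APPName", "UserIp", "StaffNumber"], REGEX_LIBRARY
)
if missing:
    # Stand-in for the notification sent to a technician.
    print("Regular expression library needs updating for:", missing)
```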
Further, an expression combination algorithm is used to combine the regular expressions of the at least one field identifier according to the order of the at least one field identifier, so as to generate a regular expression meeting the initial field requirement and regular expressions of field identifier combinations.
Specifically, the regular expressions of the at least one field identifier are sorted according to the order of the at least one field identifier;
and adjacent regular expressions in the sorted regular expressions are combined by inserting a preset regular expression between them, to generate the regular expression meeting the initial field requirement and the regular expressions of field identifier combinations, wherein the preset regular expression is a regular expression that matches arbitrary characters.
The preset regular expression is a regular expression that matches arbitrary characters of some class. For example, "\s" matches any whitespace character, including a space, a tab character, a form feed character, and the like; "\w" matches any word character including the underscore, equivalent to "[A-Za-z0-9_]"; and "\d" matches a numeric character, equivalent to "[0-9]".
Or, directly splicing and combining adjacent regular expressions in the sorted regular expressions to generate the regular expression meeting the requirement of the initial field and the regular expression combined by the field identifier.
The regular expressions of the field identifier combinations are then stored.
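The combination step described above can be sketched as follows. This is a minimal sketch under assumptions: the separator `\s+`, the function name `combine_expressions`, the sample patterns, and the comma-joined key used for the stored combination are all hypothetical choices. Passing an empty separator gives the direct splicing variant.

```python
import re

# Assumed preset regular expression placed between adjacent fields.
PRESET_SEPARATOR = r"\s+"


def combine_expressions(field_order, library, separator=PRESET_SEPARATOR):
    """Take the per-field regexes in the required order and join adjacent
    ones with the preset separator."""
    return separator.join(library[field_id] for field_id in field_order)


library = {
    "APPName": r"(?P<APPName>\w+)",
    "DeviceId": r"(?P<DeviceId>\w+)",
    "StaffNumber": r"(?P<StaffNumber>\d+)",
}
initial = combine_expressions(["APPName", "DeviceId", "StaffNumber"], library)
re.compile(initial)  # raises re.error if the combined expression is invalid

# Store the result under a key that names the field identifier combination.
library["APPName,DeviceId,StaffNumber"] = initial
```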
Returning to step 210, the current field requirement of the target application service, i.e. the resolution requirement of the current target application service, is obtained, and the current field requirement may include at least one target field identifier and the order of the at least one target field identifier.
The current field requirement may be the same as or different from the initial field requirement. If they differ, the target application service has changed the content it wants parsed from the log.
Step 220, according to the stored regular expression of the target application service and the current field requirement, searching the stored regular expression library, and acquiring the field identifier to be updated and the updating operation aiming at the stored regular expression.
The stored regular expression of the target application service is looked up. If the current field requirement is the second requirement obtained for the target application service, that is, the first one after the initial field requirement, the regular expression found is the initial regular expression.
The field identifiers corresponding to the stored regular expression are then looked up in the regular expression library.
The field identifiers corresponding to the stored regular expression are compared with the at least one target field identifier in the current field requirement to obtain the field identifier to be updated and the update operation. In other words, the comparison finds the field identifiers that are redundant or missing in the stored regular expression relative to the at least one target field identifier: a redundant field identifier is determined to be a field identifier to be deleted, with a delete operation, and a missing field identifier is determined to be a field identifier to be added, with an add operation.
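The comparison can be sketched as a simple difference between two ordered lists of identifiers. The function name `diff_fields` and the `"add"` and `"delete"` labels are hypothetical; the patent does not prescribe a data representation.

```python
def diff_fields(stored_fields, target_fields):
    """Compare the stored field identifiers with the target ones.

    Returns a list of (operation, field_id) pairs: "delete" for identifiers
    present only in the stored list, "add" for identifiers present only in
    the target list.
    """
    updates = [("delete", f) for f in stored_fields if f not in target_fields]
    updates += [("add", f) for f in target_fields if f not in stored_fields]
    return updates


added = diff_fields(["A", "B", "C"], ["A", "B", "X", "C"])
removed = diff_fields(["A", "B", "C"], ["A", "C"])
print(added)    # [('add', 'X')]
print(removed)  # [('delete', 'B')]
```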
Step 230, updating the stored regular expression according to the field identifier to be updated and the update operation, and generating the regular expression meeting the current field requirement.
Acquiring a regular expression of a field identifier to be updated;
if the updating operation is deleting operation, deleting the regular expression of the field identifier to be updated in the stored regular expression, and generating the regular expression meeting the current field requirement;
and if the updating operation is an adding operation, adding the regular expression of the field identifier to be updated in the stored regular expression, and generating the regular expression meeting the current field requirement.
The process of adding the regular expression may include:
searching the regular expression library according to the order of the at least one target field identifier, and acquiring the regular expressions of the field identifiers corresponding to the stored regular expression, or the regular expression of a field identifier combination;
and combining the regular expression of the field identifier to be updated with the acquired regular expressions of the field identifiers, or with the regular expression of the field identifier combination, by using an expression combination algorithm, to generate the regular expression that meets the current field requirement.
It should be noted that, after the regular expression of the field identifier to be updated is deleted from the stored regular expression, the regular expressions retained before and after the deleted part are likewise joined by using the expression combination algorithm.
For example, when the field identifiers corresponding to the stored regular expression are A, B and C, and the field identifier to be added is X:
if the order of the at least one target field identifier is A, B, C, X, the regular expression of X is appended after the stored regular expression; alternatively, the regular expressions of A, B, C and X are obtained separately and combined by using the expression combination algorithm.
If the order of the at least one target field identifier is A, B, X, C, the regular expression of the combination of A and B is obtained and the regular expressions of X and C are added in turn; alternatively, the regular expressions of A, B, X and C are obtained separately and combined by using the expression combination algorithm.
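The A, B, X, C example can be sketched as follows. The sketch assumes that the library is a dictionary keyed by tuples of field identifiers, and that the preset joining expression is the lazy form `.*?`. The sample patterns, `LIBRARY`, `ANY` and `build` are all hypothetical. Rather than editing the stored expression in place, the sketch reassembles it from library entries in target order, which corresponds to the second alternative described above.

```python
import re
from typing import Dict, List, Tuple

ANY = ".*?"  # assumed preset expression matching any characters between fields

# Hypothetical library: single identifiers and identifier combinations as keys.
LIBRARY: Dict[Tuple[str, ...], str] = {
    ("A",): r"(?P<A>\d{4}-\d{2}-\d{2})",
    ("B",): r"(?P<B>[A-Z]+)",
    ("C",): r"(?P<C>\d+ms)",
    ("X",): r"(?P<X>user=\w+)",
    ("A", "B"): r"(?P<A>\d{4}-\d{2}-\d{2}).*?(?P<B>[A-Z]+)",
}

def build(target: List[str]) -> str:
    """Join library entries in target order, preferring the longest stored combination."""
    parts, i = [], 0
    while i < len(target):
        # Try the longest run of identifiers starting at position i first.
        for j in range(len(target), i, -1):
            key = tuple(target[i:j])
            if key in LIBRARY:
                parts.append(LIBRARY[key])
                i = j
                break
        else:
            raise KeyError(f"no regular expression stored for {target[i]!r}")
    return ANY.join(parts)

pattern = build(["A", "B", "X", "C"])  # uses the stored (A, B) combination, then X, then C
match = re.search(pattern, "2019-11-28 INFO user=han 35ms")
print(match.groupdict() if match else None)
```

Deleting a field works the same way under these assumptions: calling `build(["A", "C"])` joins the two retained expressions with the preset expression.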
Step 240: parse the log of the target application service by using the regular expression that meets the current field requirement, and obtain the log parsing content.
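A minimal sketch of this parsing step is shown below. It assumes that each field is captured by a named group, so that the log parsing content can be returned as a dictionary per line. The pattern, the sample log lines and the function name `parse_logs` are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Hypothetical expression meeting a field requirement of (time, level, cost).
PATTERN = r"(?P<time>\d{2}:\d{2}:\d{2}).*?(?P<level>INFO|ERROR).*?(?P<cost>\d+)ms"

def parse_logs(pattern: str, lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dictionary of field values for every line the expression matches."""
    compiled = re.compile(pattern)
    for line in lines:
        match = compiled.search(line)
        if match:
            # Lines that do not match the expression are skipped in this sketch.
            yield match.groupdict()

sample = [
    "10:15:02 [app] INFO request done in 35ms",
    "malformed line",
]
for record in parse_logs(PATTERN, sample):
    print(record)
```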
Further, for convenience of management, the regular expression library may include a general regular expression library (or "general regular expression template library") and a customized regular expression library (or "customized regular expression template library").
The regular expression library may be formed as follows: acquire the field identifiers and corresponding regular expressions, and the field identifier combinations and corresponding regular expressions, that are entered manually by technicians; during use, count the number of application services that use each entered field identifier or field identifier combination;
if the usage number is not less than the preset number threshold, add the entered field identifier or field identifier combination and the corresponding regular expression to the general regular expression library;
and if the usage number is less than the preset number threshold, add the entered field identifier or field identifier combination and the corresponding regular expression to the customized regular expression library.
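The threshold rule above can be sketched as follows. The threshold value, the dictionary-based libraries and the function name `register` are assumptions chosen only for illustration.

```python
from typing import Dict

THRESHOLD = 3  # hypothetical preset number threshold

general_library: Dict[str, str] = {}
custom_library: Dict[str, str] = {}

def register(field_id: str, regex: str, usage_count: int) -> str:
    """Store the entry in the general or customized library depending on its usage count."""
    if usage_count >= THRESHOLD:  # "not less than" the threshold
        general_library[field_id] = regex
        return "general"
    custom_library[field_id] = regex
    return "custom"

print(register("timestamp", r"\d{4}-\d{2}-\d{2}", usage_count=5))
print(register("trace_id", r"[0-9a-f]{16}", usage_count=1))
```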
For the general regular expression library, the regular expressions in it are used by a large number of application services, so each modification affects how all of those application services parse the relevant fields. The regular expressions in it must therefore be modified with great care, and management of the general regular expression library requires stricter permissions.
For the customized regular expression library, the number of application services corresponding to the regular expressions in the customized regular expression library is small, so that slightly loose permissions can be allocated, and a user can update configuration more flexibly.
Further, in actual business, some fields are not initially used by every application service; only some application services use them at first, so the regular expressions corresponding to these fields are stored in the customized regular expression library. However, as the business develops and field requirements change, a field that was initially used by only a few application services may come to be used by a large number of them. Since the field identifier of such a field has been stored in the customized regular expression library and has worked stably for a period of time, it can be deleted from the customized regular expression library and added to the general regular expression library.
Specifically, the number of application services using each field identifier or field identifier combination in the customized regular expression library can be counted in real time or periodically;
and if the counted usage number is not less than the preset number threshold, the counted field identifier or field identifier combination and the corresponding regular expression are removed from the customized regular expression library and added to the general regular expression library.
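The migration from the customized library to the general library can be sketched as below. The usage counts are passed in as a plain dictionary, and how they are collected (in real time or periodically) is outside the sketch. All names and sample values are hypothetical.

```python
from typing import Dict, List

def promote_fields(custom: Dict[str, str], general: Dict[str, str],
                   usage: Dict[str, int], threshold: int) -> List[str]:
    """Move entries whose usage count reaches the threshold from custom to general."""
    # Collect the keys first so the dictionary is not modified while iterating over it.
    promoted = [key for key in custom if usage.get(key, 0) >= threshold]
    for key in promoted:
        general[key] = custom.pop(key)
    return promoted

custom = {"trace_id": r"[0-9a-f]{16}", "tenant": r"\w+"}
general = {"timestamp": r"\d{4}-\d{2}-\d{2}"}
moved = promote_fields(custom, general, {"trace_id": 12, "tenant": 2}, threshold=10)
print(moved)
```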
Optionally, in order to improve matching efficiency of the field identifier, when searching the stored regular expression library, the general regular expression library may be searched first, and then the customized regular expression library may be searched.
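One way to sketch this search order is with `collections.ChainMap` from the Python standard library, which consults its mappings in the order given. The library contents and the function name `find_regex` are hypothetical.

```python
from collections import ChainMap
from typing import Optional

general_library = {"timestamp": r"\d{4}-\d{2}-\d{2}"}
custom_library = {"timestamp": r"\d{8}", "trace_id": r"[0-9a-f]{16}"}

# The general library is listed first, so it is consulted before the customized one.
regex_library = ChainMap(general_library, custom_library)

def find_regex(field_id: str) -> Optional[str]:
    """Return the stored expression for a field identifier, or None if neither library has it."""
    return regex_library.get(field_id)

print(find_regex("timestamp"))  # found in the general library
print(find_regex("trace_id"))   # falls back to the customized library
```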
In the log parsing method described above, a current field requirement of the target application service is acquired, where the current field requirement comprises at least one target field identifier and the order of the at least one target field identifier; a stored regular expression library is searched according to the stored regular expression of the target application service and the current field requirement, and a field identifier to be updated and an update operation for the stored regular expression are acquired, where the regular expression library stores regular expressions of field identifiers and regular expressions of field identifier combinations; the stored regular expression is updated according to the field identifier to be updated and the update operation, generating a regular expression that meets the current field requirement; and the log of the target application service is parsed by using that regular expression to obtain the log parsing content. Compared with the prior art, the method avoids manual modification of the regular expression and improves both the efficiency and the accuracy of modifying it.
Corresponding to the foregoing method, an embodiment of the present invention further provides a log parsing apparatus. As shown in fig. 3, the apparatus includes: an obtaining unit 310, a generating unit 320, and an analyzing unit 330;
an obtaining unit 310, configured to obtain a current field requirement of the target application service, where the current field requirement includes at least one target field identifier and an order of the at least one target field identifier;
and to search a stored regular expression library according to the stored regular expression of the target application service and the current field requirement, and acquire a field identifier to be updated and an update operation for the stored regular expression, where the regular expression library is used for storing regular expressions of field identifiers and regular expressions of field identifier combinations;
a generating unit 320, configured to update the stored regular expression according to the field identifier to be updated and the update operation, and generate a regular expression meeting the requirement of the current field;
and the analyzing unit 330 is configured to analyze the log of the target application service by using the regular expression meeting the requirement of the current field, and acquire log analysis content.
In an optional implementation, the obtaining unit 310 is further configured to obtain an initial field requirement of the target application service, where the initial field requirement includes at least one field identifier and an order of the at least one field identifier;
and to search the stored regular expression library and acquire the regular expression of each field identifier in the at least one field identifier;
the generating unit 320 is further configured to combine the regular expressions of each field identifier in the at least one field identifier according to the order of the at least one field identifier by using an expression combination algorithm, so as to generate a regular expression meeting the initial field requirement and regular expressions of field identifier combinations.
In an optional implementation, the generating unit 320 is specifically configured to sort the regular expressions of the at least one field identifier according to the order of the at least one field identifier;
and to combine adjacent regular expressions in the sorted regular expressions by adding a preset regular expression between them, so as to generate the regular expression meeting the initial field requirement and the regular expressions of the field identifier combinations, where the preset regular expression is a regular expression that matches any character.
In an optional implementation, the generating unit 320 is further configured to obtain a regular expression of the field identifier to be updated;
if the updating operation is an adding operation, adding the regular expression of the field identifier to be updated in the stored regular expression to generate a regular expression meeting the requirement of the current field;
and if the updating operation is a deleting operation, deleting the regular expression of the field identifier to be updated in the stored regular expression, and generating the regular expression meeting the requirement of the current field.
In an optional implementation, the generating unit 320 is further specifically configured to search the regular expression library according to the sequence of the at least one target field identifier, and obtain a regular expression of a field identifier corresponding to the stored regular expression or a regular expression of a combination of field identifiers;
and combining the regular expression of the field identifier to be updated with the obtained regular expression of the field identifier or the regular expression of the field identifier combination by adopting an expression combination algorithm to generate the regular expression meeting the requirement of the current field.
In an alternative implementation, the field identifiers and corresponding regular expressions in the regular expression library are stored in the form of key-value pairs.
In an optional implementation, the regular expression library includes a general regular expression library and a customized regular expression library;
the apparatus further comprises a counting unit 340 and an adding unit 350;
the obtaining unit 310 is further configured to obtain the input field identifiers and corresponding regular expressions, and the input field identifier combinations and corresponding regular expressions;
a counting unit 340, configured to count the number of application services using the input field identifier or field identifier combination;
an adding unit 350, configured to add the input field identifier or field identifier combination and the corresponding regular expression to the general regular expression library if the usage number is not less than a preset number threshold;
and if the using number is smaller than the preset number threshold, adding the input field identification or the field identification combination and the corresponding regular expression into the customized regular expression library.
In an optional implementation, the counting unit 340 is further configured to count the number of application services using each field identifier or field identifier combination in the customized regular expression library;
the adding unit 350 is further configured to, if the counted usage number is not less than the preset number threshold, remove the counted field identifier or field identifier combination and the corresponding regular expression from the customized regular expression library and add them to the general regular expression library.
The functions of each functional unit of the log analysis device provided in the above embodiment of the present invention may be implemented by using the above method steps, and therefore, detailed working processes and beneficial effects of each unit in the log analysis device provided in the embodiment of the present invention are not repeated herein.
An embodiment of the present invention further provides an electronic device, as shown in fig. 4, including a processor 410, a communication interface 420, a memory 430, and a communication bus 440, where the processor 410, the communication interface 420, and the memory 430 communicate with each other through the communication bus 440.
A memory 430 for storing computer programs;
the processor 410, when executing the program stored in the memory 430, implements the following steps:
acquiring a current field requirement of the target application service, wherein the current field requirement comprises at least one target field identifier and the sequence of the at least one target field identifier;
searching a stored regular expression library according to the stored regular expression of the target application service and the current field requirement, and acquiring a field identifier to be updated and an updating operation of the stored regular expression, wherein the regular expression library is used for storing the regular expression of the field identifier and the regular expression of the field identifier combination;
updating the stored regular expression according to the field identifier to be updated and the updating operation to generate a regular expression meeting the requirement of the current field;
and analyzing the log of the target application service by adopting a regular expression meeting the requirements of the current field to obtain log analysis content.
In an optional implementation, before obtaining the current field requirement of the target application service, the method further includes:
acquiring an initial field requirement of a target application service, wherein the initial field requirement comprises at least one field identifier and the sequence of the at least one field identifier;
searching a stored regular expression library, and acquiring a regular expression of each field identifier in the at least one field identifier;
and combining the regular expressions of each field identifier in the at least one field identifier according to the order of the at least one field identifier by using an expression combination algorithm, to generate the regular expression meeting the initial field requirement and the regular expressions of the field identifier combinations.
In an optional implementation, combining, by using an expression combination algorithm, the regular expressions of each field identifier in the at least one field identifier according to the order of the at least one field identifier, to generate the regular expression meeting the initial field requirement and the regular expressions of the field identifier combinations, includes:
sorting the regular expressions of the at least one field identifier according to the order of the at least one field identifier;
and combining adjacent regular expressions in the sorted regular expressions by adding a preset regular expression between them, to generate the regular expression meeting the initial field requirement and the regular expressions of the field identifier combinations, where the preset regular expression is a regular expression that matches any character.
In an optional implementation, updating the stored regular expression according to the field identifier to be updated and the update operation, to generate a regular expression meeting the requirement of the current field, includes:
acquiring a regular expression of the field identifier to be updated;
if the updating operation is an adding operation, adding the regular expression of the field identifier to be updated in the stored regular expression to generate a regular expression meeting the requirement of the current field;
and if the updating operation is a deleting operation, deleting the regular expression of the field identifier to be updated in the stored regular expression, and generating the regular expression meeting the requirement of the current field.
In an optional implementation, adding the regular expression of the field identifier to be updated to the stored regular expression, and generating the regular expression meeting the requirement of the current field, includes:
searching the regular expression library according to the sequence of the at least one target field identifier, and acquiring a regular expression of the field identifier corresponding to the stored regular expression or a regular expression of the field identifier combination;
and combining the regular expression of the field identifier to be updated with the obtained regular expression of the field identifier or the regular expression of the field identifier combination by adopting an expression combination algorithm to generate the regular expression meeting the requirement of the current field.
In an alternative implementation, the field identifiers and corresponding regular expressions in the regular expression library are stored in the form of key-value pairs.
In an optional implementation, the regular expression library includes a general regular expression library and a customized regular expression library;
before obtaining the current field requirement of the target application service, the method further comprises:
acquiring an input field identifier and a corresponding regular expression, and a field identifier combination and a corresponding regular expression;
counting the number of application services using the input field identifications or field identification combinations;
if the usage number is not less than a preset number threshold, adding the input field identifier or field identifier combination and the corresponding regular expression into the general regular expression library;
and if the using number is smaller than the preset number threshold, adding the input field identification or the field identification combination and the corresponding regular expression into the customized regular expression library.
In an optional implementation, the method further comprises:
counting the number of application services using each field identifier or field identifier combination in the customized regular expression library;
and if the usage number counted in real time is not less than the preset number threshold, removing the counted field identifications or field identification combinations and the corresponding regular expressions from the customized regular expression library, and adding the field identifications or field identification combinations and the corresponding regular expressions into the general regular expression library.
The aforementioned communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
Since the way each component of the electronic device in the foregoing embodiment solves the problem, and the resulting beneficial effects, can be understood by referring to the steps in the embodiment shown in fig. 2, the detailed working processes and beneficial effects of the electronic device provided by the embodiment of the present invention are not described herein again.
In another embodiment of the present invention, there is also provided a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to execute the log parsing method described in any one of the above embodiments.
In yet another embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method for log parsing of any of the above embodiments.
As will be appreciated by one of skill in the art, the embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the true scope of the embodiments of the present application.
It is apparent that those skilled in the art can make various changes and modifications to the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the embodiments of the present application and their equivalents, the embodiments of the present application are also intended to include such modifications and variations.

Claims (10)

1. A method for parsing a log, the method comprising:
acquiring a current field requirement of the target application service, wherein the current field requirement comprises at least one target field identifier and the sequence of the at least one target field identifier;
searching a stored regular expression library according to the stored regular expression of the target application service and the current field requirement, and acquiring a field identifier to be updated and an updating operation aiming at the stored regular expression, wherein the regular expression library is used for storing the regular expression of the field identifier and the regular expression of the field identifier combination;
updating the stored regular expression according to the field identifier to be updated and the updating operation to generate a regular expression meeting the requirement of the current field;
and analyzing the log of the target application service by adopting a regular expression meeting the requirements of the current field to obtain log analysis content.
2. The method of claim 1, wherein prior to obtaining the current field requirements of the target application service, the method further comprises:
acquiring an initial field requirement of a target application service, wherein the initial field requirement comprises at least one field identifier and the sequence of the at least one field identifier;
searching a stored regular expression library, and acquiring a regular expression of each field identifier in the at least one field identifier;
and combining the regular expressions of each field identifier in the at least one field identifier according to the sequence of the at least one field identifier by adopting an expression combination algorithm, to generate the regular expression meeting the initial field requirement and the regular expressions of the field identifier combinations.
3. The method as claimed in claim 2, wherein combining, by adopting an expression combination algorithm, the regular expressions of each field identifier in the at least one field identifier according to the sequence of the at least one field identifier, to generate the regular expression meeting the initial field requirement and the regular expressions of the field identifier combinations, comprises:
sorting the regular expressions of the at least one field identifier according to the sequence of the at least one field identifier;
and combining adjacent regular expressions in the sorted regular expressions by adding a preset regular expression between them, to generate the regular expression meeting the initial field requirement and the regular expressions of the field identifier combinations, wherein the preset regular expression is a regular expression that matches any character.
4. The method of claim 1, wherein updating the stored regular expression according to the field identifier to be updated and the updating operation, to generate a regular expression meeting the requirement of the current field, comprises:
acquiring a regular expression of the field identifier to be updated;
if the updating operation is an adding operation, adding the regular expression of the field identifier to be updated in the stored regular expression to generate a regular expression meeting the requirement of the current field;
and if the updating operation is a deleting operation, deleting the regular expression of the field identifier to be updated in the stored regular expression, and generating the regular expression meeting the requirement of the current field.
5. The method as claimed in claim 4, wherein adding the regular expression of the field identifier to be updated in the stored regular expression, and generating the regular expression satisfying the current field requirement includes:
searching the regular expression library according to the sequence of the at least one target field identifier, and acquiring a regular expression of the field identifier corresponding to the stored regular expression or a regular expression of the field identifier combination;
and combining the regular expression of the field identifier to be updated with the obtained regular expression of the field identifier or the regular expression of the field identifier combination by adopting an expression combination algorithm to generate the regular expression meeting the requirement of the current field.
6. The method of claim 1, wherein the regular expression library comprises a general regular expression library and a customized regular expression library;
before obtaining the current field requirement of the target application service, the method further comprises:
acquiring an input field identifier and a corresponding regular expression, and a field identifier combination and a corresponding regular expression;
counting the number of application services using the input field identifications or field identification combinations;
if the usage number is not less than a preset number threshold, adding the input field identifier or field identifier combination and the corresponding regular expression into the general regular expression library;
and if the using number is smaller than the preset number threshold, adding the input field identification or the field identification combination and the corresponding regular expression into the customized regular expression library.
7. The method of claim 6, wherein the method further comprises:
counting the number of application services using each field identifier or field identifier combination in the customized regular expression library;
and if the usage number counted in real time is not less than the preset number threshold, removing the counted field identifications or field identification combinations and the corresponding regular expressions from the customized regular expression library, and adding the field identifications or field identification combinations and the corresponding regular expressions into the general regular expression library.
8. An apparatus for parsing a log, the apparatus comprising: an obtaining unit, a generating unit and an analysis unit;
the obtaining unit is configured to obtain a current field requirement of the target application service, where the current field requirement includes at least one target field identifier and an order of the at least one target field identifier;
and to search a stored regular expression library according to the stored regular expression of the target application service and the current field requirement, and acquire a field identifier to be updated and an updating operation for the stored regular expression, wherein the regular expression library is used for storing regular expressions of field identifiers and regular expressions of field identifier combinations;
the generating unit is configured to update the stored regular expression according to the field identifier to be updated and the update operation, and generate a regular expression meeting the requirement of the current field;
and the analysis unit is used for analyzing the log of the target application service by adopting the regular expression meeting the requirement of the current field to obtain log analysis content.
9. An electronic device, characterized in that the electronic device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1 to 7 when executing the program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1 to 7.
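
The flow recited in the claims above can be illustrated with a minimal Python sketch. It is not the patented implementation: the library contents, the field names (date, level, user_id), the whitespace separator, the usage record and the threshold value are all assumptions made for illustration. The sketch also rebuilds the regular expression from the ordered field identifiers rather than applying an incremental update operation to a stored expression, which is a simplification.

```python
import re

# Hypothetical libraries: field identifier -> regular expression fragment.
general_library = {
    "date": r"(?P<date>\d{4}-\d{2}-\d{2})",
    "level": r"(?P<level>[A-Z]+)",
}
custom_library = {
    "user_id": r"(?P<user_id>u\d+)",
}
# Hypothetical usage record: field identifier -> application services using it.
usage = {"user_id": {"svc_a", "svc_b"}}


def build_regex(field_ids, separator=r"\s+"):
    """Join the stored fragments in the order given by the field requirement."""
    fragments = []
    for field_id in field_ids:
        fragment = general_library.get(field_id) or custom_library.get(field_id)
        if fragment is None:
            raise KeyError(f"no regular expression stored for {field_id!r}")
        fragments.append(fragment)
    return re.compile(separator.join(fragments))


def promote_popular(threshold):
    """Move customized entries used by at least `threshold` services
    into the general library."""
    for field_id in list(custom_library):
        if len(usage.get(field_id, ())) >= threshold:
            general_library[field_id] = custom_library.pop(field_id)


# Current field requirement: target field identifiers and their order.
pattern = build_regex(["date", "level", "user_id"])
match = pattern.match("2019-11-28 ERROR u42 payment failed")
parsed = match.groupdict() if match else {}

# Assumed threshold of 2 services; user_id qualifies and is promoted.
promote_popular(threshold=2)
```

Named groups are used here so that the parsed content can be read back by field identifier; the claims do not prescribe this choice.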
CN201911190459.1A 2019-11-28 2019-11-28 Log analysis method and device Active CN110990350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911190459.1A CN110990350B (en) 2019-11-28 2019-11-28 Log analysis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911190459.1A CN110990350B (en) 2019-11-28 2019-11-28 Log analysis method and device

Publications (2)

Publication Number Publication Date
CN110990350A (en) 2020-04-10
CN110990350B (en) 2023-06-16

Family

ID=70087779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911190459.1A Active CN110990350B (en) 2019-11-28 2019-11-28 Log analysis method and device

Country Status (1)

Country Link
CN (1) CN110990350B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111966641A (en) * 2020-08-18 2020-11-20 国家工业信息安全发展研究中心 Universal log normalization model configuration method and device
CN112667672A (en) * 2021-01-06 2021-04-16 北京启明星辰信息安全技术有限公司 Log analysis method and analysis device
CN115543950A (en) * 2022-09-29 2022-12-30 杭州中电安科现代科技有限公司 Data processing system for log normalization
CN112667672B (en) * 2021-01-06 2024-05-10 北京启明星辰信息安全技术有限公司 Log analysis method and analysis device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10161916A (en) * 1996-11-28 1998-06-19 Hitachi Ltd Detection of update conflict accompanying duplication of data base
CN105245394A (en) * 2014-07-07 2016-01-13 北京风行在线技术有限公司 Method and equipment for analyzing network access log based on layered approach
WO2016161381A1 (en) * 2015-04-03 2016-10-06 Oracle International Corporation Method and system for implementing a log parser in a log analytics system
CN107590169A (en) * 2017-04-14 2018-01-16 南方科技大学 A kind of preprocess method and system of carrier gateway data
CN108156131A (en) * 2017-10-27 2018-06-12 上海观安信息技术股份有限公司 Webshell detection methods, electronic equipment and computer storage media
US10275449B1 (en) * 2018-02-19 2019-04-30 Sas Institute Inc. Identification and parsing of a log record in a merged log record stream
CN109783330A (en) * 2018-12-10 2019-05-21 北京京东金融科技控股有限公司 Log processing method, display methods and relevant apparatus, system
CN110175161A (en) * 2019-04-25 2019-08-27 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of record log
CN110321457A (en) * 2019-04-19 2019-10-11 杭州玳数科技有限公司 Access log resolution rules generation method and device, log analytic method and system

Also Published As

Publication number Publication date
CN110990350B (en) 2023-06-16

Similar Documents

Publication Publication Date Title
CN106940679B (en) Data processing method and device
CN108959279B (en) Data processing method, data processing device, readable medium and electronic equipment
CN109241068B (en) Method and device for comparing foreground and background data and terminal equipment
CN105357204B (en) Method and device for generating terminal identification information
CN114422267B (en) Flow detection method, device, equipment and medium
CN112364014B (en) Data query method, device, server and storage medium
CN110990350A (en) Log analysis method and device
CN111400170A (en) Data permission testing method and device
CN110851339A (en) Method and device for reporting buried point data, storage medium and terminal equipment
CN112329954A (en) Article recall method and device, terminal equipment and storage medium
CN112084179A (en) Data processing method, device, equipment and storage medium
CN112115039A (en) Test case generation method, device and equipment
CN106940710B (en) Information pushing method and device
CN107330031B (en) Data storage method and device and electronic equipment
CN112433757A (en) Method and device for determining interface calling relationship
CN110020166B (en) Data analysis method and related equipment
CN111046393A (en) Vulnerability information uploading method and device, terminal equipment and storage medium
CN116204428A (en) Test case generation method and device
CN114791914A (en) User behavior statistical method, device, equipment and medium based on Bitmap
CN111222739B (en) Nuclear power station task allocation method and nuclear power station task allocation system
CN110909288B (en) Service data processing method, device, platform, service end, system and medium
CN112667631A (en) Method, device and equipment for automatically editing service field and storage medium
CN109542986B (en) Element normalization method, device, equipment and storage medium of network data
CN113778977A (en) Data processing method and data processing device
CN112184027A (en) Task progress updating method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant