WO2021120782A1 - Method, device, terminal and storage medium for extracting key information from logs - Google Patents
- Publication number
- WO2021120782A1 (PCT/CN2020/118501)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- expression
- log
- key information
- extended
- special
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/1734—Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- This application relates to the field of data processing, in particular to a method, device, terminal and storage medium for extracting key information from a log.
- A log is a file that records events occurring while an operating system or other software runs, or messages exchanged between different users of communication software. It is an important part of a system, plays a major role in troubleshooting and optimization, and is an indispensable tool in the security field.
- This application provides a method, a device, a terminal, and a storage medium for extracting key information from a log, so as to solve the problem of excessively low log information processing efficiency in the prior art.
- One technical solution adopted in this application is to provide a method for extracting key information from logs, including: identifying the log category to which a log belongs, the log categories being preset; and obtaining the extended expression corresponding to the log category, each log category corresponding to one pre-built extended expression.
- the extended expression includes at least one grok expression and at least one Jmte expression.
- The Jmte expression is preset according to the key information to be extracted. It is then determined whether the extended expression is a special expression; if so, the key information is extracted from the log using the special expression and the preset parsing rule corresponding to it; if not, the key information is extracted from the log using the extended expression.
- Another technical solution adopted by this application is to provide a device for extracting key information from logs, including: an identification module, configured to identify the log category to which a log belongs, the log categories being preset; and an acquisition module, configured to obtain the extended expression corresponding to the log category, each log category corresponding to one pre-built extended expression.
- the extended expression includes at least one grok expression and at least one Jmte expression.
- the Jmte expression is pre-defined according to the key information that needs to be extracted.
- a judgment module, configured to determine whether the extended expression is a special expression;
- a first extraction module, configured to extract key information from the log using the special expression and its corresponding preset parsing rule when the extended expression is a special expression;
- a second extraction module, configured to extract key information from the log using the extended expression when the extended expression is not a special expression.
- Yet another technical solution is to provide a terminal including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program,
- implements the following steps: identifying the log category to which the log belongs, the log categories being preset; obtaining the extended expression corresponding to the log category, each log category corresponding to one pre-built extended expression, the extended expression including at least one grok expression and at least one Jmte expression, and the Jmte expression being preset according to the key information to be extracted; determining whether the extended expression is a special expression; if so, extracting key information from the log using the special expression and its corresponding preset parsing rule; if not, extracting key information from the log using the extended expression.
- Another technical solution adopted in this application is to provide a storage medium storing a program file capable of implementing the method for extracting key information from logs, where the program file, when executed by a processor, implements the following steps: identifying the log category to which the log belongs, the log categories being preset; and obtaining the extended expression corresponding to the log category, each log category corresponding to one pre-built extended expression.
- The extended expression includes at least one grok expression and at least one Jmte expression.
- The Jmte expression is preset according to the key information to be extracted. It is determined whether the extended expression is a special expression; if so, the key information is extracted from the log using the special expression and its corresponding preset parsing rule; if not, the key information is extracted from the log using the extended expression.
- The beneficial effect of this application is that the method uses extended expressions composed of grok expressions and Jmte expressions, which can handle specific formats and characters and support specific type conversions, so that each extended expression can completely extract all the key information of a log. This solves the problem of complex configuration and low efficiency when a main extractor and an auxiliary extractor are used to extract the same log at the same time.
- In addition, when the extended expression is a special expression, key information is extracted from the log using the special expression and its corresponding preset parsing rule, which optimizes the extraction process for that type of log and improves its extraction efficiency.
- FIG. 1 is a schematic flowchart of a method for extracting key information from a log in the first embodiment of the present application
- FIG. 2 is a schematic flowchart of a method for extracting key information from a log in a second embodiment of the present application
- FIG. 3 is a schematic flowchart of a method for extracting key information from a log in a third embodiment of the present application
- FIG. 4 is a schematic diagram of functional modules of an apparatus for extracting key information in a log according to an embodiment of the present application
- FIG. 5 is a schematic structural diagram of a terminal according to an embodiment of the present application.
- FIG. 6 is a schematic structural diagram of a storage medium according to an embodiment of the present application.
- FIG. 1 is a schematic flowchart of a method for extracting key information from a log in the first embodiment of this application. Note that, provided substantially the same result is obtained, the method of this application is not limited to the order of the flow shown in FIG. 1. As shown in FIG. 1, the method includes the following steps:
- Step S101 Identify the log category to which the log belongs, and the log category is preset.
- A log category is a category defined by the user in advance according to the features that logs have in common. For example, logs generated by user access can be classified as access logs, and there are also application run logs, and so on. After the categories have been defined, the category to which a log belongs is identified when the log to be extracted is obtained.
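The patent does not say how a log is matched to its preset category. A minimal sketch, assuming each category is recognized by a hypothetical signature regex (the names `CATEGORY_SIGNATURES` and `identify_category`, and the `[apprun]` tag, are illustrative and not from the source), could look like this:

```python
import re
from typing import Optional

# Hypothetical signatures: the source only says categories are preset from
# features shared by logs; recognizing them by a regex is an assumption.
CATEGORY_SIGNATURES = {
    "access_log": re.compile(r"\[accesslog\]"),
    "app_run_log": re.compile(r"\[apprun\]"),
}


def identify_category(line: str) -> Optional[str]:
    """Return the first preset category whose signature occurs in the line."""
    for category, signature in CATEGORY_SIGNATURES.items():
        if signature.search(line):
            return category
    return None  # no preset category matched
```

The returned category name would then serve as the key for looking up the pre-built extended expression in step S102.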
- Step S102 Obtain the extended expression corresponding to the log category, and each log category corresponds to a pre-built extended expression.
- the extended expression includes at least one grok expression and at least one Jmte expression, and the Jmte expression is preset according to the key information to be extracted.
- Jmte: Java Minimal Template Engine.
- Table 1 lists some Jmte expressions and their uses:
- An example illustrates how the extended expression in this embodiment extracts key information from a log; the user access log is as follows:
- Because the extended expression includes at least one grok expression and at least one Jmte expression, the user can register a corresponding handler in a Jmte expression to parse nested formats that are inconvenient to parse with grok expressions.
- Jmte expressions can also add type conversions that grok does not support, converting the extracted text into a field of a specific type, such as a date field.
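The core idea of grok, expanding `%{PATTERN:field}` references into named regular-expression groups, can be sketched in a few lines. The pattern table below is a small assumed subset (Logstash ships many more), and `to_date` is a hypothetical stand-in for a Jmte-style type conversion to a date field; neither is the Logstash or Jmte API.

```python
import re
from datetime import datetime

# Assumed subset of grok-style base patterns, for illustration only.
BASE_PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "INT": r"\d+",
    "WORD": r"\w+",
    "TIMESTAMP": r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}",
}

GROK_REF = re.compile(r"%\{(\w+):(\w+)\}")


def grok_to_regex(grok: str) -> re.Pattern:
    """Expand each %{PATTERN:field} reference into a named capture group."""
    def expand(m: re.Match) -> str:
        pattern_name, field = m.group(1), m.group(2)
        return f"(?P<{field}>{BASE_PATTERNS[pattern_name]})"
    # With a function as replacement, the returned text is inserted literally.
    return re.compile(GROK_REF.sub(expand, grok))


def to_date(text: str) -> datetime:
    """Hypothetical Jmte-style step: convert captured text into a date value."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
```

For example, `grok_to_regex(r"%{TIMESTAMP:ts} %{WORD:level} \[%{WORD:tag}\] %{WORD:user} %{IP:client}")` matches a space-separated access-log prefix, and `to_date` turns the `ts` capture into a `datetime`.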
- Step S103 Determine whether the extended expression is a special expression. If yes, go to step S104; if not, go to step S105.
- In step S103, a special expression is a particular one among all the extended expressions.
- Step S104 Use preset parsing rules and special expressions corresponding to the special expressions to extract key information from the log.
- each special expression corresponds to a type of log, and also corresponds to a preset parsing rule.
- the preset parsing rule is a rule set by the developer based on the common feature information of logs of the same type. Based on the common feature information of the log, corresponding parsing rules are adopted to improve the efficiency of extracting the key information of the log.
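Steps S103 to S105 amount to a dispatch: check whether a preset parsing rule is registered for the expression, and otherwise run the whole extended expression. A minimal sketch, in which `ExtendedExpression`, `SPECIAL_RULES` and `extract_key_info` are hypothetical names and each expression is modeled as a function returning a dict of fields:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ExtendedExpression:
    name: str                       # hypothetical identifier of the expression
    extract: Callable[[str], dict]  # runs all grok/Jmte segments on a log


# Preset parsing rules keyed by the name of a special expression (assumed).
SpecialRule = Callable[[ExtendedExpression, str], Optional[dict]]
SPECIAL_RULES: Dict[str, SpecialRule] = {}


def extract_key_info(expr: ExtendedExpression, log: str) -> Optional[dict]:
    rule = SPECIAL_RULES.get(expr.name)  # S103: is this a special expression?
    if rule is not None:
        return rule(expr, log)           # S104: preset rule + special expression
    return expr.extract(log)             # S105: ordinary extended expression
```

A preset rule receives both the expression and the log, so it can run checks before (or instead of) the normal extraction, as the first and second special expressions below do.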
- the special expression includes a first special expression.
- The step of extracting key information from the log using the special expression and its corresponding preset parsing rule then includes:
- When the extended expression is the first special expression, the first Jmte expression at the end of the first special expression is executed to extract the first key information at the end of the log; the end of the first special expression is preset to be the first Jmte expression used to extract the first key information.
- the first special expression refers to an extended expression whose end is set to the first Jmte expression.
- The first Jmte expression is executed first to extract the first key information at the end of the log.
- the first preset field is set by the developer according to the log of the log category corresponding to the first special expression.
- Logs of this category all end with the first preset field. Therefore, when the end of a log of this category does not include the first preset field, the log can be determined to be abnormal.
- If the first key information includes the first preset field, the grok expressions and the remaining Jmte expressions are used to extract the key information from the log.
- If the first key information does not include the first preset field, the log is abnormal and the data in it may also be abnormal. Stopping the extraction at this point means no other grok or Jmte expressions need to be executed, which reduces resource usage, helps developers detect log anomalies early, and improves the efficiency of batch log processing. Note that the Java program called in Jmte can directly recognize and extract the characters at the end of the log, which is why the end of the first special expression is preset to be the first Jmte expression.
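The early exit of the first special expression can be sketched as follows. This assumes, for illustration only, that the "end of the log" is its last whitespace-separated token and that each remaining grok/Jmte segment is a function returning a dict; `extract_with_tail_check` is a hypothetical name.

```python
from typing import Callable, Dict, List, Optional

Extractor = Callable[[str], Dict[str, str]]


def extract_with_tail_check(log: str, preset_field: str,
                            extractors: List[Extractor]) -> Optional[Dict[str, str]]:
    # The first Jmte expression (placed at the end of the special expression)
    # runs first. Assumption: the "end of the log" is its last whitespace token.
    tokens = log.split()
    first_key_info = tokens[-1] if tokens else ""
    if preset_field not in first_key_info:
        return None  # abnormal log: skip every remaining grok/Jmte segment
    fields = {"tail": first_key_info}
    for extract in extractors:  # grok expressions and remaining Jmte expressions
        fields.update(extract(log))
    return fields
```

Returning `None` signals "abnormal, extraction stopped", so callers can count or report such logs without any further pattern matching.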
- the special expression includes a second special expression.
- The step of extracting key information from the log using the special expression and its corresponding preset parsing rule then includes:
- When the extended expression is the second special expression, the second Jmte expression at the first position of the second special expression is executed to obtain the length of the log; the first position of the second special expression is preset to be the second Jmte expression used to obtain the length of the log.
- the second special expression refers to an extended expression whose first position is set as the second Jmte expression.
- The second Jmte expression is executed first to obtain the length of the log, i.e. the total length of all strings in the log.
- It is then determined whether the length of the log is greater than the first preset threshold or less than the second preset threshold. Both thresholds are preset by the developer, and the first preset threshold is greater than the second preset threshold.
- The developer sets the two thresholds after studying many sample logs of the same log category. When the length of a log of this category is greater than the first preset threshold or less than the second preset threshold, i.e. outside the range between them, the log can be determined to be abnormal and the extraction of key information stops; otherwise, the grok expressions and the remaining Jmte expressions are used to extract key information from the log.
- Because the length of the log is obtained and used to judge whether the log is normal before any grok or Jmte expression is executed, abnormal logs are filtered out and no resources are spent extracting their key information, which improves the efficiency of batch log processing.
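A sketch of the second special expression's length filter, assuming the "length of the log" is simply its character count and modeling each remaining grok/Jmte segment as a function returning a dict (`extract_with_length_check` is a hypothetical name):

```python
from typing import Callable, Dict, List, Optional

Extractor = Callable[[str], Dict[str, str]]


def extract_with_length_check(log: str, first_threshold: int, second_threshold: int,
                              extractors: List[Extractor]) -> Optional[Dict[str, str]]:
    if first_threshold <= second_threshold:
        raise ValueError("first preset threshold must exceed the second")
    length = len(log)  # second Jmte expression: length of the whole log string
    if length > first_threshold or length < second_threshold:
        return None  # abnormal: filtered out before any grok/Jmte segment runs
    fields: Dict[str, str] = {}
    for extract in extractors:  # grok expressions and remaining Jmte expressions
        fields.update(extract(log))
    return fields
```

Lengths exactly equal to either threshold count as normal here, following the "greater than ... or less than ..." wording.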
- Step S105 Use the extended expression to extract key information from the log.
- In step S105, the grok expressions and the Jmte expressions in the extended expression are each used to extract the corresponding key information from the log.
- When step S104 or step S105 is performed, if the extended expression contains a third Jmte expression for extracting fixed-length field information, that third Jmte expression is used to extract the fixed-length field information.
- the specific steps include:
- the third Jmte expression is configured by the developer to identify special characters in the log and obtain the length of the field between two adjacent special characters.
- For example, the third Jmte expression can identify space characters, and the string between two spaces is a field.
- each third Jmte expression is preset with the target length of the field information to be extracted.
- the required key information is extracted by using the third Jmte expression to match the fixed-length field information, which does not need to match character by character, which greatly reduces the amount of data processing, and makes the extraction of key information in the log more efficient.
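A sketch of the third Jmte expression's fixed-length matching, assuming the special character is a single separator string and reading "between two adjacent special characters" literally, so the text before the first separator and after the last one is not considered. `extract_fixed_length_fields` is a hypothetical name:

```python
from typing import List


def extract_fixed_length_fields(log: str, separator: str, target_length: int) -> List[str]:
    """Return fields lying between two adjacent separators whose length
    equals target_length (the target preset in the third Jmte expression)."""
    if not separator:
        raise ValueError("separator must be non-empty")
    # split() also yields the text before the first separator and after the
    # last one; [1:-1] keeps only fields bounded by separators on both sides.
    between = log.split(separator)[1:-1]
    return [field for field in between if len(field) == target_length]
```

The sketch only illustrates selecting fields by length; it makes no claim about matching the performance the patent describes.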
- The method for extracting key information from logs in the first embodiment of this application divides logs into categories and then sets specific parsing rules for certain special log categories.
- The extended expressions are composed of grok expressions and Jmte expressions that can handle specific formats and characters and support specific type conversions, so each extended expression can completely extract all the key information of a log. This solves the problem of complex configuration and low efficiency when a main extractor and an auxiliary extractor are used to extract the same log at the same time, optimizes the log information extraction process, and improves log extraction efficiency.
- FIG. 2 is a schematic flowchart of a method for extracting key information from a log in the second embodiment of this application. Note that, provided substantially the same result is obtained, the method of this application is not limited to the order of the flow shown in FIG. 2. As shown in FIG. 2, the method includes the following steps:
- Step S201 Identify the log category to which the log belongs, and the log category is preset.
- step S201 in FIG. 2 is similar to step S101 in FIG. 1. For the sake of brevity, it will not be repeated here.
- Step S202 Obtain an extended expression corresponding to a log category, and each log category corresponds to a pre-built extended expression.
- Step S202 in FIG. 2 is similar to step S102 in FIG. 1; for the sake of brevity, it is not repeated here.
- Step S203 Parse the extended expression to split the extended expression into multiple segments, and each segment corresponds to a grok expression or a Jmte expression.
- In step S203, note first that the extended expression in this embodiment consists of at least one grok expression and at least one Jmte expression. To simplify the extraction of key information, the extended expression is therefore split in advance into multiple segments, each corresponding to a grok expression or a Jmte expression, and the grok or Jmte expressions are then executed one by one to extract key information from the log.
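The patent does not give the concrete syntax of an extended expression. Assuming, purely for illustration, that Jmte segments are written in a `${...}` template form and everything between them is grok text, splitting into segments could look like this (`split_extended_expression` and the `url_decode` call in the example are hypothetical):

```python
import re
from typing import List, Tuple

# Assumed delimiter for Jmte segments inside an extended expression.
JMTE_SEGMENT = re.compile(r"\$\{[^}]*\}")


def split_extended_expression(expr: str) -> List[Tuple[str, str]]:
    """Split an extended expression into ("grok", text) / ("jmte", text) segments."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for m in JMTE_SEGMENT.finditer(expr):
        if m.start() > pos:                      # grok text before this Jmte part
            segments.append(("grok", expr[pos:m.start()]))
        segments.append(("jmte", m.group()))
        pos = m.end()
    if pos < len(expr):                          # trailing grok text, if any
        segments.append(("grok", expr[pos:]))
    return segments
```

Grok references such as `%{IP:client}` are left alone because the pattern only matches a `$` followed by `{`.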
- Step S204 Determine whether the extended expression is a special expression. If yes, go to step S205; if not, go to step S206.
- Step S204 in FIG. 2 is similar to step S103 in FIG. 1; for the sake of brevity, it is not repeated here.
- Step S205 Use preset parsing rules and special expressions corresponding to the special expressions to extract key information from the log.
- Step S205 in FIG. 2 is similar to step S104 in FIG. 1; for the sake of brevity, it is not repeated here.
- Step S206 Use extended expressions to extract key information from the log
- Step S206 in FIG. 2 is similar to step S105 in FIG. 1; for the sake of brevity, it is not repeated here.
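- Extracting key information from the log using the extended expression specifically includes the following:
- Logs are usually generated and stored as text, so when key information is extracted from a log, all the text information of the log is obtained first.
- The grok or Jmte expressions are then used one by one to extract key information from the text information, and each piece of key information is removed from the text information once it has been extracted, until extraction is complete.
- This approach further improves the efficiency of extracting key information from the log.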
- Building on the first embodiment, the method of the second embodiment splits the extended expression into multiple segments, each corresponding to a grok expression or a Jmte expression, and then uses the grok or Jmte expressions one by one to extract key information from the log. Each time a grok or Jmte expression is executed, the extracted key information is removed from the text information of the log, so subsequent grok or Jmte expressions have less and less data to match and extraction becomes progressively more efficient.
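The "remove what has already been extracted" idea of the second embodiment can be sketched by anchoring each segment at the start of the remaining text and cutting off the consumed prefix. Each segment is assumed to be a compiled regex with named groups, standing in for an expanded grok or Jmte expression; `extract_progressively` is a hypothetical name.

```python
import re
from typing import Dict, List, Optional


def extract_progressively(log: str, segments: List[re.Pattern]) -> Optional[Dict[str, str]]:
    """Run segments in order; each one only sees text not yet consumed."""
    remaining = log
    fields: Dict[str, str] = {}
    for segment in segments:
        match = segment.match(remaining)  # anchored at the start of what is left
        if match is None:
            return None                   # a segment failed: no complete result
        fields.update(match.groupdict())
        # Drop the extracted text so later segments match against less data.
        remaining = remaining[match.end():].lstrip()
    return fields
```

Anchoring with `match` rather than `search` is a design choice of this sketch: it keeps each segment from scanning text that an earlier segment should have consumed.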
- FIG. 3 is a schematic flowchart of a method for extracting key information from a log in the third embodiment of this application. Note that, provided substantially the same result is obtained, the method of this application is not limited to the order of the flow shown in FIG. 3. As shown in FIG. 3, the method includes the following steps:
- Step S301 Identify the log category to which the log belongs, and the log category is preset.
- step S301 in FIG. 3 is similar to step S201 in FIG. 2, and for the sake of brevity, it will not be repeated here.
- Step S302 Obtain the extended expression corresponding to the log category, and each log category corresponds to a pre-built extended expression.
- step S302 in FIG. 3 is similar to step S202 in FIG. 2, and for the sake of brevity, it will not be repeated here.
- Step S303 It is judged whether there is a parsed extended expression in the memory. If it exists, go to step S304; if it does not exist, go to step S305.
- Step S304 directly call the parsed extended expression.
- Step S305 Parse the extended expression to split the extended expression into multiple segments, each segment corresponds to a grok expression or a Jmte expression, and store the parsed extended expression in the memory.
- In steps S303 to S305, the split extended expression is stored in the memory.
- When the extended expression is needed again, it can be retrieved directly from the memory without being parsed and split again, which avoids consuming system resources, shortens the processing flow, and improves extraction efficiency.
- If no parsed extended expression exists in the memory, the extended expression is parsed and then stored in the memory for subsequent use.
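Steps S303 to S305 place a cache in front of the parser. A minimal sketch, assuming the raw expression string is the cache key and the parser is passed in; `_PARSED_CACHE` and `get_parsed_expression` are hypothetical names:

```python
from typing import Callable, Dict, List, Tuple

Segments = List[Tuple[str, str]]

# In-memory store of already-parsed extended expressions (assumed keyed by text).
_PARSED_CACHE: Dict[str, Segments] = {}


def get_parsed_expression(expr: str, parse: Callable[[str], Segments]) -> Segments:
    cached = _PARSED_CACHE.get(expr)
    if cached is not None:
        return cached                 # S304: reuse the parsed expression
    parsed = parse(expr)              # S305: parse once ...
    _PARSED_CACHE[expr] = parsed      # ... and keep it for later logs
    return parsed
```

`functools.lru_cache` on the parse function would give a similar effect with a bounded size; the explicit dictionary simply mirrors the "store in memory" wording.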
- Step S306 Determine whether the extended expression is a special expression. If yes, go to step S307; if not, go to step S308.
- step S306 in FIG. 3 is similar to step S204 in FIG. 2, and for the sake of brevity, it will not be repeated here.
- Step S307 Use preset parsing rules and special expressions corresponding to the special expressions to extract key information from the log.
- step S307 in FIG. 3 is similar to step S205 in FIG. 2, which is not repeated here for simplicity.
- Step S308 Use extended expressions to extract key information from the log
- step S308 in FIG. 3 is similar to step S206 in FIG. 2, and for the sake of brevity, it will not be repeated here.
- Building on the second embodiment, the method of the third embodiment stores parsed extended expressions in a memory, so that the extended expression does not need to be parsed every time key information is extracted from a log, making log extraction faster and more efficient.
- Fig. 4 is a schematic diagram of functional modules of an apparatus for extracting key information from a log in an embodiment of the present application.
- the device 40 includes an identification module 41, an acquisition module 42, a judgment module 43, a first extraction module 44 and a second extraction module 45.
- the identification module 41 is used to identify the log category to which the log belongs, and the log category is preset.
- the obtaining module 42 is used to obtain the extended expression corresponding to the log category.
- Each log category corresponds to a pre-built extended expression.
- the extended expression includes at least one grok expression and at least one Jmte expression.
- the Jmte expression is preset according to the key information to be extracted.
- the judging module 43 is used to judge whether the extended expression is a special expression.
- the first extraction module 44 is configured to extract key information from the log by using preset parsing rules and special expressions corresponding to the special expression when the extended expression is a special expression.
- the second extraction module 45 is used to extract key information from the log by using the extended expression when the extended expression is not a special expression.
- When the first extraction module 44 uses the special expression and its corresponding preset parsing rule to extract key information from the log, the operation may also be: when the extended expression is the first special expression, executing the first Jmte expression at the end of the first special expression to extract the first key information at the end of the log, the end of the first special expression being preset to be the first Jmte expression used to extract the first key information; determining whether the first key information includes the first preset field; if so, using the grok expressions and the remaining Jmte expressions to extract key information from the log; if not, determining that the log is abnormal and stopping the extraction of key information.
- When the first extraction module 44 uses the special expression and its corresponding preset parsing rule to extract key information from the log, the operation may also be: when the extended expression is the second special expression, executing the second Jmte expression at the first position of the second special expression to obtain the length of the log, the first position of the second special expression being preset to be the second Jmte expression used to obtain the length of the log; determining whether the length of the log is greater than the first preset threshold or less than the second preset threshold; if so, determining that the log is abnormal and stopping the extraction of key information; if not, using the grok expressions and the remaining Jmte expressions to extract key information from the log.
- When obtaining the extended expression corresponding to the log category, the obtaining module 42 is also used to parse the extended expression and split it into multiple segments, each corresponding to a grok expression or a Jmte expression.
- When parsing the extended expression, the obtaining module 42 is also used to: determine whether a parsed extended expression exists in the memory; if so, directly call the parsed extended expression; if not, parse the extended expression to split it into multiple segments and store the parsed extended expression in the memory.
- The operation of the second extraction module 45 to extract key information from the log using extended expressions may also include: obtaining the text information of the log; and using grok or Jmte expressions one by one to extract key information from the text information, removing each piece of key information from the text information after it is extracted, until extraction is complete.
- The operation in which the first extraction module 44 or the second extraction module 45 executes the third Jmte expression is specifically: after the third Jmte expression recognizes the special characters in the log, obtaining the string length of the field between any two adjacent special characters; obtaining, from the third Jmte expression, the target length of the field information to be extracted; and extracting the field matching the target length to obtain the field information.
- FIG. 5 is a schematic structural diagram of a terminal according to an embodiment of the application.
- the terminal 50 includes a processor 51, a memory 52 coupled to the processor 51, and a computer program stored on the memory 52 and running on the processor 51.
- When executing the computer program, the processor 51 implements the method for extracting key information from logs in the foregoing embodiments.
- The processor 51 may also be referred to as a CPU (Central Processing Unit).
- the processor 51 may be an integrated circuit chip with signal processing capability.
- the processor 51 may also be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
- FIG. 6 is a schematic structural diagram of a storage medium according to an embodiment of the application.
- the storage medium of the embodiment of the present application stores a program file 61 that can implement all the above methods.
- The program file 61 can be stored in the above storage medium in the form of a software product and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of this application.
- The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, as well as terminal devices such as computers, servers, mobile phones, and tablets.
- the computer-readable storage medium may be non-volatile or volatile
- the disclosed terminal, device, and method may be implemented in other ways.
- The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- The above integrated unit may be implemented in the form of hardware or of a software functional unit. The above are only embodiments of this application and do not limit its patent scope. Any equivalent structural or process transformation made using the content of the description and drawings of this application, or any direct or indirect application thereof in other related technical fields, is likewise included in the scope of patent protection of this application.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Debugging And Monitoring (AREA)
- Machine Translation (AREA)
Abstract
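A method, device, terminal and storage medium for extracting key information from logs. The method includes: identifying the log category to which a log belongs, the log categories being preset (S101); obtaining the extended expression corresponding to the log category, each log category corresponding to one pre-built extended expression (S102), the extended expression including grok expressions and Jmte expressions, and the Jmte expressions being preset according to the key information to be extracted; determining whether the extended expression is a special expression (S103); if so, extracting key information from the log using the special expression and the preset parsing rule corresponding to it (S104); if not, extracting key information from the log using the extended expression (S105). By combining grok expressions and Jmte expressions into extended expressions, text information can be extracted in batches from logs of different formats, and for some logs a special expression is applied according to a preset parsing rule, which further improves processing efficiency.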
Description
本申请要求于2020年07月28日提交中国专利局、申请号为202010737229.9,发明名称为“日志中关键信息提取方法、装置、终端及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
本申请涉及数据处理领域,特别是涉及一种日志中关键信息提取方法、装置、终端及存储介质。
在计算机中,日志是用于记录在操作系统或其他软件运行中发生的事件或在通信软件的不同用户之间的消息的文件,是一个系统的重要组成部分,在系统排错、优化方面起着很重要的作用,同时也是安全领域不可或缺的工具。
为了发掘日志的价值,开发人员经常需要分析大量的日志,其需要提取每条日志中的某些特定的内容,比如IP地址、生成时间等等。虽然日志内容通常遵循一定的模式,但是这种模式往往是隐晦的,是不容易直观获取到的。所以在提取某些特定的内容时,通常会根据提取的内容设计对应的正则表达式,然后依据正则表达式来提取日志中特定的内容。但是,发明人发现现有方案利用正则表达式提取日志中的内容时,需要执行正则表达式来提取每条日志的信息,提取方式呆板、效率低下,且部分情况下还需要设计语法复杂的正则表达式才能提取到需要的信息,或者是同一条日志需要配置多个提取器才能提取出所有需要的信息,而复杂语法的正则表达式以及复杂的配置,均会降低日志的处理效率降低。
发明内容
本申请提供一种日志中关键信息提取方法、装置、终端及存储介质,以解决现有技术中日志信息处理效率过于低下的问题。
为解决上述技术问题,本申请采用的一个技术方案是:提供一种日志中关键信息提取方法,包括:识别日志所属的日志类别,日志类别预先设定;获取日志类别对应的扩展表达式,每种日志类别对应一个预先构建的扩展表达式,扩展表达式包括至少一个grok表达式和至少一个Jmte表达式,Jmte表达式根据所需提取的关键信息预先设定;判断扩展表达式是否为特殊表达式;若是,则利用与特殊表达式对应的预设解析规则和特殊表达式从日志中提取关键信息;若否,则利用扩展表达式从日志中提取关键信息。
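To solve the above technical problem, another technical solution adopted by this application is an apparatus for extracting key information from a log, including: an identification module, configured to identify the log category to which a log belongs, the log categories being preset; an acquisition module, configured to obtain the extended expression corresponding to the log category, each log category corresponding to one pre-built extended expression, where the extended expression includes at least one grok expression and at least one Jmte expression, and the Jmte expression is preset according to the key information to be extracted; a determination module, configured to determine whether the extended expression is a special expression; a first extraction module, configured to, when the extended expression is a special expression, extract the key information from the log using the special expression and the preset parsing rule corresponding to it; and a second extraction module, configured to, when the extended expression is not a special expression, extract the key information from the log using the extended expression.
To solve the above technical problem, yet another technical solution adopted by this application is a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the following steps: identifying the log category to which a log belongs, the log categories being preset; obtaining the extended expression corresponding to the log category, each log category corresponding to one pre-built extended expression, where the extended expression includes at least one grok expression and at least one Jmte expression, and the Jmte expression is preset according to the key information to be extracted; determining whether the extended expression is a special expression; if so, extracting the key information from the log using the special expression and a preset parsing rule corresponding to the special expression; if not, extracting the key information from the log using the extended expression.
To solve the above technical problem, yet another technical solution adopted by this application is a storage medium storing a program file capable of implementing the method for extracting key information from a log, where the program file, when executed by a processor, implements the following steps: identifying the log category to which a log belongs, the log categories being preset; obtaining the extended expression corresponding to the log category, each log category corresponding to one pre-built extended expression, where the extended expression includes at least one grok expression and at least one Jmte expression, and the Jmte expression is preset according to the key information to be extracted; determining whether the extended expression is a special expression; if so, extracting the key information from the log using the special expression and a preset parsing rule corresponding to the special expression; if not, extracting the key information from the log using the extended expression.
The beneficial effects of this application are as follows. The method builds each extended expression from grok expressions together with Jmte expressions, which can handle specific formats and characters and support specific type conversions, so that a single extended expression can extract all of the key information in one log entry. This removes the complex configuration and low efficiency of extracting the same log entry with a main extractor and auxiliary extractors at the same time. In addition, when the extended expression is a special expression, the key information is extracted using that special expression together with its corresponding preset parsing rule, which streamlines extraction for that category of logs and thus improves its efficiency.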
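FIG. 1 is a schematic flowchart of the method for extracting key information from a log according to the first embodiment of this application;
FIG. 2 is a schematic flowchart of the method for extracting key information from a log according to the second embodiment of this application;
FIG. 3 is a schematic flowchart of the method for extracting key information from a log according to the third embodiment of this application;
FIG. 4 is a schematic diagram of the functional modules of the apparatus for extracting key information from a log according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of the terminal according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of the storage medium according to an embodiment of this application.
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
FIG. 1 is a schematic flowchart of the method for extracting key information from a log according to the first embodiment of this application. Note that, provided substantially the same result is obtained, the method of this application is not limited to the order of the flow shown in FIG. 1. As shown in FIG. 1, the method includes the following steps: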
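Step S101: Identify the log category to which the log belongs; the log categories are preset.
In step S101, the log categories are defined in advance by the user, who first analyzes the features that logs have in common and then divides the logs into categories according to those shared features. For example, logs generated by user access can be classified as access logs, and other categories include application run logs. Once the categories have been defined, each log obtained for extraction is assigned to the category it belongs to.
Step S102: Obtain the extended expression corresponding to the log category; each log category corresponds to one pre-built extended expression.
Note that the extended expression includes at least one grok expression and at least one Jmte expression, and that the Jmte expressions are preset according to the key information to be extracted.
Here, grok is a key filter plugin of Logstash (an open-source log collection and management tool): regular expressions can be predefined in grok and then used to parse logs. A Jmte (Java Minimal Template Engine) expression calls a Java method and hands the extracted field to the execution engine for storage. Jmte expressions can take a fixed-length substring, URL-decode text, split a string on a fixed delimiter, skip characters, validate special characters, and so on. Table 1 below lists some Jmte expressions and their uses: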
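Table 1
The following example shows how an extended expression in this embodiment extracts key information from a log. Suppose the user access log is: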
2020-07-17 16:26:50.871 DEBUG [accesslog] SomeOne 192.168.1.1 company.com.cn 200 0.030 GET api/search?query=where&filter=group%3A001&limit=10&offset=20&sort=time%3Adesc "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
Its corresponding extended expression is:
${@DateTime request_time 23}\s+(?<log_level>\w++)\s+\[accesslog\]\s+(?<user>\w++)\s+(?<client_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(?<server>[a-zA-Z]\S++)\s+(?<status>\d++)\s+(?<time_taken>\d++\.\d++)\s+(?<cs_method>[A-Z]++)\s+(?<cs_uri>[^\?\s]++)\??${@URLDecode}${@KeyValue=&query_param_}${@Skip 1}${@QuotesString user_agent RL}
Executing the extended expression above extracts the key information shown in Table 2:
Table 2
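In this embodiment, the extended expression includes at least one grok expression and at least one Jmte expression. The user can register a handler for each Jmte expression. These handlers can parse nested-format text that is awkward to parse with grok expressions, and can add type conversions that grok does not support, converting the captured text into a field of a specific type, such as a date field.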
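For illustration, the sample extended expression can be emulated in Python on the sample log, with the whitespace between fields that the expression's `\s+` tokens require. This is a sketch under assumptions: the exact behavior of the `DateTime`, `URLDecode`, `KeyValue`, `Skip` and `QuotesString` handlers is inferred from their names and from the sample, and `extract_access_log` is a hypothetical function, not Jmte's API. Python's `re` module spells named groups `(?P<name>...)`, so the regex part is adapted accordingly.

```python
import re
from datetime import datetime
from typing import Dict, Optional
from urllib.parse import unquote

# The sample access-log line, with single spaces between fields.
LOG = ('2020-07-17 16:26:50.871 DEBUG [accesslog] SomeOne 192.168.1.1 company.com.cn '
       '200 0.030 GET api/search?query=where&filter=group%3A001&limit=10&offset=20'
       '&sort=time%3Adesc "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
       '(KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"')

# The regex (grok) part of the sample expression, adapted to Python's re module:
# (?<name>...) becomes (?P<name>...) and the possessive quantifiers are dropped.
GROK_PART = re.compile(
    r'\s+(?P<log_level>\w+)\s+\[accesslog\]\s+(?P<user>\w+)'
    r'\s+(?P<client_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r'\s+(?P<server>[a-zA-Z]\S+)\s+(?P<status>\d+)\s+(?P<time_taken>\d+\.\d+)'
    r'\s+(?P<cs_method>[A-Z]+)\s+(?P<cs_uri>[^?\s]+)\??')


def extract_access_log(line: str) -> Optional[Dict[str, object]]:
    fields: Dict[str, object] = {}
    # ${@DateTime request_time 23}: assumed to cut 23 characters and convert
    # them to a datetime (a type conversion grok alone does not offer).
    fields['request_time'] = datetime.strptime(line[:23], '%Y-%m-%d %H:%M:%S.%f')
    rest = line[23:]
    match = GROK_PART.match(rest)
    if match is None:
        return None
    fields.update(match.groupdict())
    rest = rest[match.end():]
    # ${@URLDecode} then ${@KeyValue=&query_param_}: assumed to act on the query
    # string up to the next whitespace, split it on '&' and prefix each key.
    query = re.match(r'\S*', rest).group()
    rest = rest[len(query):]
    for pair in unquote(query).split('&'):
        key, _, value = pair.partition('=')
        fields['query_param_' + key] = value
    # ${@Skip 1}: skip one character (the space before the quoted user agent).
    rest = rest[1:]
    # ${@QuotesString user_agent RL}: assumed to take the double-quoted string.
    if rest.startswith('"'):
        fields['user_agent'] = rest[1:rest.index('"', 1)]
    return fields


result = extract_access_log(LOG)
```

In this sketch `result['query_param_filter']` is the decoded `group:001` and `result['request_time']` is a `datetime`, which matches the two capabilities the paragraph above attributes to Jmte handlers: nested-format parsing and type conversion.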
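Step S103: Determine whether the extended expression is a special expression. If so, perform step S104; if not, perform step S105.
In step S103, a special expression is one particular kind of extended expression: one to which a preset parsing rule is bound.
Step S104: Extract the key information from the log using the special expression and the preset parsing rule corresponding to it.
In step S104, each special expression corresponds to one log category and likewise to one preset parsing rule. Developers set the preset parsing rule according to the feature information shared by logs of the same category, and the rule exploits that shared information to extract the key information of such logs more efficiently.
Further, in some embodiments, the special expressions include a first special expression. When the extended expression is the first special expression, the step of extracting the key information from the log using the special expression and its corresponding preset parsing rule specifically includes:
1. When the extended expression is the first special expression, execute the first Jmte expression at the end of the first special expression to extract the first key information at the end of the log. The end of the first special expression is preset to be this first Jmte expression, which extracts the first key information.
Specifically, a first special expression is an extended expression whose last segment is set to the first Jmte expression. When the extended expression for a log is a first special expression, the first Jmte expression is executed before anything else to extract the first key information at the end of the log.
2. Determine whether the first key information includes a first preset field.
Specifically, once the first key information is obtained, it is checked for the first preset field. Developers set this field according to the logs of the category corresponding to the first special expression. Every log of that category ends with the first preset field, so a log of that category whose end does not include the field can be identified as abnormal.
3. When the first key information includes the first preset field, extract the key information from the log using the grok expressions and the remaining Jmte expressions.
4. When the first key information does not include the first preset field, determine that the log is abnormal and stop extracting key information.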
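In this embodiment, first key information that lacks the first preset field indicates that the log is abnormal and that its data may be abnormal as well. Extraction with the extended expression is therefore stopped, so none of the other grok or Jmte expressions need to be executed. This reduces resource usage, helps developers detect log anomalies early, and improves the efficiency of batch log processing. Note that the Java program invoked from Jmte can directly locate and extract the characters at the end of a log, which is why the end of the first special expression is preset to be the first Jmte expression.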
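A minimal sketch of the first special expression's preset rule, assuming each segment of the expression is modeled as a Python callable. `extract_first_special`, the `status=END` terminator and the segment functions are hypothetical. The point it shows is the early exit: when the tail check fails, none of the remaining segments run.

```python
from typing import Callable, Dict, List, Optional

Segment = Callable[[str], Dict[str, str]]


def extract_first_special(log: str,
                          tail_segment: Callable[[str], str],
                          preset_field: str,
                          remaining: List[Segment]) -> Optional[Dict[str, str]]:
    # Run the trailing Jmte-style segment first: it reads the end of the log.
    tail = tail_segment(log)
    if preset_field not in tail:
        return None  # abnormal log: stop here, no other segment is executed
    fields = {"tail": tail}
    for segment in remaining:  # the grok segments and the remaining Jmte segments
        fields.update(segment(log))
    return fields


# Usage: a hypothetical category whose logs always end in a "status=END" token.
executed: List[str] = []


def level_segment(log: str) -> Dict[str, str]:
    executed.append("level")  # records that this segment actually ran
    return {"level": log.split()[0]}


def last_token(log: str) -> str:
    return log.rsplit(" ", 1)[-1]


normal = extract_first_special("INFO job finished status=END", last_token, "END", [level_segment])
broken = extract_first_special("INFO job trunc", last_token, "END", [level_segment])
```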
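Further, in some embodiments, the special expressions include a second special expression. When the extended expression is the second special expression, the step of extracting the key information from the log using the special expression and its corresponding preset parsing rule specifically includes:
1. When the extended expression is the second special expression, execute the second Jmte expression at the beginning of the second special expression to obtain the length of the log. The beginning of the second special expression is preset to be this second Jmte expression, which obtains the length of the log.
Specifically, a second special expression is an extended expression whose first segment is set to the second Jmte expression. When the extended expression for a log is a second special expression, the second Jmte expression is executed before anything else to obtain the length of the log, that is, the total length of all the character strings in the log.
2. Determine whether the length of the log is greater than a first preset threshold or less than a second preset threshold.
Note that both thresholds are preset by developers, with the first preset threshold greater than the second. Developers set them after studying many sample logs of the same log category; when the length of a log of that category is not within the range defined by the two thresholds, the log can be determined to be abnormal.
3. When the length of the log is greater than the first preset threshold or less than the second preset threshold, determine that the log is abnormal and stop extracting key information.
4. When the length of the log is between the first and second preset thresholds (including when it equals either threshold), extract the key information from the log using the grok expressions and the remaining Jmte expressions.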
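In this embodiment, when a log corresponds to a second special expression, its length is obtained and used to decide whether the log is normal. Because this check runs before any of the grok and Jmte expressions are executed, abnormal logs are filtered out without spending resources on extracting their key information, which improves the efficiency of batch log processing.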
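The length check for the second special expression can be sketched the same way. `extract_second_special` and its parameters are hypothetical; the log length is assumed to be `len(log)`, and both thresholds are inclusive bounds, as step 4 states.

```python
from typing import Callable, Dict, List, Optional

Segment = Callable[[str], Dict[str, object]]


def extract_second_special(log: str,
                           first_threshold: int,
                           second_threshold: int,
                           remaining: List[Segment]) -> Optional[Dict[str, object]]:
    if first_threshold <= second_threshold:
        raise ValueError("the first preset threshold must exceed the second")
    length = len(log)  # the leading Jmte-style segment: total character count
    if length > first_threshold or length < second_threshold:
        return None  # abnormal log: skip every grok/Jmte segment
    fields: Dict[str, object] = {"length": length}
    for segment in remaining:
        fields.update(segment(log))
    return fields


def first_word(log: str) -> Dict[str, object]:
    # A toy segment standing in for the rest of the expression.
    return {"first": log.split()[0]}
```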
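Step S105: Extract the key information from the log using the extended expression.
In step S105, the grok expressions and the Jmte expressions in the extended expression are each used to extract their corresponding key information from the log.
Further, in this embodiment, when step S104 or step S105 is performed and the extended expression contains a third Jmte expression that extracts fixed-length field information, extracting that field information with the third Jmte expression specifically includes:
1. After identifying the special characters in the log with the third Jmte expression, obtain the string length of the field between any two adjacent special characters.
Note that developers configure the third Jmte expression to identify special characters in the log and to obtain the length of the field between two adjacent special characters. For example, the third Jmte expression can identify space characters, in which case the string between two spaces is one field.
2. Obtain from the third Jmte expression the target length of the field information to be extracted.
Specifically, each third Jmte expression presets the target length of the field information to be extracted.
3. Extract the field whose length matches the target length to obtain the field information.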
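Specifically, because the third Jmte expression obtains the required key information by matching fields of a fixed length rather than matching character by character, it greatly reduces the amount of data to process and makes extracting key information from logs more efficient.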
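A sketch of the third Jmte expression's fixed-length extraction, assuming the special characters are given as a string of separator characters (a space by default). Read literally, only fields lying between two adjacent special characters are measured, so the text before the first and after the last separator is ignored here. `extract_fixed_length` is a hypothetical name.

```python
import re
from typing import List


def extract_fixed_length(log: str, target_length: int, special_chars: str = " ") -> List[str]:
    # Step 1: split at the special characters and measure the fields that sit
    # between two adjacent special characters (hence parts[1:-1]).
    parts = re.split("[" + re.escape(special_chars) + "]", log)
    measured = [(field, len(field)) for field in parts[1:-1]]
    # Step 2: target_length is the length preset in the third Jmte expression.
    # Step 3: keep the fields whose measured length matches the target.
    return [field for field, length in measured if length == target_length]
```

Usage: `extract_fixed_length("a|bb|cc|d", 2, "|")` keeps both two-character inner fields; a line with no separator at all yields no fields.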
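The method of the first embodiment divides logs into categories and sets specific parsing rules for certain special categories among them. When key information is extracted from a log of such a category, the specific parsing rule and the corresponding extended expression are used together, which improves extraction efficiency. Moreover, each extended expression consists of grok expressions together with Jmte expressions that can handle specific formats and characters and support specific type conversions, so that a single extended expression can extract all of the key information of a log entry. This removes the complex configuration and low efficiency of extracting the same log entry with a main extractor and auxiliary extractors, streamlines the extraction process, and improves extraction efficiency.
FIG. 2 is a schematic flowchart of the method for extracting key information from a log according to the second embodiment of this application. Note that, provided substantially the same result is obtained, the method of this application is not limited to the order of the flow shown in FIG. 2. As shown in FIG. 2, the method includes the following steps:
Step S201: Identify the log category to which the log belongs; the log categories are preset.
In this embodiment, step S201 in FIG. 2 is similar to step S101 in FIG. 1 and, for brevity, is not described again here.
Step S202: Obtain the extended expression corresponding to the log category; each log category corresponds to one pre-built extended expression.
In this embodiment, step S202 in FIG. 2 is similar to step S102 in FIG. 1 and, for brevity, is not described again here.
Step S203: Parse the extended expression to split it into multiple segments, each corresponding to one grok expression or one Jmte expression.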
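In step S203, note first that the extended expression in this embodiment consists of at least one grok expression and at least one Jmte expression. To simplify key-information extraction, the extended expression is therefore split in advance into multiple segments, each corresponding to one grok expression or one Jmte expression; the grok and Jmte expressions are then executed one by one to extract key information from the log.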
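The segment split of step S203 can be sketched with a regular expression, assuming, as in the sample expression, that each Jmte expression is written as `${@...}` and that everything between Jmte expressions is grok/regex text. `split_extended_expression` is a hypothetical helper, not part of grok or Jmte.

```python
import re
from typing import List, Tuple

# A Jmte segment is written ${@...}; the text between Jmte segments is grok/regex.
JMTE_SEGMENT = re.compile(r"\$\{@([^}]*)\}")


def split_extended_expression(expression: str) -> List[Tuple[str, str]]:
    segments: List[Tuple[str, str]] = []
    position = 0
    for match in JMTE_SEGMENT.finditer(expression):
        if match.start() > position:  # grok text preceding this Jmte segment
            segments.append(("grok", expression[position:match.start()]))
        segments.append(("jmte", match.group(1)))
        position = match.end()
    if position < len(expression):  # trailing grok text, if any
        segments.append(("grok", expression[position:]))
    return segments


# A shortened version of the sample extended expression.
SAMPLE = (r"${@DateTime request_time 23}\s+(?<log_level>\w++)\s+\[accesslog\]"
          r"\s+(?<cs_uri>[^\?\s]++)\??${@URLDecode}${@KeyValue=&query_param_}"
          r"${@Skip 1}${@QuotesString user_agent RL}")
segments = split_extended_expression(SAMPLE)
```

Regex quantifiers such as `\d{1,3}` contain braces but never `${@`, so they stay inside grok segments.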
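Step S204: Determine whether the extended expression is a special expression. If so, perform step S205; if not, perform step S206.
In this embodiment, step S204 in FIG. 2 is similar to step S103 in FIG. 1 and, for brevity, is not described again here.
Step S205: Extract the key information from the log using the special expression and the preset parsing rule corresponding to it.
In this embodiment, step S205 in FIG. 2 is similar to step S104 in FIG. 1 and, for brevity, is not described again here.
Step S206: Extract the key information from the log using the extended expression.
In this embodiment, step S206 in FIG. 2 is similar to step S105 in FIG. 1 and, for brevity, is not described again here.
Further, to improve extraction efficiency still more, in some embodiments extracting the key information from the log using the extended expression specifically includes:
1. Obtain the text information of the log.
Note that logs are usually generated and stored as text. In this embodiment, all of the text information of the log is obtained first when key information is to be extracted from it.
2. Use the grok expressions or Jmte expressions one at a time to extract key information from the text information, removing each piece of key information from the text information once it has been extracted, until extraction is complete.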
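Specifically, a grok or Jmte expression first extracts key information from the text information, and the extracted key information is deleted from the text; the remaining text is then passed to the next grok or Jmte expression. Each extraction therefore shrinks the remaining text, so each subsequent expression has less data to process, extraction becomes progressively faster, and the efficiency of extracting key information from logs improves.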
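A minimal sketch of this consume-and-remove loop, assuming each segment matches at the start of the remaining text, so that removing the extracted information amounts to dropping the matched prefix. `regex_step` and `extract_consuming` are hypothetical names.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# A step returns the fields it found plus the text left after removing them.
Step = Callable[[str], Optional[Tuple[Dict[str, str], str]]]


def regex_step(pattern: str) -> Step:
    compiled = re.compile(pattern)

    def step(text: str) -> Optional[Tuple[Dict[str, str], str]]:
        match = compiled.match(text)
        if match is None:
            return None
        # Drop the matched prefix so the next step only sees the remainder.
        return match.groupdict(), text[match.end():]

    return step


def extract_consuming(text: str, steps: List[Step]) -> Optional[Dict[str, str]]:
    fields: Dict[str, str] = {}
    for step in steps:
        outcome = step(text)
        if outcome is None:
            return None  # a segment failed to match: no complete result
        found, text = outcome  # text shrinks after every step
        fields.update(found)
    return fields


steps = [regex_step(r"(?P<level>\w+)\s+"),
         regex_step(r"\[(?P<tag>[^\]]+)\]\s+"),
         regex_step(r"(?P<user>\w+)")]
```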
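Note that the same approach can also be used in step S205, when key information is extracted using a special expression and its corresponding preset parsing rule, to improve extraction efficiency.
Building on the first embodiment, the method of the second embodiment splits the extended expression into multiple segments, each corresponding to one grok or Jmte expression, and then uses the grok and Jmte expressions one at a time to extract key information from the log. Each time a grok or Jmte expression is executed, the extracted key information is removed from the log's text, so subsequent grok or Jmte expressions have less and less data to match and extraction becomes increasingly efficient.
FIG. 3 is a schematic flowchart of the method for extracting key information from a log according to the third embodiment of this application. Note that, provided substantially the same result is obtained, the method of this application is not limited to the order of the flow shown in FIG. 3. As shown in FIG. 3, the method includes the following steps:
Step S301: Identify the log category to which the log belongs; the log categories are preset.
In this embodiment, step S301 in FIG. 3 is similar to step S201 in FIG. 2 and, for brevity, is not described again here.
Step S302: Obtain the extended expression corresponding to the log category; each log category corresponds to one pre-built extended expression.
In this embodiment, step S302 in FIG. 3 is similar to step S202 in FIG. 2 and, for brevity, is not described again here.
Step S303: Determine whether a parsed extended expression exists in the memory. If so, perform step S304; if not, perform step S305.
Step S304: Retrieve the parsed extended expression directly.
Step S305: Parse the extended expression to split it into multiple segments, each corresponding to one grok expression or one Jmte expression, and store the parsed extended expression in the memory.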
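In steps S303 to S305, split extended expressions are stored in the memory, so when an extended expression is needed it can be retrieved from the memory and used directly without being parsed and split again. This avoids occupying further system resources, shortens the processing flow and improves extraction efficiency. When no parsed extended expression exists in the memory, the expression is parsed and then stored in the memory for later use.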
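Steps S303 to S305 amount to memoizing the parse. A minimal sketch, assuming the "memory" is an in-process dictionary keyed by the expression text; `ParsedExpressionStore` and the toy parser passed to it are hypothetical. In Python, decorating the parsing function with `functools.lru_cache` would give a similar effect.

```python
from typing import Callable, Dict, Generic, TypeVar

Parsed = TypeVar("Parsed")


class ParsedExpressionStore(Generic[Parsed]):
    """In-memory store of already-split extended expressions."""

    def __init__(self, parse: Callable[[str], Parsed]) -> None:
        self._parse = parse
        self._memory: Dict[str, Parsed] = {}
        self.parse_calls = 0  # how many times the expensive parse actually ran

    def get(self, expression: str) -> Parsed:
        if expression in self._memory:      # S303 -> S304: reuse the parsed form
            return self._memory[expression]
        self.parse_calls += 1               # S305: parse once, then store it
        parsed = self._parse(expression)
        self._memory[expression] = parsed
        return parsed


# Usage with a toy parser; a real one would split into grok/Jmte segments.
store = ParsedExpressionStore(lambda expression: expression.split("${@"))
first = store.get("a${@b}")
second = store.get("a${@b}")
```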
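Step S306: Determine whether the extended expression is a special expression. If so, perform step S307; if not, perform step S308.
In this embodiment, step S306 in FIG. 3 is similar to step S204 in FIG. 2 and, for brevity, is not described again here.
Step S307: Extract the key information from the log using the special expression and the preset parsing rule corresponding to it.
In this embodiment, step S307 in FIG. 3 is similar to step S205 in FIG. 2 and, for brevity, is not described again here.
Step S308: Extract the key information from the log using the extended expression.
In this embodiment, step S308 in FIG. 3 is similar to step S206 in FIG. 2 and, for brevity, is not described again here.
Building on the second embodiment, the method of the third embodiment stores parsed extended expressions in a memory, so the extended expression does not need to be parsed again each time key information is extracted from a log, making extraction faster and more efficient.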
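FIG. 4 is a schematic diagram of the functional modules of the apparatus for extracting key information from a log according to an embodiment of this application. As shown in FIG. 4, the apparatus 40 includes an identification module 41, an acquisition module 42, a determination module 43, a first extraction module 44 and a second extraction module 45.
The identification module 41 is configured to identify the log category to which a log belongs; the log categories are preset.
The acquisition module 42 is configured to obtain the extended expression corresponding to the log category; each log category corresponds to one pre-built extended expression, the extended expression includes at least one grok expression and at least one Jmte expression, and the Jmte expressions are preset according to the key information to be extracted.
The determination module 43 is configured to determine whether the extended expression is a special expression.
The first extraction module 44 is configured to, when the extended expression is a special expression, extract the key information from the log using the special expression and the preset parsing rule corresponding to it.
The second extraction module 45 is configured to, when the extended expression is not a special expression, extract the key information from the log using the extended expression.
Optionally, the first extraction module 44 may also extract key information from the log using the special expression and its corresponding preset parsing rule as follows: when the extended expression is the first special expression, execute the first Jmte expression at the end of the first special expression to extract the first key information at the end of the log, the end of the first special expression being preset to the first Jmte expression for extracting the first key information; determine whether the first key information includes the first preset field; if so, extract the key information from the log using the grok expressions and the remaining Jmte expressions; if not, determine that the log is abnormal and stop extracting key information.
Optionally, the first extraction module 44 may also extract key information from the log using the special expression and its corresponding preset parsing rule as follows: when the extended expression is the second special expression, execute the second Jmte expression at the beginning of the second special expression to obtain the length of the log, the beginning of the second special expression being preset to the second Jmte expression for obtaining the length of the log; determine whether the length of the log is greater than the first preset threshold or less than the second preset threshold; if so, determine that the log is abnormal and stop extracting key information; if not, extract the key information from the log using the grok expressions and the remaining Jmte expressions.
Optionally, after obtaining the extended expression corresponding to the log category, the acquisition module 42 is further configured to parse the extended expression to split it into multiple segments, each corresponding to one grok expression or one Jmte expression.
Optionally, before parsing the extended expression, the acquisition module 42 is further configured to: determine whether a parsed extended expression exists in the memory; if so, retrieve the parsed extended expression directly; if not, perform the operation of parsing the extended expression to split it into multiple segments, and store the parsed extended expression in the memory.
Optionally, the second extraction module 45 may also extract key information from the log using the extended expression as follows: obtain the text information of the log; use the grok expressions or Jmte expressions one at a time to extract key information from the text information, removing each piece of key information from the text once it has been extracted, until extraction is complete.
Optionally, when the extended expression contains a third Jmte expression that extracts fixed-length field information, the first extraction module 44 or the second extraction module 45 executes the third Jmte expression as follows: after identifying the special characters in the log with the third Jmte expression, obtain the string length of the field between any two adjacent special characters; obtain from the third Jmte expression the target length of the field information to be extracted; and extract the field matching the target length to obtain the field information.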
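Referring to FIG. 5, FIG. 5 is a schematic structural diagram of the terminal according to an embodiment of this application. As shown in FIG. 5, the terminal 50 includes a processor 51, a memory 52 coupled to the processor 51, and a computer program stored in the memory 52 and executable on the processor 51. When executing the computer program, the processor 51 implements the method for extracting key information from a log described in the above embodiments.
The processor 51 may also be called a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal-processing capability. It may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of the storage medium according to an embodiment of this application. The storage medium of this embodiment stores a program file 61 capable of implementing all of the above methods. The program file 61 may be stored in the storage medium in the form of a software product and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc, or a terminal device such as a computer, server, mobile phone or tablet. The computer-readable storage medium may be non-volatile or volatile.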
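In the several embodiments provided in this application, it should be understood that the disclosed terminal, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or of another form.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. The foregoing are merely embodiments of this application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the content of the description and drawings of this application, or any direct or indirect application in other related technical fields, likewise falls within the scope of patent protection of this application.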
Claims (21)
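- A method for extracting key information from a log, including: identifying the log category to which a log belongs, the log categories being preset; obtaining the extended expression corresponding to the log category, each log category corresponding to one pre-built extended expression, the extended expression including at least one grok expression and at least one Jmte expression, and the Jmte expression being preset according to the key information to be extracted; determining whether the extended expression is a special expression; if so, extracting key information from the log using the special expression and a preset parsing rule corresponding to the special expression; if not, extracting key information from the log using the extended expression.
- The method for extracting key information from a log according to claim 1, wherein extracting the key information from the log using the special expression and the preset parsing rule corresponding to the special expression includes: when the extended expression is a first special expression, executing a first Jmte expression at the end of the first special expression to extract first key information at the end of the log, the end of the first special expression being preset to the first Jmte expression for extracting the first key information; determining whether the first key information includes a first preset field; if so, extracting key information from the log using the grok expression and the remaining Jmte expressions; if not, determining that the log is abnormal and stopping extraction of the key information.
- The method for extracting key information from a log according to claim 1, wherein extracting the key information from the log using the special expression and the preset parsing rule corresponding to the special expression includes: when the extended expression is a second special expression, executing a second Jmte expression at the beginning of the second special expression to obtain the length of the log, the beginning of the second special expression being preset to the second Jmte expression for obtaining the length of the log; determining whether the length of the log is greater than a first preset threshold or less than a second preset threshold, the first preset threshold being greater than the second preset threshold; if so, determining that the log is abnormal and stopping extraction of the key information; if not, extracting key information from the log using the grok expression and the remaining Jmte expressions.
- The method for extracting key information from a log according to claim 1, further including, after obtaining the extended expression corresponding to the log category: parsing the extended expression to split it into multiple segments, each segment corresponding to one grok expression or one Jmte expression.
- The method for extracting key information from a log according to claim 4, further including, before parsing the extended expression: determining whether a parsed extended expression exists in a memory; if so, directly retrieving the parsed extended expression; if not, performing the step of parsing the extended expression to split it into multiple segments, and storing the parsed extended expression in the memory.
- The method for extracting key information from a log according to claim 4, wherein extracting the key information from the log using the extended expression includes: obtaining text information of the log; and extracting the key information from the text information using the grok expressions or the Jmte expressions one at a time, removing the key information from the text information after it is extracted, until extraction is complete.
- The method for extracting key information from a log according to claim 1, wherein, when the extended expression contains a third Jmte expression for extracting fixed-length field information, the method further includes: after identifying special characters in the log using the third Jmte expression, obtaining the string length of the field between any two adjacent special characters; obtaining, from the third Jmte expression, a target length of the field information to be extracted; and extracting the field matching the target length to obtain the field information.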
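- An apparatus for extracting key information from a log, including: an identification module, configured to identify the log category to which a log belongs, the log categories being preset; an acquisition module, configured to obtain the extended expression corresponding to the log category, each log category corresponding to one pre-built extended expression, the extended expression including at least one grok expression and at least one Jmte expression, and the Jmte expression being preset according to the key information to be extracted; a determination module, configured to determine whether the extended expression is a special expression; a first extraction module, configured to, when the extended expression is a special expression, extract key information from the log using the special expression and a preset parsing rule corresponding to the special expression; and a second extraction module, configured to, when the extended expression is not a special expression, extract key information from the log using the extended expression.
- A terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps: identifying the log category to which a log belongs, the log categories being preset; obtaining the extended expression corresponding to the log category, each log category corresponding to one pre-built extended expression, the extended expression including at least one grok expression and at least one Jmte expression, and the Jmte expression being preset according to the key information to be extracted; determining whether the extended expression is a special expression; if so, extracting key information from the log using the special expression and a preset parsing rule corresponding to the special expression; if not, extracting key information from the log using the extended expression.
- The terminal according to claim 9, wherein extracting the key information from the log using the special expression and the preset parsing rule corresponding to the special expression includes: when the extended expression is a first special expression, executing a first Jmte expression at the end of the first special expression to extract first key information at the end of the log, the end of the first special expression being preset to the first Jmte expression for extracting the first key information; determining whether the first key information includes a first preset field; if so, extracting key information from the log using the grok expression and the remaining Jmte expressions; if not, determining that the log is abnormal and stopping extraction of the key information.
- The terminal according to claim 9, wherein extracting the key information from the log using the special expression and the preset parsing rule corresponding to the special expression includes: when the extended expression is a second special expression, executing a second Jmte expression at the beginning of the second special expression to obtain the length of the log, the beginning of the second special expression being preset to the second Jmte expression for obtaining the length of the log; determining whether the length of the log is greater than a first preset threshold or less than a second preset threshold, the first preset threshold being greater than the second preset threshold; if so, determining that the log is abnormal and stopping extraction of the key information; if not, extracting key information from the log using the grok expression and the remaining Jmte expressions.
- The terminal according to claim 9, wherein, after the extended expression corresponding to the log category is obtained, the steps further include: parsing the extended expression to split it into multiple segments, each segment corresponding to one grok expression or one Jmte expression.
- The terminal according to claim 12, wherein, before the extended expression is parsed, the steps further include: determining whether a parsed extended expression exists in the memory; if so, directly retrieving the parsed extended expression; if not, performing the step of parsing the extended expression to split it into multiple segments, and storing the parsed extended expression in the memory.
- The terminal according to claim 12, wherein extracting the key information from the log using the extended expression includes: obtaining text information of the log; and extracting the key information from the text information using the grok expressions or the Jmte expressions one at a time, removing the key information from the text information after it is extracted, until extraction is complete.
- The terminal according to claim 9, wherein, when the extended expression contains a third Jmte expression for extracting fixed-length field information, the steps further include: after identifying special characters in the log using the third Jmte expression, obtaining the string length of the field between any two adjacent special characters; obtaining, from the third Jmte expression, a target length of the field information to be extracted; and extracting the field matching the target length to obtain the field information.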
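- A storage medium storing a program file capable of implementing a method for extracting key information from a log, wherein the program file, when executed by a processor, implements the following steps: identifying the log category to which a log belongs, the log categories being preset; obtaining the extended expression corresponding to the log category, each log category corresponding to one pre-built extended expression, the extended expression including at least one grok expression and at least one Jmte expression, and the Jmte expression being preset according to the key information to be extracted; determining whether the extended expression is a special expression; if so, extracting key information from the log using the special expression and a preset parsing rule corresponding to the special expression; if not, extracting key information from the log using the extended expression.
- The storage medium according to claim 16, wherein extracting the key information from the log using the special expression and the preset parsing rule corresponding to the special expression includes: when the extended expression is a first special expression, executing a first Jmte expression at the end of the first special expression to extract first key information at the end of the log, the end of the first special expression being preset to the first Jmte expression for extracting the first key information; determining whether the first key information includes a first preset field; if so, extracting key information from the log using the grok expression and the remaining Jmte expressions; if not, determining that the log is abnormal and stopping extraction of the key information.
- The storage medium according to claim 16, wherein extracting the key information from the log using the special expression and the preset parsing rule corresponding to the special expression includes: when the extended expression is a second special expression, executing a second Jmte expression at the beginning of the second special expression to obtain the length of the log, the beginning of the second special expression being preset to the second Jmte expression for obtaining the length of the log; determining whether the length of the log is greater than a first preset threshold or less than a second preset threshold, the first preset threshold being greater than the second preset threshold; if so, determining that the log is abnormal and stopping extraction of the key information; if not, extracting key information from the log using the grok expression and the remaining Jmte expressions.
- The storage medium according to claim 16, wherein, after the extended expression corresponding to the log category is obtained, the steps further include: parsing the extended expression to split it into multiple segments, each segment corresponding to one grok expression or one Jmte expression.
- The storage medium according to claim 19, wherein, before the extended expression is parsed, the steps further include: determining whether a parsed extended expression exists in a memory; if so, directly retrieving the parsed extended expression; if not, performing the step of parsing the extended expression to split it into multiple segments, and storing the parsed extended expression in the memory.
- The storage medium according to claim 19, wherein extracting the key information from the log using the extended expression includes: obtaining text information of the log; and extracting the key information from the text information using the grok expressions or the Jmte expressions one at a time, removing the key information from the text information after it is extracted, until extraction is complete.
- The storage medium according to claim 16, wherein, when the extended expression contains a third Jmte expression for extracting fixed-length field information, the steps further include: after identifying special characters in the log using the third Jmte expression, obtaining the string length of the field between any two adjacent special characters; obtaining, from the third Jmte expression, a target length of the field information to be extracted; and extracting the field matching the target length to obtain the field information.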
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010737229.9 | 2020-07-28 | ||
CN202010737229.9A CN111881094B (zh) | 2020-07-28 | 2020-07-28 | Method, apparatus, terminal and storage medium for extracting key information from logs |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021120782A1 (zh) | 2021-06-24 |
Family
ID=73200814
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/118501 WO2021120782A1 (zh) | 2020-07-28 | 2020-09-28 | Method, apparatus, terminal and storage medium for extracting key information from logs |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111881094B (zh) |
WO (1) | WO2021120782A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
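CN112381519A (zh) * | 2020-11-20 | 2021-02-19 | 北京云族佳科技有限公司 | Work log processing method and apparatus, and readable storage medium |
CN114818643B (zh) * | 2022-06-21 | 2022-10-04 | 北京必示科技有限公司 | Log template extraction method and apparatus that preserve specific business information |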
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287163A (zh) * | 2019-06-25 | 2019-09-27 | 浙江乾冠信息安全研究院有限公司 | Security log collection and parsing method, apparatus, device and medium |
US20190361899A1 (en) * | 2017-01-16 | 2019-11-28 | China Unionpay Co., Ltd. | Statement parsing method for database statement |
CN110851414A (zh) * | 2019-11-06 | 2020-02-28 | 云南艾拓信息技术有限公司 | Method and system for boundary data analysis using clustering |
CN111339052A (zh) * | 2020-02-28 | 2020-06-26 | 中国银联股份有限公司 | Unstructured log data processing method and apparatus |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105138593A (zh) * | 2015-07-31 | 2015-12-09 | 山东蚁巡网络科技有限公司 | Method for custom extraction of key log information using regular expressions |
CN106055585A (zh) * | 2016-05-20 | 2016-10-26 | 北京神州绿盟信息安全科技股份有限公司 | Log parsing method and apparatus |
CN106407071A (zh) * | 2016-09-06 | 2017-02-15 | 珠海迈科智能科技股份有限公司 | Linux-based automatic analysis tool for content-service back-end logs |
US10678669B2 (en) * | 2017-04-21 | 2020-06-09 | Nec Corporation | Field content based pattern generation for heterogeneous logs |
CN109408541A (zh) * | 2018-09-03 | 2019-03-01 | 平安科技(深圳)有限公司 | Report decomposition and statistics method, system, computer device and storage medium |
CN109408479B (zh) * | 2018-09-19 | 2023-05-30 | 平安科技(深圳)有限公司 | Log data adding method, system, computer device and storage medium |
CN109582551B (zh) * | 2018-10-11 | 2022-04-26 | 平安科技(深圳)有限公司 | Log data parsing method, apparatus, computer device and storage medium |
CN110427307A (zh) * | 2019-06-21 | 2019-11-08 | 平安科技(深圳)有限公司 | Log parsing method, apparatus, computer device and storage medium |
2020
- 2020-07-28: CN CN202010737229.9A, patent CN111881094B (zh), status: active
- 2020-09-28: WO PCT/CN2020/118501, patent WO2021120782A1 (zh), status: active, application filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115118582A (zh) * | 2022-06-15 | 2022-09-27 | 合肥移瑞通信技术有限公司 | Log analysis method and apparatus |
CN115118582B (zh) * | 2022-06-15 | 2024-04-16 | 合肥移瑞通信技术有限公司 | Log analysis method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN111881094B (zh) | 2023-07-18 |
CN111881094A (zh) | 2020-11-03 |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20903980; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 20903980; Country of ref document: EP; Kind code of ref document: A1 |