WO2020155651A1

WO2020155651A1 - Method and device for storing and querying log information

Info

Publication number: WO2020155651A1
Application number: PCT/CN2019/107127
Authority: WO
Inventors: 李同军
Original assignee: 华为技术有限公司
Priority date: 2019-02-02
Filing date: 2019-09-21
Publication date: 2020-08-06
Also published as: CN109885545A

Abstract

The present application provides a method for storing log information. The method comprises: identifying a first part and a second part of multiple pieces of log information, the first part being the part that is the same across a number of pieces of log information less than or equal to a first threshold, and the second part being the part that differs across a number of pieces of log information less than or equal to a second threshold; replacing the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information; and storing the log information template and the second part of the multiple pieces of log information in a database separately. With the technical solution provided by the present application, the storage space occupied by the same part of the multiple pieces of log information can be reduced, and storage resources are saved.

Description

Method and device for storing and querying log information

Technical field

This application relates to the storage field, and more specifically, to a method for storing log information, a method and device for querying log information, and a computer-readable storage medium.

Background technique

A computer log is generated by the operating system, middleware, the platform itself, or program components developed by the user while a computer system is running. It records the operating status and usage of the equipment and of the system itself. This descriptive information mainly records the key operations performed by the system and the errors and exceptions that occurred during its operation. By analyzing the system log, the user can understand the problems that occur frequently or occasionally during operation, so that the operation and maintenance of the system can be improved in a targeted manner and the safety and efficiency of system operation can be improved.

In the traditional log information storage solution, the original log information is stored directly in the database. On the one hand, the original logs contain a large amount of repeated information, which occupies most of the storage resources and wastes them. On the other hand, when the user retrieves the stored log information, the large amount of repeated information in the original logs lowers retrieval efficiency, and when the user performs data analysis on a large number of retrieved original logs, the many repeated parts make the analysis less real-time, which is not conducive to quickly discovering problems through logs.

Summary of the invention

This application provides a method for storing log information and a method for querying log information, which can reduce the storage space occupied by the repeated parts of multiple pieces of log information, save storage resources, shorten the time users need to retrieve log information, and improve the use value of the log information.

In a first aspect, a method for storing log information is provided. The method includes: identifying a first part and a second part of multiple pieces of log information from the multiple pieces of log information; replacing the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information, the log information template including the first part of the multiple pieces of log information; and storing the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information in the database separately.

Wherein, the first part of the multiple pieces of log information is the part that is the same across a number of pieces of log information less than or equal to a first threshold, and the second part of the multiple pieces of log information is the part that differs across a number of pieces of log information less than or equal to a second threshold.

It should be understood that the first threshold may be less than or equal to the total number of pieces of log information, the second threshold may be less than or equal to the total number of pieces of log information, and the first threshold and the second threshold may or may not be equal.

In the above technical solution, the original log information can be processed. The same part and the different part are identified from a number of pieces of log information that is less than or equal to the total number of the multiple pieces of log information. Only one copy of the same part of the multiple pieces of log information is stored, so that the simplified log information does not need to carry the long character string held in the log information template. This reduces the storage space occupied by the log information and saves storage resources in the log service system.

In a possible implementation manner, the method further includes: generating an identifier of the log information template corresponding to the multiple pieces of log information.

In the above technical solution, the unique identifier corresponding to each log information template can be calculated, so that when the user later retrieves the log information, the log information can be restored through the log information template identifier and the user can obtain the original log information. The original log information can then be analyzed to understand the problems that occur frequently or occasionally while the node is running, and the distribution of the identifiers corresponding to the log information templates can be analyzed to discover potential system abnormalities and raise an alarm in advance.

In another possible implementation manner, in the process of retrieving the stored log information through the log service API, the user can search for keywords in the log information template.

In the above technical solution, the log information required by the user is searched for by keywords in the log information template. Because log information templates are not heavily repeated, the log information required by the user can be found quickly, which reduces search time, improves the real-time nature of the user's analysis, and is conducive to quickly discovering problems through logs.

In another possible implementation manner, the second part of the multiple pieces of log information and the identifier of the log information template are stored in the first space of the database; the log information template and the identifier of the log information template are stored in the second space of the database.
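As an illustration of the two storage spaces, the following is a minimal sketch that models each space as a table in an in-memory SQLite database. The table names, column names, and the `create_store` and `store` helpers are assumptions made for this example only; the application describes a distributed database and does not prescribe a schema.

```python
import json
import sqlite3


def create_store():
    """Create an in-memory database with two hypothetical 'spaces'."""
    conn = sqlite3.connect(":memory:")
    # Second space: one row per template, keyed by the template identifier.
    conn.execute(
        "CREATE TABLE templates ("
        "template_id TEXT PRIMARY KEY, template TEXT NOT NULL)"
    )
    # First space: one row per log entry, holding only the variable part
    # together with the identifier of the template it belongs to.
    conn.execute(
        "CREATE TABLE variable_parts ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "template_id TEXT NOT NULL REFERENCES templates(template_id), "
        "variables TEXT NOT NULL)"
    )
    return conn


def store(conn, template_id, template, variable_rows):
    """Store the template once and each variable part separately."""
    conn.execute(
        "INSERT OR IGNORE INTO templates (template_id, template) VALUES (?, ?)",
        (template_id, template),
    )
    conn.executemany(
        "INSERT INTO variable_parts (template_id, variables) VALUES (?, ?)",
        [(template_id, json.dumps(row)) for row in variable_rows],
    )
    conn.commit()
```

Calling `store` repeatedly with the same template adds rows only to the variable-part table, so the template text is kept once however many log entries share it.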

In a second aspect, a method for querying log information is provided. The method is applied to a log information query system. The log information query system includes a database that stores a second part of multiple pieces of log information and a log information template corresponding to the multiple pieces of log information. The log information template includes a first part of the multiple pieces of log information; the first part is the part that is the same across a number of pieces of log information less than or equal to a first threshold, and the second part is the part that differs across a number of pieces of log information less than or equal to a second threshold. The method includes:

Receiving a query request, the query request being used to query log information stored in the database; acquiring from the database, according to the query request, the stored second part of the multiple pieces of log information and the log information template corresponding to the multiple pieces of log information; substituting the second part of the multiple pieces of log information into the corresponding log information template to obtain the corresponding log information; and returning the corresponding log information to the client that issued the query request.

In the above technical solution, the second part of the multiple pieces of log information can be substituted into the log information template corresponding to the multiple pieces of log information, and the resulting original log information is returned to the user. This enables users to analyze the original log information and understand the problems that occur frequently or occasionally while the node is running.
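The query flow described above can be sketched as follows. This is a simplified illustration, not the claimed implementation: it assumes templates are held in a dictionary keyed by template identifier, variable parts are held as a list of `(template_id, values)` pairs, a single untyped placeholder `%s` is used, and the keyword is matched only against template text. The names `restore` and `query` are hypothetical.

```python
def restore(template, variables, placeholder="%s"):
    """Substitute the variable part back into the template, in order."""
    parts = template.split(placeholder)
    if len(parts) != len(variables) + 1:
        raise ValueError("placeholder count does not match variable count")
    pieces = [parts[0]]
    for value, tail in zip(variables, parts[1:]):
        pieces.append(value)
        pieces.append(tail)
    return "".join(pieces)


def query(keyword, templates, variable_rows):
    """Return restored log lines whose template contains the keyword."""
    # Search the (few) templates first, then restore only matching entries.
    hits = {tid for tid, text in templates.items() if keyword in text}
    return [
        restore(templates[tid], values)
        for tid, values in variable_rows
        if tid in hits
    ]
```

Because the template is split on the placeholder before the values are inserted, a value that happens to contain the placeholder text does not disturb the restoration.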

In a possible implementation manner, before the receiving the query request, the method further includes: identifying the first part and the second part of the plurality of log information from the plurality of log information;

Replacing the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information;

The log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information are respectively stored in the database.

In another possible implementation manner, the method further includes: generating an identifier of the log information template corresponding to the multiple pieces of log information.

In another possible implementation manner, the method further includes: storing the second part of the plurality of log information and the identifier of the log information template in the first space of the database; and storing the log information The template and the identifier of the log information template are stored in the second space of the database.

In a third aspect, a device for storing log information is provided, and the device includes:

The identification module is configured to identify the first part and the second part of multiple pieces of log information from the multiple pieces of log information, wherein the first part of the multiple pieces of log information is the part that is the same across a number of pieces of log information less than or equal to the first threshold, and the second part of the multiple pieces of log information is the part that differs across a number of pieces of log information less than or equal to the second threshold;

The processing module is configured to replace the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information, and the log information template includes the first part of the multiple pieces of log information;

The storage module is configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information to the database respectively.

In a possible implementation manner, the device further includes: a generating module, configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.

In another possible implementation manner, the storage module is specifically configured to: store the second part of the multiple pieces of log information and the identifier of the log information template in the first space of the database; and store the log information template and the identifier of the log information template in the second space of the database.

In a fourth aspect, a device for querying log information is provided. The device is applied to a log information query system. The log information query system includes a database that stores a second part of multiple pieces of log information and a log information template corresponding to the multiple pieces of log information. The log information template includes a first part of the multiple pieces of log information; the first part is the part that is the same across a number of pieces of log information less than or equal to a first threshold, and the second part is the part that differs across a number of pieces of log information less than or equal to a second threshold. The device includes:

A receiving module, configured to receive a query request, the query request being used to query log information stored in the database;

The obtaining module is configured to obtain, according to the query request, the second part of the multiple pieces of log information stored in the database and the log information template corresponding to the multiple pieces of log information; the second part of the multiple pieces of log information is then substituted into the log information template corresponding to the multiple pieces of log information to obtain the corresponding log information.

Optionally, the device for querying log information may further include a returning module for returning the corresponding log information to the client that issued the query request.

In a possible implementation manner, the device further includes: an identification module, configured to identify the first part and the second part of the plurality of log information from the plurality of log information;

A processing module, configured to replace the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information;

In another possible implementation manner, the device further includes: a generating module, configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.

In another possible implementation manner, the storage module is specifically configured to: store the second part of the plurality of log information and the identifier of the log information template in the first space of the database; and store the log information The template and the identifier of the log information template are stored in the second space of the database.

In a fifth aspect, a device for storing log information is provided. The device for storing log information includes at least one computing node, and each computing node includes a processor and a memory. The processor of the at least one computing node is configured to execute the program code stored in the memory, so as to execute the foregoing first aspect or the method in the possible implementation manner of the first aspect.

In a sixth aspect, a device for querying log information is provided. The device for querying log information includes at least one computing node, and each computing node includes a processor and a memory. The processor of the at least one computing node is configured to execute the program code stored in the memory, so as to execute the foregoing second aspect or the method in the possible implementation manner of the second aspect.

In a seventh aspect, the present application provides a non-transitory, non-volatile computer-readable storage medium that stores instructions which, when run on at least one computing node, cause the at least one computing node to perform the foregoing first aspect or the method in a possible implementation of the first aspect.

In an eighth aspect, the present application provides a non-transitory, non-volatile computer-readable storage medium that stores instructions which, when run on at least one computing node, cause the at least one computing node to perform the foregoing second aspect or the method in a possible implementation of the second aspect.

In a ninth aspect, the present application provides a computer program product containing instructions that, when it runs on at least one computing node, causes at least one computing node to execute the above-mentioned first aspect or the method in the possible implementation of the first aspect.

In a tenth aspect, this application provides a computer program product containing instructions which, when run on at least one computing node, cause the at least one computing node to execute the above-mentioned second aspect or the method in a possible implementation of the second aspect.

On the basis of the implementation manners provided by the above aspects, this application can be further combined to provide more implementation manners.

Description of the drawings

FIG. 1 is a schematic block diagram of a distributed log service system 100 applied to an embodiment of the present application.

Fig. 2 is a schematic flowchart of a method for storing log information provided by an embodiment of the present application.

FIG. 3 is a schematic structural diagram of an apparatus 300 for storing log information provided by an embodiment of the present application.

FIG. 4 is a schematic structural diagram of an apparatus 400 for storing log information provided by an embodiment of the present application.

FIG. 5 is a schematic structural diagram of an apparatus 500 for querying log information provided by an embodiment of the present application.

FIG. 6 is a schematic structural diagram of an apparatus 600 for querying log information provided by an embodiment of the present application.

Detailed description

The technical solution in this application will be described below in conjunction with the drawings.

In large-scale distributed scenarios, a dedicated log system or log service is required to collect and store log information from a large number of managed nodes, and provide related retrieval interfaces for users to filter and analyze the log data they care about. The distributed logging system 100 will be described in detail below.

FIG. 1 is a schematic block diagram of a distributed log service system 100 applied to an embodiment of the present application. As shown in FIG. 1, the system 100 may include a server 110 and at least one node.

The embodiment of the present application does not specifically limit the number of at least one node. For ease of description, the embodiment of the present application uses the node 120 and the node 130 as examples for description.

It should be understood that the node 120 and the node 130 may be different virtual machines (virtual machines, VMs) installed on one physical host, or may be VMs installed on different physical hosts. The node 120 and the node 130 may also be physical hosts.

While at least one node in the log service system 100 is running, its operating system, the platform itself, or program components developed by the user can generate corresponding log information, which records the key operations performed by the at least one node and the errors and exceptions that occurred during operation. The at least one node can save the generated log information in a corresponding log file. By analyzing the system log, users can understand the problems that occur frequently or occasionally while the at least one node is running, so as to improve its operation and maintenance in a targeted manner. It is also possible to discover potential system abnormalities by analyzing the distribution of the identifiers corresponding to the log information templates, and to send out alarm reminders in advance.

A log collection agent 121 and a log collection agent 131 are respectively deployed on the node 120 and the node 130. The log collection agent 121 in the node 120 may obtain the log information saved in the log file in the designated directory of the node 120, and report the obtained log information of the node 120 to the server 110 through the message middleware 114. Similarly, the log collection agent 131 in the node 130 can obtain the log information saved in the log file under the specified directory in the node 130, and report the obtained log information of the node 130 to the server 110 through the message middleware 114.

It should be noted that the log collection agent module 121 and the log collection agent module 131 may be modules implemented by software programs.

The data preprocessing module 113 in the server 110 may preprocess the reported log information after receiving the log information reported by the node 120 and/or the node 130, and the processed log information is stored in the distributed database 112 in the server 110.

Specifically, after receiving the log information of at least one node reported through the message middleware 114, the data preprocessing module 113 in the server 110 may tag the log information and store it in the distributed database 112, so that users can later search for related log information through the tag information.

It should be understood that the tag information may be carried by the log information itself, or may be obtained by the log collection agent in at least one node according to the node where the log information is located and the attributes of the application service running on it. The embodiment of this application does not specifically limit the content of the tag information, which may include but is not limited to: the storage path of the log file where the log information is located, the internet protocol (IP) address of the node where the log information is located, the region information of the public cloud where the service to which the log information belongs is located, the service information (for example, a service identifier (ID)) to which the log information belongs, the component information (for example, a component ID) to which the log information belongs, the tenant ID, etc.

The log service application program interface (API) 111 in the server 110 can provide users with retrieval functions. Users can use the log service API 111 to interact with the log service system 100 through a browser or a command line tool.

Specifically, the user can search for the log information he needs through the log service API 111; for example, the log information stored in the distributed database 112 can be obtained by searching the various tag information and log keywords of the log information. Based on the obtained log information, the user can understand the problems that occur frequently or occasionally while the node is running and maintain the node in a targeted manner, thereby improving the efficiency of the node's system operation.

In the traditional log storage solution, the original log information is stored directly in the database, which therefore stores a large amount of repeated information. On the one hand, once the amount of log information exceeds a certain level, collecting the log information, transmitting it across nodes, and the database resources it occupies all pose serious challenges. On the other hand, storing the original log information directly also reduces the efficiency with which the log information can be used. For example, in a fuzzy search scenario, it takes a long time to search through a huge amount of original log information. When log information is used for anomaly detection and other related data mining analysis, a large amount of log information needs to be analyzed centrally, so real-time performance is poor, which is not conducive to quickly discovering problems.

The embodiment of the present application provides a method for storing data, which can process the original log information, and reduce the storage space occupied by the original data by eliminating a large amount of repeated data in the original data.

It should be noted that FIG. 1 is a possible scenario applied to the embodiment of the present application, and the method provided in the embodiment of the present application can also be applied to a scenario where a large amount of repeated unstructured data is stored.

Fig. 2 is a schematic flowchart of a method for storing log information provided by an embodiment of the present application. As shown in Fig. 2, the method may include steps 210-230, and steps 210-230 will be described in detail below.

Step 210: Identify the first part and the second part of the plurality of log information from the plurality of log information.

In the embodiment of the present application, the first part and the second part of the multiple pieces of log information can be identified from the multiple pieces of log information, where the first part may be the part that is the same across a number of pieces of log information less than or equal to the first threshold, and the second part is the part that differs across a number of pieces of log information less than or equal to the second threshold.

It should be understood that the first threshold may be less than or equal to the total number of pieces of log information, and the second threshold may be less than or equal to the total number of pieces of log information.

It should be noted that the first threshold and the second threshold may be equal or unequal, which is not specifically limited in the embodiment of the present application.

Referring to FIG. 1, in this embodiment of the application, the comparison and identification among a number of pieces of log information less than or equal to the first threshold may be performed by the log collection agent of at least one node, or by the data preprocessing module 113 in the server 110. As an example, when the processor and/or memory resources available to the log collection agent 121 in the node 120 are sufficient relative to the amount of data it processes, the log collection agent 121 may, after obtaining the original log information stored in the log files in a specified directory, compare a number of pieces of log information less than or equal to the first threshold among the multiple pieces of log information and identify the first part and the second part. As another example, when the processor and/or memory resources available to the log collection agent 121 in the node 120 are limited relative to the amount of data it processes, the data preprocessing module 113 in the server 110 may, after receiving the multiple pieces of original log information reported by the log collection agent in at least one node, compare a number of pieces of log information less than or equal to the first threshold among the multiple pieces of log information and identify the first part and the second part from them.

Step 220: Replace the second part of the multiple pieces of log information with placeholder identifiers to form a log information template corresponding to the multiple pieces of log information.

The embodiment of the present application may process the second part of the multiple pieces of log information identified above. For example, a placeholder identifier may be used to replace the second part of the multiple pieces of log information, so that one or more pieces of log information correspond to one log information template. The log information template and M pieces of original log information thus have a 1:M correspondence, which eliminates a large number of duplicate parts in the multiple pieces of log information. Here, M is a positive integer greater than 1.

In the embodiment of the present application, the placeholder identifiers may distinguish the types of the variables in the second part of the multiple pieces of log information, or may not distinguish them, which is not specifically limited in this application. As an example, when placeholder identifiers are used to distinguish variable types, %d can be used as the placeholder identifier for numeric variables and %s as the placeholder identifier for string variables.
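Choosing between %d and %s can be sketched as below. This is only one possible rule, stated as an assumption: a variable position is treated as numeric when every observed value is a canonical decimal integer (no leading zeros, no plus sign), so that converting it to a number and back reproduces the original text exactly. The helper names `placeholder_for` and `fill` are hypothetical, and `fill` assumes the template contains no other `%` characters.

```python
import re

# Canonical integers only, so that int(value) formats back to the same text.
_INTEGER = re.compile(r"0|-?[1-9]\d*")


def placeholder_for(values):
    """Pick a typed placeholder for one variable position."""
    if values and all(_INTEGER.fullmatch(v) for v in values):
        return "%d"
    return "%s"


def fill(template, values):
    """Restore a line from a template that uses %d and %s placeholders."""
    kinds = re.findall(r"%[ds]", template)
    if len(kinds) != len(values):
        raise ValueError("placeholder count does not match variable count")
    # Convert only the values that sit in a numeric (%d) position.
    args = tuple(int(v) if k == "%d" else v for k, v in zip(kinds, values))
    return template % args
```

Values such as `007` fall back to %s under this rule, since storing them as numbers would lose the leading zeros on restoration.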

Step 230: Store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information to the database respectively.

In the embodiment of the present application, multiple log information can be distinguished and compared, and the first part and the second part of the multiple log information can be identified. A log information template corresponding to multiple log information is stored in one entity of the distributed database 112, and the changed part of each log information is stored in another entity of the distributed database 112.

In the above technical solution, the original log information can be processed. After the changed part and the unchanged part of the multiple pieces of log information are identified, only one copy of the unchanged part is stored, and each piece of log information is condensed to include its changed part, the identifier of the log information template, and the tag information related to the log information. As a result, the simplified log information does not need to carry the long character sequence held in the template, which reduces the storage space occupied by the log information and saves storage resources in the log service system.

Optionally, in some embodiments, a template identifier corresponding to each log information template can be generated, and each piece of log information can be condensed to include: the log information template identifier, variable values, a timestamp, tag information, etc. When the user retrieves the log information and restores it to the original information, the log information template can be obtained through the log information template identifier included in each piece of log information. The log information can then be restored to its original state according to the log information template and the timestamp and variable values included in the log information.

It should be understood that the timestamp included in the condensed log information may be recorded by the log information itself. The tag information can be recorded by the log information itself, or it can be obtained by the log collection agent in at least one node according to the attributes of the node where the log information is located and the application service running on it. For details, refer to the description of the tag information above, which is not repeated here.
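The condensed record described here could be represented as in the following sketch. The class name `ReducedLog`, its field names, and the JSON serialization are assumptions chosen for illustration; the application lists the contents of a condensed record but does not define a concrete data format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class ReducedLog:
    """One condensed log entry; the template text itself is stored elsewhere."""

    template_id: str            # identifier of the log information template
    variables: tuple            # variable values, in placeholder order
    timestamp: str              # timestamp recorded by the log entry itself
    tags: dict = field(default_factory=dict)  # e.g. node IP, service ID

    def to_json(self):
        """Serialize the record for storage."""
        return json.dumps(asdict(self), sort_keys=True)
```

A record built this way carries only the identifier of its template, so the constant text of the log line is not repeated per entry.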

In the embodiment of the present application, the features of a piece of information can be extracted by an information fingerprint extraction algorithm and converted into a code. The code can be used as the unique template identifier corresponding to each log information template, so that the same log information template under different nodes has the same identifier. As an example and not a limitation, the unique template identifier corresponding to each log information template can be generated through the message-digest algorithm 5 (MD5).

It should be understood that the MD5 algorithm is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value and is used to check that information is transmitted completely and consistently. The process of generating the unique template identifier corresponding to each log information template through the MD5 algorithm is described in detail below in conjunction with specific embodiments.
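A minimal sketch of deriving a template identifier with MD5, using Python's standard `hashlib` module, is shown below. The function name `template_id`, the UTF-8 encoding, and the hexadecimal output form are assumptions made for this example; the text only states that MD5 may be used to produce the identifier.

```python
import hashlib


def template_id(template: str) -> str:
    """Return the MD5 digest of the template text as 32 hex characters."""
    # The digest depends only on the template text, so any node that
    # extracts the same template computes the same identifier locally.
    return hashlib.md5(template.encode("utf-8")).hexdigest()
```

The 128-bit digest is rendered as 32 hexadecimal characters, and because the computation is deterministic, no coordination between nodes is needed to agree on an identifier.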

In the embodiment of the present application, by using information fingerprint technology to generate a unique template identifier corresponding to each log information template, the same log information template under different nodes can have the same identifier, thereby reducing uniformity for each log information The template allocates the additional overhead and complexity caused by the communication between the nodes of the corresponding identifier.

The following describes in detail, using specific log information content, how the log information is condensed in the above process. For ease of description, the following example extracts a log information template from three pieces of original log information.

It should be understood that the following examples are only to help those skilled in the art understand the embodiments of the present application, and are not intended to limit the embodiments of the present application to the specific numerical values or specific scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or changes based on the following examples given, and such modifications or changes also fall within the scope of the embodiments of the present application.

Original log information 1: [2018-11-09 13:39:14.696][10.0.26.102][INF0][bulk _- thread-2][BulkHandlerRunable:submits 138]get data from queue timeout!

Original log information 2: [2018-11-09 13:39:15.698][10.0.26.102][INF0][bulk _- thread-5][BulkHandlerRunable:submits 138]get data from queue timeout!

Original log information 3: [2018-11-09 13:39:16.066][10.0.26.102][INF0][bulk _- thread-3][BulkHandlerRunable:submits 138]get data from queue timeout!

Step 1: Identify the changed part and the unchanged part in each of the three original log messages.

In the embodiment of the present application, a pattern extraction algorithm can be used to identify and extract the changed part and the unchanged part of each of the three pieces of original log information. As an example and not a limitation, the iterative partitioning log mining (IPLoM) algorithm can be used to identify the changed part and the unchanged part of each of the three pieces of original log information, and a placeholder identifier can be used to replace the changed part of each piece of original log information.

After the pattern extraction algorithm in step 1, the recognition results after comparing the above three original log information are as follows:

The comparison results of the original log information 1 with the original log information 2 and 3 are as follows:

The unchanging part: [10.0.26.102][INF0][bulk _- thread-][BulkHandlerRunable:submits 138]get data from queue timeout!

The changed part: 2, [2018-11-09 13:39:14.696]

Use the placeholder identifier %d to replace the changed numerical part of the original log information 1, and use the tm field to represent the log time field in the original log information 1. The original log information 1 can be expressed as:

tm[10.0.26.102][INF0][bulk _- thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!

where %d=2, tm=[2018-11-09 13:39:14.696].

The comparison result of the original log information 2 with the original log information 1 and 3 is as follows:

The changed part: 5, [2018-11-09 13:39:15.698]

Replace the changed part in the original log information 2 with the placeholder identifier %d, and use the tm field to represent the log time field in the original log information 2. The original log information 2 can then be expressed in the same templated form as the original log information 1, with the following values:

%d=5, tm=[2018-11-09 13:39:15.698].

The comparison results of the original log information 3 with the original log information 1 and 2 are as follows:

The changed part: 3, [2018-11-09 13:39:16.066]

Replace the changed part in the original log information 3 with the placeholder identifier %d, and use the tm field to represent the log time field in the original log information 3. The original log information 3 can then be expressed in the same templated form as the original log information 1, with the following values:

%d=3, tm=[2018-11-09 13:39:16.066].

In summary, the log information template identified in the three original log messages is: tm[INF0][bulk _- thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!
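As a rough illustration of step 1, the following Python sketch compares the three pieces of original log information token by token. It is a deliberately simplified positional comparison and not the IPLoM algorithm; the function name `extract_template` and the two regular expressions are assumptions made for this example. The sketch assumes that every line starts with a bracketed timestamp and splits into the same number of tokens, and it keeps the node IP address in the template, as in the expression of the original log information 1 above.

```python
import re

# Assumed timestamp layout, taken from the three example log lines.
TIMESTAMP = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}\]")
# Split a line into alternating runs of digits and non-digits.
TOKEN = re.compile(r"\d+|\D+")

LOGS = [
    "[2018-11-09 13:39:14.696][10.0.26.102][INF0][bulk _- thread-2][BulkHandlerRunable:submits 138]get data from queue timeout!",
    "[2018-11-09 13:39:15.698][10.0.26.102][INF0][bulk _- thread-5][BulkHandlerRunable:submits 138]get data from queue timeout!",
    "[2018-11-09 13:39:16.066][10.0.26.102][INF0][bulk _- thread-3][BulkHandlerRunable:submits 138]get data from queue timeout!",
]


def extract_template(lines):
    """Return (template, timestamps, variables) for lines with one shared layout."""
    stamps, bodies = [], []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None:
            raise ValueError("line does not start with a timestamp")
        stamps.append(match.group(0))
        bodies.append(TOKEN.findall(line[match.end():]))
    if len({len(body) for body in bodies}) != 1:
        raise ValueError("lines do not share one token layout")

    parts = []
    variables = [[] for _ in lines]
    for column in zip(*bodies):
        if len(set(column)) == 1:
            parts.append(column[0])          # unchanged part
        elif all(token.isdigit() for token in column):
            parts.append("%d")               # changed numeric part
            for values, token in zip(variables, column):
                values.append(token)
        else:
            raise ValueError("non-numeric difference is not handled here")
    # The log time field is represented by the tm field.
    return "tm" + "".join(parts), stamps, variables


template, stamps, variables = extract_template(LOGS)
print(template)
print(variables)
```

A production pattern extractor would also need to group lines of different layouts before comparing them, which this sketch does not attempt.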

Step 2: Generate a unique identifier corresponding to the log information template.

In a distributed scenario including at least one node, different nodes may have the same log information template, so the log information template needs to correspond one-to-one with the identifier of the template. The embodiment of the present application may use digital fingerprint technology to realize this one-to-one correspondence. For example, the MD5 algorithm is used to generate a unique identifier corresponding to the log information template.

Specifically, the MD5 algorithm calculates the character string of the aforementioned log information template to generate a 128-bit (16-byte) hash value, which can be used as a unique identifier corresponding to the aforementioned log information template.

In the embodiment of this application, a 64-bit (8-byte) hash value can also be selected from the 128-bit (16-byte) hash value generated by the MD5 algorithm, and the 64-bit hash value is used as the actual unique identifier corresponding to the log information template. There are many ways to select a 64-bit hash value from a 128-bit hash value. As an example, the 64 bits at odd positions of the 128-bit hash value are selected as the 64-bit hash value. As another example, the 64 bits at even positions of the 128-bit hash value are selected as the 64-bit hash value. As another example, the middle 64 bits of the 128-bit hash value are selected as the 64-bit hash value. As another example, the 128-bit hash value can be folded and added to obtain a 64-bit hash value.

For example, the tid field may be used to indicate the identification of the aforementioned log information template.

{tid=5085657271133051000

tm[INF0][bulk _- thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout! }
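The reduction from a 128-bit MD5 value to a 64-bit identifier described in step 2 can be sketched in Python as follows. The function names, the bit-numbering convention (positions counted from the least significant bit), and the choice of folding as the default are assumptions made for this example. The sketch does not claim to reproduce the tid value shown above, since the exact selection method used to obtain it is not stated.

```python
import hashlib

MASK64 = (1 << 64) - 1


def md5_128(template: str) -> int:
    """Return the 128-bit MD5 digest of the template string as an integer."""
    return int.from_bytes(hashlib.md5(template.encode("utf-8")).digest(), "big")


def middle_64(h: int) -> int:
    """Keep the middle 64 bits (bit positions 32 to 95) of a 128-bit value."""
    return (h >> 32) & MASK64


def fold_add_64(h: int) -> int:
    """Fold the value in half and add the halves, discarding any carry."""
    return ((h >> 64) + (h & MASK64)) & MASK64


def odd_bits_64(h: int) -> int:
    """Collect the 64 bits at odd positions 1, 3, 5, ... of a 128-bit value."""
    out = 0
    for i in range(64):
        out |= ((h >> (2 * i + 1)) & 1) << i
    return out


def template_id(template: str) -> int:
    """Hypothetical helper: a 64-bit identifier obtained by folding the MD5 value."""
    return fold_add_64(md5_128(template))


tpl = "tm[INF0][bulk _- thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!"
print(template_id(tpl))
```

Because the identifier depends only on the template string, two nodes that extract the same template compute the same identifier without communicating with each other.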

Step 3: Convert the log time field in the original log information.

It should be understood that step 3 is optional.

In order to reduce the storage resources occupied by the character string in the field representing the log time, the log time in the original log information can be converted into a millisecond offset relative to a reference time. As an example, January 1, 1970 can be used as the reference time, and the millisecond offset of the time recorded in each piece of original log information relative to the reference time can be calculated.

For example, tm in the calculated original log information 1 is 1541770754696, tm in the original log information 2 is 1541770755698, and tm in the original log information 3 is 1541770756066.
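The conversion in step 3 can be sketched in Python as follows. The function names `to_millis` and `from_millis` are illustrative, and the sketch assumes that the recorded log time is in UTC; under that assumption it reproduces the tm values given above.

```python
from datetime import datetime, timedelta, timezone

# Reference time: January 1, 1970 (assumed to be UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)


def to_millis(stamp: str) -> int:
    """Convert '[YYYY-MM-DD hh:mm:ss.mmm]' to a millisecond offset from EPOCH."""
    parsed = datetime.strptime(stamp, "[%Y-%m-%d %H:%M:%S.%f]")
    return (parsed.replace(tzinfo=timezone.utc) - EPOCH) // ONE_MS


def from_millis(tm: int) -> str:
    """Restore the bracketed timestamp string from a millisecond offset."""
    restored = EPOCH + timedelta(milliseconds=tm)
    return restored.strftime("[%Y-%m-%d %H:%M:%S.") + f"{restored.microsecond // 1000:03d}]"


print(to_millis("[2018-11-09 13:39:14.696]"))
print(from_millis(1541770756066))
```

Integer timedelta division is used instead of floating-point timestamps so that the millisecond value round-trips exactly.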

Step 4: Represent the changed part of each original log information.

In the embodiment of the present application, a field storing the variable value in each original log information may be added. As an example, the mg field is used to represent the variable value list string.

For example, the variable value list string mg in the original log information 1 = (2), the variable value list string mg in the original log information 2 = (5), and the variable value list string mg in the original log information 3 = (3).

In the embodiment of the present application, a field storing the number of variable values in each original log information can also be added. As an example, the VL field is used to represent the number of variable values.

For example, the number of variable values in the original log information 1 is VL=1, the number of variable values in the original log information 2 is VL=1, and the number of variable values in the original log information 3 is VL=1.

Step 5: Add tag information of each original log information.

In the embodiment of the present application, relevant tag information can be added to each original log information, so that the required log information can be obtained through various tag information.

For ease of description, the embodiment of the present application takes the label information as the path of the log file where the log information is located and the IP address of the node where the log information is located as an example for description.

Taking the label information as the IP address of the node where the log information is located as an example, the IP in the original log information 1 = 10.0.26.102, the IP in the original log information 2 = 10.0.26.102, and the IP in the original log information 3 = 10.0.26.102.

Taking the label information as the path of the log file where the log information is located as an example, the embodiment of the present application can perform information fingerprint extraction on the path of each log file to form a unique identifier corresponding to the path of each log file, thereby saving storage resources. The information fingerprint extraction and identification process is similar to the calculation of the log information template identifier. For details, refer to the process of generating the log information template identifier, which is not repeated here.

As an example, the pid field is used to indicate the identifier of the path of the log file where the log information is located. For example, the pid in the original log information 1 is 864995200973638000, the pid in the original log information 2 is 864995200973638000, and the pid in the original log information 3 is 864995200973638000.

After the above steps 1 to 5, the above three pieces of original log information are condensed into: timestamp + template identifier + variables + related label information. The template identifier and the template can be stored in different tables of the distributed database 112.

As an example, the original log information can be split into three parts that are stored separately. For example, Table 1 is used to store the condensed log information, Table 2 is used to store the path of the log file where the log information is located, and Table 3 is used to store the template of the three pieces of original log information.

The contents stored in Table 1 are as follows:

The simplified log information 1 can be expressed as:

The simplified log information 2 can be expressed as:

The simplified log information 3 can be expressed as:

The contents stored in Table 2 are as follows:

The contents stored in Table 3 are as follows:
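Using only the field values given in steps 2 to 5 above, the three tables could be sketched in Python as follows. The class name `CondensedLog`, the helper `condense`, and the in-memory layout are assumptions made for this example, and the log file path in Table 2 is a hypothetical placeholder because the actual path is not given.

```python
from dataclasses import dataclass

TID = 5085657271133051000   # template identifier from step 2
PID = 864995200973638000    # log file path identifier from step 5
IP = "10.0.26.102"          # node IP address tag from step 5


@dataclass(frozen=True)
class CondensedLog:
    tm: int    # millisecond offset from the reference time (step 3)
    tid: int   # log information template identifier (step 2)
    mg: str    # variable value list string (step 4)
    vl: int    # number of variable values (step 4)
    ip: str    # tag: IP address of the node
    pid: int   # tag: identifier of the log file path


def condense(tm: int, values: list) -> CondensedLog:
    """Build one condensed record from a timestamp and its variable values."""
    return CondensedLog(tm, TID, "(" + ",".join(values) + ")", len(values), IP, PID)


# Table 1: condensed log information.
TABLE_1 = [
    condense(1541770754696, ["2"]),
    condense(1541770755698, ["5"]),
    condense(1541770756066, ["3"]),
]

# Table 2: path identifier to log file path (hypothetical path value).
TABLE_2 = {PID: "/hypothetical/path/to/logfile.log"}

# Table 3: template identifier to log information template.
TABLE_3 = {
    TID: "tm[INF0][bulk _- thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!",
}

for record in TABLE_1:
    print(record)
```

Each record in Table 1 carries only numeric fields and a short variable string; the long constant text is stored once in Table 3.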

It is understandable that in the embodiments of the present application, some or all of the steps in the embodiments of the present application can be performed, and these steps or operations are only examples, and the embodiments of the present application can also perform other operations or variations of various operations. In addition, each step may be executed in a different order presented in the embodiments of the present application, and it may not be necessary to perform all the operations in the embodiments of the present application.

Optionally, in some embodiments, when retrieving the stored log information through the log service API 111, the user can search for keywords in the log information template, so as to quickly find the required log information and reduce the search time.

Optionally, in some embodiments, the condensed log information stored in the database can also be restored on the log service API 111 side, so that the user can obtain the original log information and analyze it to understand the problems that occur frequently or occasionally while the node is running.

Specifically, as shown in FIG. 1, the user can input various tag information and log keywords through the log service API 111 to obtain the required log information from the distributed database 112. The distributed database 112 may obtain the associated log information template and the variable value list string mg according to the log information template identifier tid, and then automatically substitute the values in the variable value list string mg into the positions of the placeholder identifiers in the log information template, so as to restore the original log information.

It should be noted that the distributed database 112 in the embodiment of the present application may be a distributed relational database or a non-relational database with a join function. A distributed relational database or a non-relational database with a join function can obtain the associated log information template and the variable value list string mg through the log information template identifier tid.

Optionally, in some embodiments, the user can interact with the log service system 100 using the interface provided by the log service API 111 through a browser.

Specifically, the user may input a query request through a browser, and the query request may include keywords and/or related tag information for the log information to be queried. The database may obtain the stored log information from the table of the database according to the keywords and/or related tag information of the log information in the query request, and may feed back the stored log information to the user. Users can not only analyze the log information to understand the problems that occur frequently or occasionally during the operation of the node, but also analyze the distribution law of the identifier corresponding to the log information template to discover potential system abnormalities and send out alarms in advance.

It should be understood that the keyword included in the query request may be a certain character string in the log information.

As an example, the query request entered by the user is to query the log information of "January 1, 2018". After receiving the query request, the database can determine the value of tm according to the query time "January 1, 2018", and according to the value of tm, obtain the log information that meets the requirement from the log information stored in Table 1 of the database. At the same time, the join function of the database can obtain, from Table 3, the log information template corresponding to the log information template identifier in the log information, substitute the log information stored in Table 1 into the log information template to generate the original log information, and feed the original log information back to the user. As another example, the user can also enter a character string of the log information template in the query request, and the database can obtain the matching log information template and its identifier from Table 3. The join function of the database can then obtain the log information carrying that log information template identifier from the log information stored in Table 1, substitute the log information stored in Table 1 into the log information template to generate the original log information, and feed the original log information back to the user.

The following takes a query request for the log information generated on the IP address "10.0.26.102" as an example to explain in detail the process of querying the log information.
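The two query forms described above can be sketched in Python as follows. The function names, the in-memory record layout, and the assumption that the queried day is interpreted in UTC are all illustrative; the sample records reuse the tm and tid values from the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)


def day_to_tm_range(year: int, month: int, day: int) -> tuple:
    """Map a calendar day (assumed UTC) to a half-open range [start, end) of tm values."""
    start = datetime(year, month, day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return (start - EPOCH) // ONE_MS, (end - EPOCH) // ONE_MS


def tids_matching(templates: dict, keyword: str) -> set:
    """Search the template table for a keyword and return matching template ids."""
    return {tid for tid, text in templates.items() if keyword in text}


def select_logs(logs: list, tm_range=None, tids=None) -> list:
    """Filter condensed records by tm range and/or by a set of template ids."""
    selected = []
    for record in logs:
        if tm_range is not None and not (tm_range[0] <= record["tm"] < tm_range[1]):
            continue
        if tids is not None and record["tid"] not in tids:
            continue
        selected.append(record)
    return selected


TEMPLATES = {5085657271133051000: "tm[INF0][bulk _- thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!"}
LOGS = [
    {"tm": 1541770754696, "tid": 5085657271133051000, "mg": "(2)"},
    {"tm": 1541770755698, "tid": 5085657271133051000, "mg": "(5)"},
]

print(day_to_tm_range(2018, 1, 1))
print(select_logs(LOGS, tids=tids_matching(TEMPLATES, "timeout")))
```

A keyword that occurs in the constant text only has to be matched against the small template table, after which the condensed records are selected by integer comparison on tid.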

The query request entered by the user through the log service API 111 includes: IP = 10.0.26.102. The database 112 can obtain log information with IP = 10.0.26.102 from the log information stored in Table 1 according to IP = 10.0.26.102, as shown below:

The join function of the database 112 can also obtain the log information template corresponding to tid=5085657271133051000 from Table 3 according to the tid=5085657271133051000 in the above log information, as shown below:

The database 112 can restore the timestamp information originally carried in the log information according to the tm in the log information. For example, tm=1541770754696 is restored to [2018-11-09 13:39:14.696], tm=1541770755698 is restored to [2018-11-09 13:39:15.698], and tm=1541770756066 is restored to [2018-11-09 13:39:16.066].

The database 112 may also substitute the restored timestamp information and the log information with IP=10.0.26.102 obtained from Table 1 into the log information template to form the original log information, and provide the restored original log information to the front-end interface of the log service API 111.

The original log information provided to the front-end interface of Log Service API 111 is as follows:

Log information 1:

[2018-11-09 13:39:14.696][10.0.26.102][INF0][bulk _- thread-2][BulkHandlerRunable:submits 138]get data from queue timeout!

Log information 2:

[2018-11-09 13:39:15.698][10.0.26.102][INF0][bulk _- thread-5][BulkHandlerRunable:submits 138]get data from queue timeout!

Log information 3:

[2018-11-09 13:39:16.066][10.0.26.102][INF0][bulk _- thread-3][BulkHandlerRunable:submits 138]get data from queue timeout!
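The query and restoration flow above can be sketched with Python's built-in sqlite3 module standing in for the distributed database 112. The table and column names, the `restore` and `query_by_ip` helpers, and the use of UTC are assumptions made for this example. The stored template here keeps the node IP address, as in the expression of the original log information 1 in step 1, so that substitution alone reproduces the full line.

```python
import re
import sqlite3
from datetime import datetime, timedelta, timezone

TID = 5085657271133051000
TEMPLATE = "tm[10.0.26.102][INF0][bulk _- thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log (tm INTEGER, tid INTEGER, mg TEXT, vl INTEGER, ip TEXT);
    CREATE TABLE template (tid INTEGER PRIMARY KEY, text TEXT);
""")
conn.execute("INSERT INTO template VALUES (?, ?)", (TID, TEMPLATE))
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?, ?, ?)",
    [
        (1541770754696, TID, "(2)", 1, "10.0.26.102"),
        (1541770755698, TID, "(5)", 1, "10.0.26.102"),
        (1541770756066, TID, "(3)", 1, "10.0.26.102"),
    ],
)


def restore(template: str, tm: int, mg: str) -> str:
    """Rebuild one original log line from its template, tm and mg fields."""
    when = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=tm)
    stamp = when.strftime("[%Y-%m-%d %H:%M:%S.") + f"{when.microsecond // 1000:03d}]"
    values = iter(mg.strip("()").split(","))
    # Fill each %d placeholder, in order, with the next variable value.
    body = re.sub(r"%d", lambda _match: next(values).strip(), template[len("tm"):])
    return stamp + body


def query_by_ip(connection: sqlite3.Connection, ip: str) -> list:
    """Select condensed records by tag, join their template, and restore them."""
    rows = connection.execute(
        "SELECT t.text, l.tm, l.mg FROM log AS l "
        "JOIN template AS t ON t.tid = l.tid WHERE l.ip = ? ORDER BY l.tm",
        (ip,),
    )
    return [restore(text, tm, mg) for text, tm, mg in rows]


for line in query_by_ip(conn, "10.0.26.102"):
    print(line)
```

The join on tid is what allows a relational database, or a non-relational database with a join function, to fetch the template together with each condensed record in a single query.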

The foregoing describes in detail the method for storing log information and the method for querying log information provided by the embodiments of the present application with reference to Figs. 1 to 2, and the device embodiments of the present application will be described in detail below with reference to Figs. 3-4. It should be understood that the description of the method embodiment and the description of the device embodiment correspond to each other, and therefore, the parts that are not described in detail may refer to the previous method embodiment.

FIG. 3 is a schematic structural diagram of an apparatus 300 for storing log information provided by an embodiment of the present application. The device 300 includes: an identification module 310, a processing module 320, and a storage module 330.

The identification module 310 is configured to identify a first part and a second part of multiple pieces of log information from the multiple pieces of log information, wherein the first part is the same part in those pieces of log information, among the multiple pieces, that are less than or equal to a first threshold, and the second part is the different part in those pieces of log information, among the multiple pieces, that are less than or equal to a second threshold;

The processing module 320 is configured to replace the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information;

The storage module 330 is configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information to the database respectively.

Optionally, in some embodiments, the apparatus 300 further includes:

The generating module 340 is configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.

Optionally, in some embodiments, the storage module 330 is specifically configured to: store the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and store the log information template and the identifier of the log information template in a second space of the database.

It should be understood that the apparatus 300 for storing log information in the embodiment of the present application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The method for storing log information shown in FIG. 2 can also be implemented by software, in which case the apparatus 300 for storing log information and its respective modules may be software modules.

FIG. 4 is a schematic structural diagram of an apparatus 400 for storing log information provided by an embodiment of the present application. The apparatus 400 for storing log information includes at least one computing node 410. The computing node 410 may include a processing unit 411 and a communication interface 412. The processing unit 411 is used to execute functions defined by various software programs, for example, to implement the function of storing log information. The communication interface 412 is used to communicate and interact with other computing nodes or other devices, and the other devices may be other physical servers. Specifically, the communication interface 412 may be a network adapter card.

Optionally, the computing node 410 may further include an input/output interface 413, and the input/output interface 413 is connected to an input/output device for receiving input information and outputting operation results. The input/output device may be a mouse, a keyboard, a display, or an optical drive. Optionally, the computing node 410 may also include auxiliary storage 414, which is generally also referred to as external storage. The storage medium of the auxiliary storage 414 may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, an optical disk), or a semiconductor medium (for example, a solid state drive).

Optionally, the computing node 410 may further include a bus 415. Among them, the processing unit 411, the communication interface 412, the input/output interface 413, and the auxiliary memory 414 may be connected via a bus 415. The bus 415 may be a peripheral component interconnect standard (PCI) bus or an extended industry standard architecture (EISA) bus, etc. The bus 415 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one line is used to represent in FIG. 4, but it does not mean that there is only one bus or one type of bus.

The processing unit 411 may have a variety of specific implementation forms. For example, the processing unit 411 may include a processor 4112 and a memory 4111, and the processor 4112 performs related operations of the embodiment shown in FIG. 2 according to program instructions stored in the memory 4111. The processor 4112 may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. Alternatively, the processor 4112 uses one or more integrated circuits to execute related programs to implement the technical solutions provided in the embodiments of the present application.

It should be understood that the processor 4112 of the computing node 410 may run at least one of the identification module 310, the processing module 320, and the storage module 330 shown in FIG. 3 through program instructions stored in the memory 4111.

It should also be understood that the memory 4111 or the auxiliary memory 414 of the computing node 410 may also store the database described in FIG. 2.

The above-mentioned and other operations and/or functions of each unit in the apparatus 400 for storing log information are respectively for implementing the corresponding flow of the method in FIG. 2, and are not repeated here for brevity.

FIG. 5 is a schematic structural diagram of an apparatus 500 for querying log information provided by an embodiment of the present application. The apparatus 500 for querying log information includes: a receiving module 510, an acquiring module 520, and a returning module 530.

The receiving module 510 is configured to receive a query request, where the query request is used to query log information stored in the database;

The acquiring module 520 is configured to obtain, from the database according to the query request, the second part of the multiple pieces of log information stored in the database and the log information template corresponding to the multiple pieces of log information, and to substitute the second part into the log information template corresponding to the multiple pieces of log information to obtain the log information corresponding to the query request.

Optionally, it may further include a return module 530, configured to return the corresponding log information to the client that issued the query request.

The returning module 530 and the receiving module 510 can also be implemented by the same module.

Optionally, in some embodiments, the apparatus 500 further includes:

The identification module 540 is configured to identify the first part and the second part of the plurality of log information from the plurality of log information;

The processing module 550 is configured to replace the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information;

The storage module 560 is configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information to the database respectively.

Optionally, in some embodiments, the apparatus 500 further includes:

The generating module 570 is configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.

Optionally, in some embodiments, the storage module 560 is specifically configured to: store the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and store the log information template and the identifier of the log information template in a second space of the database.

It should be understood that the apparatus 500 for querying log information in the embodiment of the present application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The method for querying log information can also be implemented by software, in which case the apparatus 500 for querying log information and its respective modules may be software modules.

FIG. 6 is a schematic structural diagram of an apparatus 600 for querying log information provided by an embodiment of the present application. The apparatus 600 for querying log information includes at least one computing node 610. The computing node 610 may include a processing unit 611 and a communication interface 612. The processing unit 611 is used to execute functions defined by various software programs, for example, to implement the function of storing log information. The communication interface 612 is used to communicate and interact with other computing nodes or other devices, and the other devices may be other physical servers. Specifically, the communication interface 612 may be a network adapter card.

Optionally, the computing node 610 may further include an input/output interface 613, and the input/output interface 613 is connected to an input/output device for receiving input information and outputting operation results. The input/output device may be a mouse, a keyboard, a display, or an optical drive. Optionally, the computing node 610 may also include auxiliary storage 614, which is generally also referred to as external storage. The storage medium of the auxiliary storage 614 may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, an optical disk), or a semiconductor medium (for example, a solid state drive).

Optionally, the computing node 610 may further include a bus 615. Among them, the processing unit 611, the communication interface 612, the input/output interface 613, and the auxiliary memory 614 may be connected through the bus 615. The function and implementation manner of the bus 615 and the bus 415 are similar.

The processing unit 611 may have a variety of specific implementation forms. For example, the processing unit 611 may include a processor 6112 and a memory 6111, and the processor 6112 performs related operations of the foregoing embodiments according to program instructions stored in the memory 6111. The processor 6112 may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. Alternatively, the processor 6112 uses one or more integrated circuits to execute related programs to implement the technical solutions provided in the embodiments of the present application.

It should be understood that the processor 6112 of the computing node 610 may run at least one of the receiving module 510, the acquiring module 520, and the returning module 530 shown in FIG. 5 through program instructions stored in the memory 6111.

It should also be understood that the memory 6111 or the auxiliary memory 614 of the computing node 610 may also store the database described in FIG. 2.

The above-mentioned and other operations and/or functions of each unit in the device 600 for querying log information are used to implement the corresponding procedures of the method for querying log information. For brevity, they will not be repeated here.

The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, the above-mentioned embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that includes one or more sets of available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid state drive (SSD).

A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professional technicians can use different methods for each specific application to achieve the described functions.

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the above-described system, device, and unit can refer to the corresponding process in the foregoing method embodiment, which is not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division of the units is only a logical function division, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the displayed or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

Claims

A method for storing log information, characterized in that the method comprises:

identifying a first part and a second part of multiple pieces of log information from the multiple pieces of log information, wherein the first part of the multiple pieces of log information is a same part in less than or equal to a first threshold number of pieces of log information among the multiple pieces of log information, and the second part of the multiple pieces of log information is a different part in less than or equal to a second threshold number of pieces of log information among the multiple pieces of log information;

replacing the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information, the log information template including the first part of the multiple pieces of log information; and

storing the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information in a database, respectively.
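
As an informal illustration only, and not part of the claimed subject matter, the identifying and replacing steps can be sketched as follows. The sketch assumes whitespace tokenization, logs with the same number of tokens, and a hypothetical placeholder identifier `<*>`; it does not model the first and second thresholds, and `build_template` is an illustrative name.

```python
from typing import List, Tuple

PLACEHOLDER = "<*>"  # hypothetical placeholder identifier, not from the source


def build_template(logs: List[str]) -> Tuple[str, List[List[str]]]:
    """Split equally long, whitespace-tokenized logs into one shared
    template (the same part) and per-log variable tokens (the different part)."""
    rows = [line.split() for line in logs]
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("this sketch assumes logs with the same token count")
    # A position belongs to the same part when every log has the same token there.
    same = [len({r[i] for r in rows}) == 1 for i in range(len(rows[0]))]
    template = " ".join(
        tok if is_same else PLACEHOLDER for tok, is_same in zip(rows[0], same)
    )
    # Keep only the differing tokens of each log, in their original order.
    variables = [[tok for tok, is_same in zip(r, same) if not is_same] for r in rows]
    return template, variables


logs = [
    "user alice logged in from 10.0.0.1",
    "user bob logged in from 10.0.0.2",
]
template, variables = build_template(logs)
print(template)   # user <*> logged in from <*>
print(variables)  # [['alice', '10.0.0.1'], ['bob', '10.0.0.2']]
```

The shared text is held once in the template, so only the short variable lists grow with the number of log entries.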
The method according to claim 1, wherein the method further comprises:

generating an identifier of the log information template corresponding to the multiple pieces of log information.
The method according to claim 2, wherein the storing the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information in a database, respectively, comprises:

storing the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and

storing the log information template and the identifier of the log information template in a second space of the database.
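
As an informal illustration only, and not part of the claimed subject matter, the two storage spaces can be sketched with an in-memory SQLite database. The table names `log_variables` (first space) and `log_templates` (second space), the hash-based identifier scheme, and the JSON encoding of the variable parts are all assumptions made for this sketch.

```python
import hashlib
import json
import sqlite3
from typing import List


def template_id(template: str) -> str:
    # Hypothetical identifier scheme: a short hash of the template text.
    return hashlib.sha1(template.encode("utf-8")).hexdigest()[:8]


def store(conn: sqlite3.Connection, template: str, variables: List[List[str]]) -> str:
    """Store the template once and each log's variable part separately,
    linked by the template identifier."""
    tid = template_id(template)
    # Second space: template text keyed by its identifier.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_templates "
        "(template_id TEXT PRIMARY KEY, template TEXT)"
    )
    # First space: one row per log entry, holding only its variable part.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_variables "
        "(id INTEGER PRIMARY KEY, template_id TEXT, variables TEXT)"
    )
    conn.execute("INSERT OR IGNORE INTO log_templates VALUES (?, ?)", (tid, template))
    conn.executemany(
        "INSERT INTO log_variables (template_id, variables) VALUES (?, ?)",
        [(tid, json.dumps(v)) for v in variables],
    )
    conn.commit()
    return tid


conn = sqlite3.connect(":memory:")
tid = store(
    conn,
    "user <*> logged in from <*>",
    [["alice", "10.0.0.1"], ["bob", "10.0.0.2"]],
)
print(conn.execute("SELECT COUNT(*) FROM log_templates").fetchone()[0])  # 1
print(conn.execute("SELECT COUNT(*) FROM log_variables").fetchone()[0])  # 2
```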
A method for querying log information, characterized in that the method is applied to a log information query system, the log information query system includes a database, and the database stores a second part of multiple pieces of log information and a log information template corresponding to the multiple pieces of log information, wherein the log information template includes a first part of the multiple pieces of log information, the first part of the multiple pieces of log information is a same part in less than or equal to a first threshold number of pieces of log information among the multiple pieces of log information, and the second part of the multiple pieces of log information is a different part in less than or equal to a second threshold number of pieces of log information among the multiple pieces of log information, and the method comprises:

receiving a query request, the query request being used to query log information stored in the database;

acquiring, from the database according to the query request, the stored second part of the multiple pieces of log information and the log information template corresponding to the multiple pieces of log information; and

substituting the second part of the multiple pieces of log information into the log information template corresponding to the multiple pieces of log information to obtain the corresponding log information.
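
As an informal illustration only, and not part of the claimed subject matter, the substituting step can be sketched as filling each placeholder in the template with the stored variable values in order. The placeholder identifier `<*>` and the function name `fill_template` are assumptions made for this sketch.

```python
from typing import List

PLACEHOLDER = "<*>"  # hypothetical placeholder identifier, not from the source


def fill_template(template: str, values: List[str]) -> str:
    """Rebuild one log entry by substituting values for placeholders, left to right."""
    parts = template.split(PLACEHOLDER)
    if len(parts) - 1 != len(values):
        raise ValueError("number of values must match number of placeholders")
    out = [parts[0]]
    # Interleave each stored value with the fixed text that follows it.
    for value, tail in zip(values, parts[1:]):
        out.append(value)
        out.append(tail)
    return "".join(out)


restored = fill_template("user <*> logged in from <*>", ["alice", "10.0.0.1"])
print(restored)  # user alice logged in from 10.0.0.1
```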
The method according to claim 4, characterized in that, before the receiving the query request, the method further comprises:

identifying the first part of the multiple pieces of log information and the second part of the multiple pieces of log information from the multiple pieces of log information;

replacing the second part of the multiple pieces of log information with a placeholder identifier to form the log information template corresponding to the multiple pieces of log information; and

storing the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information in the database, respectively.
The method according to claim 5, wherein the method further comprises:

generating an identifier of the log information template corresponding to the multiple pieces of log information.
The method according to claim 6, wherein the method further comprises:

storing the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and

storing the log information template and the identifier of the log information template in a second space of the database.
A device for storing log information, characterized in that the device comprises:

an identification module, configured to identify a first part and a second part of multiple pieces of log information from the multiple pieces of log information, wherein the first part of the multiple pieces of log information is a same part in less than or equal to a first threshold number of pieces of log information among the multiple pieces of log information, and the second part of the multiple pieces of log information is a different part in less than or equal to a second threshold number of pieces of log information among the multiple pieces of log information;

a processing module, configured to replace the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information, the log information template including the first part of the multiple pieces of log information; and

a storage module, configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information in a database, respectively.
The device according to claim 8, wherein the device further comprises:

a generating module, configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.
The device according to claim 9, wherein the storage module is specifically configured to:

store the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and

store the log information template and the identifier of the log information template in a second space of the database.
A device for querying log information, wherein the device is applied to a log information query system, the log information query system includes a database, and the database stores a second part of multiple pieces of log information and a log information template corresponding to the multiple pieces of log information, wherein the log information template includes a first part of the multiple pieces of log information, the first part of the multiple pieces of log information is a same part in less than or equal to a first threshold number of pieces of log information among the multiple pieces of log information, and the second part of the multiple pieces of log information is a different part in less than or equal to a second threshold number of pieces of log information among the multiple pieces of log information, and the device comprises:

a receiving module, configured to receive a query request, the query request being used to query log information stored in the database; and

an obtaining module, configured to obtain, from the database according to the query request, the stored second part of the multiple pieces of log information and the log information template corresponding to the multiple pieces of log information, wherein the second part of the multiple pieces of log information is substituted into the log information template corresponding to the multiple pieces of log information to obtain the corresponding log information.
The device according to claim 11, wherein the device further comprises:

an identification module, configured to identify the first part and the second part of the multiple pieces of log information from the multiple pieces of log information;

a processing module, configured to replace the second part of the multiple pieces of log information with a placeholder identifier to form the log information template corresponding to the multiple pieces of log information; and

a storage module, configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information in the database, respectively.
The device according to claim 12, wherein the device further comprises:

a generating module, configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.
The device according to claim 13, wherein the storage module is specifically configured to:

store the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and

store the log information template and the identifier of the log information template in a second space of the database.
A device for storing log information, characterized in that it includes at least one computing node, each computing node includes a processor and a memory, and when the device for storing log information is running, the processor runs computer-executable instructions in the memory to perform the method according to any one of claims 1 to 3.
A device for querying log information, characterized in that it includes at least one computing node, each computing node includes a processor and a memory, and when the device for querying log information is running, the processor runs computer-executable instructions in the memory to perform the method according to any one of claims 4 to 7.
A non-transitory, non-volatile computer-readable storage medium, characterized by comprising a computer program, wherein when the computer program is run on at least one computing node, the at least one computing node executes the method according to any one of claims 1 to 3.
A non-transitory, non-volatile computer-readable storage medium, characterized by comprising a computer program, wherein when the computer program is run on at least one computing node, the at least one computing node executes the method according to any one of claims 4 to 7.