WO2020155651A1 - Method and device for storing and querying log information - Google Patents


Info

Publication number
WO2020155651A1
Authority
WO
WIPO (PCT)
Prior art keywords
log information
multiple pieces
database
log
template
Prior art date
Application number
PCT/CN2019/107127
Other languages
French (fr)
Chinese (zh)
Inventor
李同军
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2020155651A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types

Definitions

  • This application relates to the storage field, and more specifically, to a method for storing log information, a method and device for querying log information, and a computer-readable storage medium.
  • Computer logs are generated by the operating system, middleware, the platform itself, or user-developed program components while a computer system runs. They record the operating status and usage of the equipment and of the system itself, mainly describing the key operations the system performed and the errors and exceptions that occurred during operation. By analyzing system logs, users can understand problems that occur frequently or occasionally during system operation, improve operation and maintenance in a targeted manner, and improve the safety and efficiency of system operation.
  • In the traditional log information storage solution, the original log information is stored directly in the database.
  • The original logs contain a large amount of repeated information, which occupies most of the storage resources and wastes them.
  • The repeated information also lowers retrieval efficiency. When a user analyzes a large number of retrieved original logs, the many repeated parts make the analysis less real-time, which is not conducive to quickly discovering problems through logs.
  • This application provides a method for storing log information and a method for querying log information, which can reduce the storage space occupied by the repeated parts of multiple pieces of log information, save storage resources, shorten the time users need to retrieve log information, and improve the use value of the log information.
  • A method for storing log information is provided. The method includes: identifying a first part and a second part of multiple pieces of log information from the multiple pieces of log information; replacing the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information, where the log information template includes the first part of the multiple pieces of log information; and storing the log information template and the second part of the multiple pieces of log information separately in the database.
  • The first part of the multiple pieces of log information is the part that is the same across a number of pieces of log information less than or equal to a first threshold, and the second part is the part that differs across a number of pieces of log information less than or equal to a second threshold.
  • The first threshold and the second threshold may each be less than or equal to the total number of pieces of log information, and the two thresholds may or may not be equal.
  • the original log information can be processed.
  • The same part and the different part are identified from a number of pieces of log information less than or equal to the total number of pieces. Only one copy of the part shared by multiple pieces of log information is stored, so the simplified log information does not need to carry the long character string held in the log information template. This reduces the storage space occupied by the log information and saves storage resources in the log service system.
  • the method further includes: generating an identifier of the log information template corresponding to the multiple pieces of log information.
  • A unique identifier can be calculated for each log information template, so that when a user later retrieves log information, the log information can be restored through the template identifier and the user can obtain the original log information. Users can then not only analyze the original log information to understand problems that occur frequently or occasionally while a node runs, but also analyze the distribution of the identifiers corresponding to the log information templates to discover potential system abnormalities and raise alarms in advance.
  • In the process of retrieving the stored log information through the log service API, the user can search for keywords in the log information template.
  • The log information a user needs is found by searching for keywords in the log information templates. Because the templates contain few repetitions, the required log information can be found quickly, which reduces search time, improves the real-time nature of the user's analysis, and helps discover problems quickly through logs.
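  • The template-first search described above could be sketched as follows. This is a minimal illustration rather than the application's implementation: the in-memory dictionary and list stand in for the database, and the function name search_by_template_keyword, the identifiers t1 and t2, and the sample templates are all hypothetical.

```python
def search_by_template_keyword(keyword, templates, records):
    """Return the condensed records whose template contains `keyword`.

    templates: dict mapping template identifier -> template string (assumed layout)
    records:   list of (template identifier, variable values) tuples (assumed layout)
    """
    # Scan the small set of templates instead of every stored log line.
    matching_ids = {tid for tid, text in templates.items() if keyword in text}
    return [record for record in records if record[0] in matching_ids]


# Hypothetical sample data.
templates = {
    "t1": "thread %d get data from queue timeout!",
    "t2": "user %s logged in",
}
records = [("t1", [2]), ("t2", ["alice"]), ("t1", [5])]
hits = search_by_template_keyword("timeout", templates, records)
```

  • The keyword is compared against each template once, however many log entries share that template, which is where the reduced search time would come from.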
  • The second part of the multiple pieces of log information and the identifier of the log information template are stored in a first space of the database; the log information template and the identifier of the log information template are stored in a second space of the database.
  • a method for querying log information is provided.
  • the method is applied to a log information query system.
  • The log information query system includes a database that stores a second part of multiple pieces of log information and a log information template corresponding to the multiple pieces of log information. The log information template includes a first part of the multiple pieces of log information. The first part is the part that is the same across a number of pieces of log information less than or equal to a first threshold, and the second part is the part that differs across a number of pieces of log information less than or equal to a second threshold.
  • the method includes:
  • Receiving a query request, the query request being used to query log information stored in the database; acquiring from the database, according to the query request, the stored second part of the multiple pieces of log information and the log information template corresponding to the multiple pieces of log information; filling the second part of the multiple pieces of log information into the log information template to obtain the corresponding log information; and returning the corresponding log information to the client that issued the query request.
  • The second part of the multiple pieces of log information can be filled into the corresponding log information template, and the resulting original log information is returned to the user. This lets users analyze the original log information and understand problems that occur frequently or occasionally while a node runs.
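  • Filling the stored second part back into the template could look like the sketch below. It assumes printf-style placeholders such as %d and %s in the template; the function name restore_log and the sample strings are hypothetical.

```python
def restore_log(template, values):
    """Fill the stored variable values back into the template's placeholders."""
    # Python's % operator substitutes the values in placeholder order.
    return template % tuple(values)


restored = restore_log("thread %d get data from queue timeout!", [2])
```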
  • the method before the receiving the query request, the method further includes: identifying the first part and the second part of the plurality of log information from the plurality of log information;
  • the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information are respectively stored in the database.
  • the method further includes: generating an identifier of the log information template corresponding to the multiple pieces of log information.
  • the method further includes: storing the second part of the plurality of log information and the identifier of the log information template in the first space of the database; and storing the log information The template and the identifier of the log information template are stored in the second space of the database.
  • a device for storing log information includes:
  • An identification module, configured to identify the first part and the second part of multiple pieces of log information from the multiple pieces of log information, where the first part is the part that is the same across a number of pieces of log information less than or equal to a first threshold, and the second part is the part that differs across a number of pieces of log information less than or equal to a second threshold;
  • A processing module, configured to replace the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information, where the log information template includes the first part of the multiple pieces of log information;
  • A storage module, configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information separately in the database.
  • the device further includes: a generating module, configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.
  • the storage module is specifically configured to: store the second part of the multiple pieces of log information and the identifier of the log information template in the first space of the database; and store the The log information template and the identifier of the log information template are stored in the second space of the database.
  • a device for querying log information is provided.
  • the device is applied to a log information query system.
  • The log information query system includes a database that stores a second part of multiple pieces of log information and a log information template corresponding to the multiple pieces of log information. The log information template includes a first part of the multiple pieces of log information, and the first part is the part that is the same across a number of pieces of log information less than or equal to a first threshold.
  • the device includes:
  • a receiving module configured to receive a query request, the query request being used to query log information stored in the database
  • An obtaining module, configured to obtain, according to the query request, the second part of the multiple pieces of log information and the log information template corresponding to the multiple pieces of log information stored in the database, and to fill the second part into the log information template to obtain the corresponding log information.
  • the device for querying log information may further include a returning module for returning the corresponding log information to the client that issued the query request
  • the device further includes: an identification module, configured to identify the first part and the second part of the plurality of log information from the plurality of log information;
  • a processing module configured to replace the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information
  • the storage module is configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information to the database respectively.
  • the device further includes: a generating module, configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.
  • the storage module is specifically configured to: store the second part of the plurality of log information and the identifier of the log information template in the first space of the database; and store the log information The template and the identifier of the log information template are stored in the second space of the database.
  • a device for storing log information includes at least one computing node, and each computing node includes a processor and a memory.
  • the processor of the at least one computing node is configured to execute the program code stored in the memory, so as to execute the foregoing first aspect or the method in the possible implementation manner of the first aspect.
  • a device for querying log information includes at least one computing node, and each computing node includes a processor and a memory.
  • the processor of the at least one computing node is configured to execute the program code stored in the memory, so as to execute the foregoing second aspect or the method in the possible implementation manner of the second aspect.
  • The present application provides a non-transitory, non-volatile computer-readable storage medium storing instructions that, when run on at least one computing node, cause the at least one computing node to perform the method in the foregoing first aspect or any possible implementation of the first aspect.
  • The present application provides a non-transitory, non-volatile computer-readable storage medium storing instructions that, when run on at least one computing node, cause the at least one computing node to perform the method in the foregoing second aspect or any possible implementation of the second aspect.
  • The present application provides a computer program product containing instructions that, when run on at least one computing node, cause the at least one computing node to execute the method in the foregoing first aspect or any possible implementation of the first aspect.
  • The present application provides a computer program product containing instructions that, when run on at least one computing node, cause the at least one computing node to execute the method in the foregoing second aspect or any possible implementation of the second aspect.
  • FIG. 1 is a schematic block diagram of a distributed log service system 100 applied to an embodiment of the present application.
  • Fig. 2 is a schematic flowchart of a method for storing log information provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an apparatus 300 for storing log information provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an apparatus 400 for storing log information provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an apparatus 500 for querying log information provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an apparatus 600 for querying log information provided by an embodiment of the present application.
  • Computer logs are generated by the operating system, middleware, the platform itself, or user-developed program components while a computer system runs. They record the operating status and usage of the equipment and of the system itself, mainly describing the key operations the system performed and the errors and exceptions that occurred during operation. By analyzing system logs, users can understand problems that occur frequently or occasionally during system operation, improve operation and maintenance in a targeted manner, and improve the safety and efficiency of system operation.
  • a dedicated log system or log service is required to collect and store log information from a large number of managed nodes, and provide related retrieval interfaces for users to filter and analyze the log data they care about.
  • the distributed logging system 100 will be described in detail below.
  • FIG. 1 is a schematic block diagram of a distributed log service system 100 applied to an embodiment of the present application.
  • the system 100 may include a server 110 and at least one node.
  • the embodiment of the present application does not specifically limit the number of at least one node.
  • the embodiment of the present application uses the node 120 and the node 130 as examples for description.
  • the node 120 and the node 130 may be different virtual machines (virtual machines, VMs) installed on one physical host, or may be VMs installed on different physical hosts.
  • the node 120 and the node 130 may also be physical hosts.
  • At least one node in the log service system 100 can generate corresponding log information, which records the key operations performed by the node and the errors and exceptions that occurred while it ran. The node can save the generated log information in a corresponding log file.
  • By analyzing the log information, users can understand problems that occur frequently or occasionally while at least one node runs and improve its operation and maintenance in a targeted manner. They can also discover potential system abnormalities by analyzing the distribution of the identifiers corresponding to the log information templates, and send alarm reminders in advance.
  • a log collection agent 121 and a log collection agent 131 are respectively deployed on the node 120 and the node 130.
  • the log collection agent 121 in the node 120 may obtain the log information saved in the log file in the designated directory of the node 120, and report the obtained log information of the node 120 to the server 110 through the message middleware 114.
  • the log collection agent 131 in the node 130 can obtain the log information saved in the log file under the specified directory in the node 130, and report the obtained log information of the node 130 to the server 110 through the message middleware 114.
  • log collection agent module 121 and the log collection agent module 131 may be modules implemented by software programs.
  • After receiving the log information reported by the node 120 and/or the node 130, the data preprocessing module 113 in the server 110 may preprocess the reported log information and store the processed log information in the distributed database 112 in the server 110.
  • After receiving the log information of at least one node reported through the message middleware 114, the data preprocessing module 113 in the server 110 may tag the log information and store it in the distributed database 112, so that users can later search for related log information through the tag information.
  • the tag information may be carried by the log information itself, or may be obtained by the log collection agent in at least one node according to the node where the log information is located and the attributes of the application service running on it.
  • The embodiment of this application does not specifically limit the content of the tag information, which may include but is not limited to: the storage path of the log file where the log information is located, the internet protocol (IP) address of the node where the log information is located, the region information of the public cloud where the service of the log information is located, the service information (for example, service identifier (ID)) to which the log information belongs, the component information (for example, component ID) to which the log information belongs, the tenant ID, and so on.
  • The log service application program interface (API) 111 in the server 110 can provide users with retrieval functions. Users can interact with the log service system 100 through the log service API 111 by using a browser or a command line tool.
  • the user can search the log information he needs through the log service API 111, for example, can obtain the log information stored in the distributed database 112 by searching various tag information and log keywords of the log information.
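  • Retrieval by tag information could be sketched as a simple filter over stored records. This is only an illustration: the record layout, the tag keys ip and service_id, and the function name filter_by_tags are hypothetical and are not defined by the log service API 111.

```python
def filter_by_tags(records, **wanted):
    """Keep the records whose tags match every requested key/value pair."""
    return [
        record
        for record in records
        if all(record["tags"].get(key) == value for key, value in wanted.items())
    ]


# Hypothetical stored records with tag information attached.
records = [
    {"text": "queue timeout", "tags": {"ip": "10.0.0.1", "service_id": "svc-a"}},
    {"text": "login ok", "tags": {"ip": "10.0.0.2", "service_id": "svc-a"}},
    {"text": "queue timeout", "tags": {"ip": "10.0.0.1", "service_id": "svc-b"}},
]
selected = filter_by_tags(records, ip="10.0.0.1", service_id="svc-a")
```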
  • The operation of the node can then be maintained in a targeted manner, thereby improving the efficiency of the node's system operation.
  • In the traditional log information storage solution, the original log information is stored directly in the database, so a large amount of repeated information is stored.
  • Storing the original log information directly also reduces how efficiently the log information can be used. For example, in a fuzzy search scenario, searching a huge amount of original log information takes a long time.
  • When log information is used for anomaly detection and other related data mining analysis, a large amount of log information needs to be analyzed centrally, so real-time performance is poor, which is not conducive to quickly discovering problems.
  • the embodiment of the present application provides a method for storing data, which can process the original log information, and reduce the storage space occupied by the original data by eliminating a large amount of repeated data in the original data.
  • FIG. 1 is a possible scenario applied to the embodiment of the present application, and the method provided in the embodiment of the present application can also be applied to a scenario where a large amount of repeated unstructured data is stored.
  • Fig. 2 is a schematic flowchart of a method for storing log information provided by an embodiment of the present application. As shown in Fig. 2, the method may include steps 210-230, which are described in detail below.
  • Step 210 Identify the first part and the second part of the plurality of log information from the plurality of log information.
  • The first part and the second part can be identified from the multiple pieces of log information, where the first part may be the part that is the same across a number of pieces of log information less than or equal to the first threshold.
  • The second part is the part that differs across a number of pieces of log information less than or equal to the second threshold.
  • the first threshold may be less than or equal to the total number of pieces of log information
  • the second threshold may be less than or equal to the total number of pieces of log information
  • first threshold and the second threshold may be equal or unequal, which is not specifically limited in the embodiment of the present application.
  • The comparison and identification of the pieces of log information, numbering less than or equal to the first threshold, may be performed by the log collection agent of at least one node or by the data preprocessing module 113 in the server 110.
  • For example, after obtaining the stored original log information from the log files in a specified directory, the log collection agent 121 may compare a number of pieces of log information less than or equal to the first threshold and identify the first part and the second part.
  • As another example, after receiving the multiple pieces of original log information reported by the log collection agent in at least one node, the data preprocessing module 113 in the server 110 may compare a number of pieces of log information less than or equal to the first threshold and identify the first part and the second part described above.
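  • One simple way to picture step 210 is a position-by-position comparison of whitespace-separated tokens. The sketch below is an assumption made for illustration, not the identification procedure of the application: it compares every line it is given (the thresholds are not modeled), requires equal token counts, and uses hypothetical sample lines and the hypothetical function name split_parts.

```python
def split_parts(lines):
    """Return (first_part, second_part) keyed by token position.

    first_part:  position -> token shared by every line
    second_part: position -> list of the differing tokens, one per line
    """
    rows = [line.split() for line in lines]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("this sketch only compares lines with the same token count")
    first_part, second_part = {}, {}
    for position, column in enumerate(zip(*rows)):
        if len(set(column)) == 1:
            first_part[position] = column[0]      # identical in all lines
        else:
            second_part[position] = list(column)  # varies between lines
    return first_part, second_part


# Hypothetical log lines.
lines = [
    "thread 2 get data from queue timeout!",
    "thread 5 get data from queue timeout!",
    "thread 7 get data from queue timeout!",
]
first_part, second_part = split_parts(lines)
```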
  • Step 220 Replace the second part of the multiple pieces of log information with placeholder identifiers to form a log information template corresponding to the multiple pieces of log information.
  • the embodiment of the present application may process the second part of the multiple pieces of log information identified from the foregoing.
  • A placeholder identifier may be used to replace the second part of the multiple pieces of log information to form one or more log information templates.
  • M is a positive integer greater than 1.
  • The types of the variables in the second part of the multiple pieces of log information may be distinguished by placeholder identifiers, or may be left undistinguished, which is not specifically limited in this application.
  • For example, when placeholder identifiers are used to distinguish the variable types in the second part of multiple pieces of log information, %d can be used as the placeholder identifier for numeric variables and %s as the placeholder identifier for string variables.
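  • Step 220 with typed placeholders could be sketched as below. The token-wise comparison, the rule that a column of decimal digits becomes %d, and the name make_template are assumptions for illustration; literal percent signs in the unchanged tokens are doubled so the template can later be filled with Python's % operator.

```python
def make_template(lines):
    """Build one template plus the per-line variable values.

    Unchanged tokens are kept, varying tokens become %d (all decimal digits)
    or %s (anything else).
    """
    rows = [line.split() for line in lines]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("this sketch only handles lines with the same token count")
    template_tokens = []
    values = [[] for _ in rows]
    for column in zip(*rows):
        if len(set(column)) == 1:
            # Escape '%' so it is not mistaken for a placeholder later.
            template_tokens.append(column[0].replace("%", "%%"))
            continue
        numeric = all(token.isdecimal() for token in column)
        template_tokens.append("%d" if numeric else "%s")
        for line_values, token in zip(values, column):
            line_values.append(int(token) if numeric else token)
    return " ".join(template_tokens), values


# Hypothetical log lines.
lines = ["thread 2 user alice timeout", "thread 5 user bob timeout"]
template, values = make_template(lines)
```

  • Filling each value list back into the template with template % tuple(values[i]) reproduces the corresponding input line in this sketch.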
  • Step 230 Store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information to the database respectively.
  • multiple log information can be distinguished and compared, and the first part and the second part of the multiple log information can be identified.
  • a log information template corresponding to multiple log information is stored in one entity of the distributed database 112, and the changed part of each log information is stored in another entity of the distributed database 112.
  • The original log information can thus be processed. After the changed part and the unchanged part of the multiple pieces of log information are identified, only one copy of the unchanged part is stored, and each piece of log information is condensed to include its changed part, the identifier of the log information template, and the tag information related to the log information.
  • The simplified log information does not need to carry the long character sequence in the template, which reduces the storage space occupied by the log information and saves storage resources in the log service system.
  • A template identifier can be generated for each log information template, and each piece of log information can be condensed to include the log information template identifier, variable values, a timestamp, tag information, and so on. When retrieving log information and restoring it to the original information, the user can obtain the log information template through the template identifier included in each piece of log information, and the log information can then be restored to its original form according to the log information template and the timestamp and variable values included in the log information.
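  • The condensed record and the two storage entities could be modeled as in the sketch below. The field names tid, mg and tm follow the field names used in the worked example of this application; the classes CondensedLog and LogStore, the in-memory containers standing in for the distributed database 112, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CondensedLog:
    tid: str                                  # template identifier
    mg: list                                  # variable values (changed part)
    tm: int                                   # timestamp as a millisecond offset
    tags: dict = field(default_factory=dict)  # tag information


class LogStore:
    def __init__(self):
        self.template_space = {}  # tid -> template, each template stored once
        self.record_space = []    # condensed per-log records

    def put(self, tid, template, mg, tm, tags=None):
        # The unchanged part is written only the first time a template is seen.
        self.template_space.setdefault(tid, template)
        self.record_space.append(CondensedLog(tid, list(mg), tm, dict(tags or {})))


store = LogStore()
template = "thread %d get data from queue timeout!"  # hypothetical template
store.put("t1", template, [2], 1541770754696, {"ip": "10.0.0.1"})
store.put("t1", template, [5], 1541770755698, {"ip": "10.0.0.1"})
```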
  • the timestamp included in the reduced log information may be recorded by the log information itself.
  • The tag information can be recorded by the log information itself, or it can be obtained by the log collection agent in at least one node according to the attributes of the node where the log information is located and of the application service running on it. For details, refer to the description of the tag information above, which is not repeated here.
  • the feature of a piece of information can be extracted by an information fingerprint extraction algorithm, and the feature of this information can be converted into a set of codes.
  • The code can be used as the unique template identifier corresponding to each log information template, so that the same log information template on different nodes has the same identifier.
  • For example, the unique template identifier corresponding to each log information template can be generated through the MD5 message-digest algorithm.
  • the MD5 algorithm is a widely used cryptographic hash function, which can generate a 128-bit (16-byte) hash value to ensure complete and consistent information transmission.
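  • Computing such an identifier with Python's standard hashlib module could look like the sketch below. The function name template_id, the UTF-8 encoding of the template text, and the sample template are assumptions.

```python
import hashlib


def template_id(template):
    """Return the 16-byte (128-bit) MD5 digest of the template text."""
    return hashlib.md5(template.encode("utf-8")).digest()


# The same template hashed independently, as two different nodes would do.
node_a = template_id("thread %d get data from queue timeout!")
node_b = template_id("thread %d get data from queue timeout!")
other = template_id("user %s logged in")
```

  • Because the digest depends only on the template text, the two nodes arrive at the same identifier without exchanging any messages.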
  • The process of generating the unique template identifier corresponding to each log information template through the MD5 algorithm is described in detail below in conjunction with specific embodiments and is not repeated here.
  • Because the same log information template on different nodes can have the same identifier, the additional overhead and complexity of the inter-node communication that would otherwise be needed to uniformly allocate an identifier to each log information template are reduced.
  • Step 1 Identify the changed part and the unchanged part in each of the three original log messages.
  • A pattern extraction algorithm can be used to identify and extract the changed part and the unchanged part of each of the three pieces of original log information.
  • For example, the iterative partitioning log mining (IPLoM) algorithm can be used to identify the changed part and the unchanged part of each of the three pieces of original log information, and a placeholder identifier can replace the changed part of each piece.
  • the recognition results after comparing the above three original log information are as follows:
  • the unchanging part [10.0.26.102][INF0][bulk - thread-][BulkHandlerRunable:submits 138]get data from queue timeout!
  • the original log information 1 can be expressed as:
  • the comparison result of the original log information 2 with the original log information 1 and 3 is as follows:
  • the unchanging part [10.0.26.102][INF0][bulk - thread-][BulkHandlerRunable:submits 138]get data from queue timeout!
  • the original log information 2 can be expressed as:
  • the unchanging part [10.0.26.102][INF0][bulk - thread-][BulkHandlerRunable:submits 138]get data from queue timeout!
  • the original log information 3 can be expressed as:
  • the log information template identified in the three original log messages is: tm[INF0][bulk - thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!
  • Step 2 Generate a unique identifier corresponding to the log information template.
  • Different nodes may have the same log information template, so each log information template should correspond one-to-one with its identifier.
  • the embodiment of the present application may use digital fingerprint technology to realize a one-to-one correspondence between the log information template and the identifier of the template.
  • the MD5 algorithm is used to generate a unique identifier corresponding to the log information template.
  • the MD5 algorithm calculates the character string of the aforementioned log information template to generate a 128-bit (16-byte) hash value, which can be used as a unique identifier corresponding to the aforementioned log information template.
  • a 64-bit (8-byte) hash value can also be selected from the 128-bit (16-byte) hash value generated by the MD5 algorithm, and the 64-bit hash value is used as the unique identifier actually stored for the log information template.
  • there are several ways to obtain a 64-bit hash value from a 128-bit hash value.
  • the 64 odd-numbered bits are selected from the 128-bit hash value as the 64-bit hash value.
  • the 64 even-numbered bits are selected from the 128-bit hash value as the 64-bit hash value.
  • the middle 64 bits are selected from the 128-bit hash value as the 64-bit hash value.
  • the 128-bit hash value can also be folded and added to obtain a 64-bit hash value.
  • the tid field may be used to indicate the identification of the aforementioned log information template.
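  • The identifier generation in Step 2 can be sketched as follows, using Python's standard `hashlib.md5`. The helper names are hypothetical, and two details are assumptions because the text does not fix them: bit positions are counted from the least significant bit, and "folded and added" is read as adding the two 64-bit halves modulo 2^64.

```python
import hashlib

MASK64 = (1 << 64) - 1


def template_digest(template: str) -> int:
    """Return the 128-bit MD5 digest of the template string as an integer."""
    return int.from_bytes(hashlib.md5(template.encode("utf-8")).digest(), "big")


def middle_64(digest: int) -> int:
    """Keep the middle 64 bits (drop the top 32 and bottom 32 bits)."""
    return (digest >> 32) & MASK64


def fold_64(digest: int) -> int:
    """Add the upper and lower 64-bit halves modulo 2**64 (assumed reading)."""
    return ((digest >> 64) + (digest & MASK64)) & MASK64


def every_other_bit(digest: int, offset: int) -> int:
    """Collect the 64 bits at positions offset, offset + 2, ... (offset 0 or 1)."""
    value = 0
    for i in range(64):
        value |= ((digest >> (2 * i + offset)) & 1) << i
    return value


template = "tm[INF0][bulk - thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!"
digest = template_digest(template)
tid = middle_64(digest)  # one possible choice for the tid field
print(f"{digest:032x} -> {tid:016x}")
```

  • Whichever reduction is chosen, every node must apply the same one so that identical templates map to the same tid. A 64-bit value derived this way makes collisions unlikely rather than impossible.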
  • Step 3 Convert the log time field in the original log information.
  • step 3 is optional.
  • the log time in the original log information can be converted into a millisecond offset from a base time.
  • tm in the calculated original log information 1 is 1541770754696
  • tm in the original log information 2 is 1541770755698
  • tm in the original log information 3 is 1541770756066.
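  • The conversion in Step 3 might look like the sketch below. The text does not state the base time or the format of the log time, so the Unix epoch in UTC, the timestamp format, and the function name `to_millis` are all assumptions; the sample input string is chosen only because, under those assumptions, it maps to the tm value listed for original log information 1.

```python
from datetime import datetime, timedelta, timezone

# Assumed base time: the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(log_time: str, fmt: str = "%Y-%m-%d %H:%M:%S.%f") -> int:
    """Parse a UTC log timestamp and return its millisecond offset from EPOCH."""
    parsed = datetime.strptime(log_time, fmt).replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


print(to_millis("2018-11-09 13:39:14.696"))  # prints 1541770754696
```

  • Storing the offset as an integer keeps the field compact and lets time-range queries be expressed as simple numeric comparisons on tm.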
  • Step 4 Represent the changed part of each original log information.
  • a field storing the variable value in each original log information may be added.
  • the mg field is used to represent the variable value list string.
  • the variable value list string mg in the original log information 1 is (2)
  • the variable value list string mg in the original log information 2 is (5)
  • the variable value list string mg in the original log information 3 is (3).
  • a field storing the number of variable values in each original log information can also be added.
  • the VL field is used to represent the number of variable values.
  • Step 5 Add tag information of each original log information.
  • relevant tag information can be added to each original log information, so that the required log information can be obtained through various tag information.
  • the embodiment of the present application is described by taking, as an example, tag information that consists of the path of the log file where the log information is located and the IP address of the node where the log information is located.
  • the IP in the original log information 1 is 10.0.26.102
  • the IP in the original log information 2 is 10.0.26.102
  • the IP in the original log information 3 is 10.0.26.102.
  • the embodiment of the present application can perform information fingerprint extraction on the path of each log file to form a unique identifier corresponding to the path of each log file, thereby saving storage resources.
  • the specific information fingerprint extraction and identification process is similar to the calculation process of the log information template identification. For details, please refer to the process of generating the log information template identification, which will not be repeated here.
  • the pid field is used to indicate the identifier of the path of the log file where the log information is located.
  • the pid in the original log information 1 is 864995200973638000
  • the pid in the original log information 2 is 864995200973638000
  • the pid in the original log information 3 is 864995200973638000.
  • the above three original log information are condensed into: timestamp + template identification + variable + related label information.
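  • The condensed record described above can be pictured as a small data structure. The sketch below is only illustrative: the class name `ReducedLog`, the helper `reduce_log`, the comma separator inside mg, and the tid value are hypothetical, since the text does not list the actual tid or the separator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReducedLog:
    tm: int   # millisecond offset of the log time (Step 3)
    tid: int  # identifier of the log information template (Step 2)
    mg: str   # variable value list string (Step 4)
    VL: int   # number of variable values (Step 4)
    IP: str   # IP address of the node (Step 5)
    pid: int  # identifier of the log file path (Step 5)


def reduce_log(tm, tid, variables, ip, pid, sep=","):
    """Build the reduced record; mg and VL are derived from the variable values."""
    return ReducedLog(tm, tid, sep.join(variables), len(variables), ip, pid)


HYPOTHETICAL_TID = 1001  # placeholder; the source does not give the real value

record = reduce_log(
    1541770754696, HYPOTHETICAL_TID, ["2"], "10.0.26.102", 864995200973638000
)
print(record)
```

  • The long constant text of the message no longer appears in the record; it is reachable only through tid.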
  • the template identifier and the template can be stored in different tables of the distributed database 112 respectively.
  • the original log information can be split into three parts and stored separately.
  • Table 1 is used to store the reduced log information
  • Table 2 is used to store the path of the log file where the log information is located
  • Table 3 is used to store the template of the three pieces of original log information.
  • the simplified log information 1 can be expressed as:
  • the simplified log information 2 can be expressed as:
  • the simplified log information 3 can be expressed as:
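  • The three-table layout can be sketched with an in-memory SQLite database standing in for the distributed database 112. The table names, column names, the tid value, and the log file path are hypothetical; the tm, mg, IP, and pid values and the template string are the ones listed in the steps above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reduced_log  (tm INTEGER, tid INTEGER, mg TEXT, vl INTEGER, ip TEXT, pid INTEGER);
CREATE TABLE log_path     (pid INTEGER PRIMARY KEY, path TEXT);
CREATE TABLE log_template (tid INTEGER PRIMARY KEY, template TEXT);
""")

TID = 1001                        # hypothetical template identifier
PID = 864995200973638000          # path identifier from the example
PATH = "/var/log/app/bulk.log"    # hypothetical log file path
TEMPLATE = ("tm[INF0][bulk - thread-%d][BulkHandlerRunable:submits 138]"
            "get data from queue timeout!")

# Table 3 and Table 2: the template and the path are stored once each.
conn.execute("INSERT INTO log_template VALUES (?, ?)", (TID, TEMPLATE))
conn.execute("INSERT INTO log_path VALUES (?, ?)", (PID, PATH))

# Table 1: one short row per original log message.
conn.executemany(
    "INSERT INTO reduced_log VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1541770754696, TID, "2", 1, "10.0.26.102", PID),
        (1541770755698, TID, "5", 1, "10.0.26.102", PID),
        (1541770756066, TID, "3", 1, "10.0.26.102", PID),
    ],
)

counts = {
    name: conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
    for name in ("reduced_log", "log_path", "log_template")
}
print(counts)
```

  • A 64-bit identifier with its top bit set would not fit SQLite's signed INTEGER type, so a real schema would need to store such values as text or reinterpret them as signed numbers.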
  • in the embodiments of the present application, when retrieving the stored log information through the log service API 111, the user can search for keywords in the log information template, so as to quickly find the required log information and reduce search time.
  • the embodiments of this application can also restore the reduced log information on the log service API 111 side, so that the user can obtain the original log information and analyze it to understand the problems that occur frequently or occasionally while the node is running.
  • the user can input various tag information and log keywords through the log service API 111 to obtain the required log information.
  • the distributed database 112 may obtain the associated log information template and the variable value list string mg according to the log information template identifier tid included in the keyword, and automatically substitute the variable value list string mg into the positions of the placeholder identifiers in the log information template, so as to restore the original log information.
  • the distributed database 112 in the embodiment of the present application may be a distributed relational database or a non-relational database with a join function.
  • a distributed relational database or a non-relational database with a join function can obtain the associated log information template and the variable value list string mg through the log information template identifier tid.
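  • Restoring a message from its template and variable value list string can be sketched as below. The function name `restore`, the `%d` placeholder, and the comma separator inside mg are assumptions for illustration.

```python
def restore(template: str, mg: str, placeholder: str = "%d", sep: str = ",") -> str:
    """Substitute the variable values into the placeholder positions, in order."""
    values = mg.split(sep) if mg else []
    parts = template.split(placeholder)
    if len(parts) - 1 != len(values):
        raise ValueError("placeholder count does not match variable count")
    restored = parts[0]
    for value, tail in zip(values, parts[1:]):
        restored += value + tail
    return restored


template = ("tm[INF0][bulk - thread-%d][BulkHandlerRunable:submits 138]"
            "get data from queue timeout!")
print(restore(template, "2"))
```

  • Checking the placeholder count against the number of values mirrors the purpose of the VL field: a mismatch signals that the record and the template do not belong together.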
  • the user can interact with the log service system 100 using the interface provided by the log service API 111 through a browser.
  • the user may input a query request through a browser, and the query request may include keywords and/or related tag information for the log information to be queried.
  • the database may obtain the stored log information from the table of the database according to the keywords and/or related tag information of the log information in the query request, and may feed back the stored log information to the user. Users can not only analyze the log information to understand the problems that occur frequently or occasionally during the operation of the node, but also analyze the distribution law of the identifier corresponding to the log information template to discover potential system abnormalities and send out alarms in advance.
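  • One simple way to look at the distribution of template identifiers is to compare how often each tid occurs in a recent window with how often it occurred in a baseline window. The sketch below is an assumed heuristic, not a method described in the source; the function name `unusual_templates` and the ratio threshold are hypothetical.

```python
from collections import Counter


def unusual_templates(baseline_tids, current_tids, ratio=3.0):
    """Return tids that are new or at least `ratio` times more frequent than before."""
    base = Counter(baseline_tids)
    now = Counter(current_tids)
    flagged = []
    for tid, count in now.items():
        expected = base.get(tid, 0)
        # A tid never seen in the baseline is flagged outright.
        if expected == 0 or count / expected >= ratio:
            flagged.append(tid)
    return sorted(flagged)


baseline = [1, 1, 2]
current = [1, 1, 2, 2, 2, 2, 2, 2, 3]
print(unusual_templates(baseline, current))  # prints [2, 3]
```

  • Because the analysis works on short identifiers rather than full messages, it can run over large volumes of reduced log information without restoring the original text first.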
  • the keyword included in the query request may be a certain character string in the log information.
  • the query request entered by the user is to query the log information of "January 1, 2018".
  • the database can determine the value of tm according to the query time "January 1, 2018".
  • the log information that meets the requirements can be obtained from the log information stored in Table 1 of the database.
  • the join function of the database can obtain, from Table 3, the log information template corresponding to the log information template identifier in the log information, substitute the log information stored in Table 1 into the log information template to generate the original log information, and feed the original log information back to the user.
  • the user can also enter a character string in the log information template in the query request, and the database can obtain the log information template corresponding to the log information template identifier in the log information in Table 3.
  • the join function of the database can also obtain the log information carrying that log information template identifier from the log information stored in Table 1, substitute it into the log information template to generate the original log information, and feed the original log information back to the user.
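  • A keyword search over the templates followed by a join back to the reduced log information can be sketched with SQLite as a local stand-in for the database. The table and column names, the tid value, and the `search` helper are hypothetical, and the restoration step assumes a single `%d` placeholder per template.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reduced_log  (tm INTEGER, tid INTEGER, mg TEXT, ip TEXT);
CREATE TABLE log_template (tid INTEGER PRIMARY KEY, template TEXT);
""")
TEMPLATE = ("tm[INF0][bulk - thread-%d][BulkHandlerRunable:submits 138]"
            "get data from queue timeout!")
conn.execute("INSERT INTO log_template VALUES (?, ?)", (1001, TEMPLATE))
conn.executemany(
    "INSERT INTO reduced_log VALUES (?, ?, ?, ?)",
    [
        (1541770754696, 1001, "2", "10.0.26.102"),
        (1541770755698, 1001, "5", "10.0.26.102"),
        (1541770756066, 1001, "3", "10.0.26.102"),
    ],
)


def search(conn, keyword):
    """Match the keyword against templates, then join to the reduced rows."""
    rows = conn.execute(
        "SELECT r.tm, r.ip, t.template, r.mg "
        "FROM reduced_log AS r JOIN log_template AS t ON t.tid = r.tid "
        "WHERE t.template LIKE ? ORDER BY r.tm",
        (f"%{keyword}%",),
    ).fetchall()
    # Restore each message by filling the single placeholder with mg.
    return [(tm, ip, template.replace("%d", mg, 1)) for tm, ip, template, mg in rows]


for tm, ip, message in search(conn, "timeout"):
    print(tm, ip, message)
```

  • The keyword is compared against one template row rather than against every stored message, which is where the reduction in search time comes from.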
  • the following takes, as an example, a query request entered by the user for the log information generated on the IP address "10.0.26.102" to explain in detail the process of the user querying the log information.
  • the original log information provided to the front-end interface of Log Service API 111 is as follows:
  • FIG. 3 is a schematic structural diagram of an apparatus 300 for storing log information provided by an embodiment of the present application.
  • the device 300 includes: an identification module 310, a processing module 320, and a storage module 330.
  • the identification module 310 is configured to identify the first part and the second part of multiple pieces of log information from the multiple pieces of log information, wherein the first part is the part that is the same in a number of pieces of log information, among the multiple pieces, that is less than or equal to a first threshold, and the second part is the part that differs in a number of pieces of log information, among the multiple pieces, that is less than or equal to a second threshold;
  • the processing module 320 is configured to replace the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information;
  • the storage module 330 is configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information to the database respectively.
  • the apparatus 300 further includes:
  • the generating module 340 is configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.
  • the storage module 330 is specifically configured to: store the second part of the multiple pieces of log information and the identifier of the log information template in the first space of the database; and store the log information template and the identifier of the log information template in the second space of the database.
  • the apparatus 300 for storing log information in the embodiment of the present application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), and the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the device 300 for storing log information and its respective modules may also be software modules.
  • FIG. 4 is a schematic structural diagram of an apparatus 400 for storing log information provided by an embodiment of the present application.
  • the apparatus 400 for storing log information includes at least one computing node 410.
  • the computing node 410 may include a processing unit 411 and a communication interface 412.
  • the processing unit 411 is used to execute functions defined by various software programs, for example, to implement storage of log information.
  • the communication interface 412 is used to communicate and interact with other computing nodes or other devices, where the other devices may be other physical servers. Specifically, the communication interface 412 may be a network adapter card.
  • the computing node 410 may further include an input/output interface 413, and the input/output interface 413 is connected to an input/output device for receiving input information and outputting operation results.
  • the input/output interface 413 may be a mouse, a keyboard, a display, or an optical drive.
  • the computing node 410 may also include auxiliary storage 414, which is generally also referred to as external storage.
  • the storage medium of the auxiliary storage 414 may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, an optical disk), or a semiconductor medium (such as a solid state drive), etc.
  • the computing node 410 may further include a bus 415.
  • the processing unit 411, the communication interface 412, the input/output interface 413, and the auxiliary memory 414 may be connected via a bus 415.
  • the bus 415 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus 415 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one line is used to represent in FIG. 4, but it does not mean that there is only one bus or one type of bus.
  • the processing unit 411 may have a variety of specific implementation forms.
  • the processing unit 411 may include a processor 4112 and a memory 4111, and the processor 4112 performs related operations of the embodiment shown in FIG. 2 according to program instructions stored in the memory 4111.
  • the processor 4112 may be a central processing unit (CPU).
  • the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or any conventional processor or the like.
  • the processor 4112 adopts one or more integrated circuits to execute related programs to implement the technical solutions provided in the embodiments of the present application.
  • the processor 4112 of the computing node 410 may run at least one of the identification module 310, the processing module 320, and the storage module 330 shown in FIG. 3 through program instructions stored in the memory 4111.
  • the memory 4111 or the auxiliary memory 414 of the computing node 410 may also store the database described in FIG. 2.
  • each unit in the apparatus 400 for storing log information is used to implement the corresponding flow of the method in FIG. 2, and details are not repeated here for brevity.
  • FIG. 5 is a schematic structural diagram of an apparatus 500 for querying log information provided by an embodiment of the present application.
  • the apparatus 500 for querying log information includes: a receiving module 510, an acquiring module 520, and a returning module 530.
  • the receiving module 510 is configured to receive a query request, where the query request is used to query log information stored in the database;
  • the acquiring module 520 is configured to obtain, from the database according to the query request, the stored second part of the multiple pieces of log information and the log information template corresponding to the multiple pieces of log information, and to substitute the second part into the corresponding log information template to obtain the log information corresponding to the query request;
  • it may further include a return module 530, configured to return the corresponding log information to the client that issued the query request.
  • the returning module 530 and the receiving module 510 can also be implemented by the same module.
  • the apparatus 500 further includes:
  • the identification module 540 is configured to identify the first part and the second part of the plurality of log information from the plurality of log information;
  • the processing module 550 is configured to replace the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information;
  • the storage module 560 is configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information to the database respectively.
  • the apparatus 500 further includes:
  • the generating module 570 is configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.
  • the storage module 560 is specifically configured to: store the second part of the multiple pieces of log information and the identifier of the log information template in the first space of the database; and store the log information template and the identifier of the log information template in the second space of the database.
  • the device 500 for querying log information in the embodiment of the present application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), and the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the device 500 for querying log information and its respective modules can also be software modules.
  • FIG. 6 is a schematic structural diagram of an apparatus 600 for querying log information provided by an embodiment of the present application.
  • the apparatus 600 for querying log information includes at least one computing node 610.
  • the computing node 610 may include a processing unit 611 and a communication interface 612.
  • the processing unit 611 is used to execute functions defined by various software programs, for example, to store log information.
  • the communication interface 612 is used to communicate and interact with other computing nodes or other devices, where the other devices may be other physical servers.
  • the communication interface 612 may be a network adapter card.
  • the computing node 610 may further include an input/output interface 613, and the input/output interface 613 is connected to an input/output device for receiving input information and outputting operation results.
  • the input/output interface 613 can be a mouse, a keyboard, a display, or an optical drive.
  • the computing node 610 may also include auxiliary storage 614, which is generally called external storage.
  • the storage medium of the auxiliary storage 614 may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, an optical disk), or a semiconductor medium (such as a solid state drive), etc.
  • the computing node 610 may further include a bus 615.
  • the processing unit 611, the communication interface 612, the input/output interface 613, and the auxiliary memory 614 may be connected through the bus 615.
  • the function and implementation manner of the bus 615 and the bus 415 are similar.
  • the processing unit 611 may have a variety of specific implementation forms.
  • the processing unit 611 may include a processor 6112 and a memory 6111, and the processor 6112 performs related operations of the foregoing embodiments according to program instructions stored in the memory 6111.
  • the processor 6112 may be a central processing unit (CPU).
  • the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or any conventional processor or the like.
  • the processor 6112 adopts one or more integrated circuits to execute related programs to implement the technical solutions provided in the embodiments of the present application.
  • the processor 6112 of the computing node 610 may run at least one of the receiving module 510, the acquiring module 520, and the returning module 530 shown in FIG. 5 through program instructions stored in the memory 6111.
  • the memory 6111 or the auxiliary memory 614 of the computing node 610 may also store the database described in FIG. 2.
  • each unit in the device 600 for querying log information is used to implement the corresponding procedures of the method for querying log information. For brevity, they will not be repeated here.
  • the foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-mentioned embodiments may be implemented in the form of a computer program product in whole or in part.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center that includes one or more sets of available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium.
  • the semiconductor medium may be a solid state drive (SSD).
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

Abstract

The present application provides a method for storing log information. The method comprises: identifying a first part and a second part of multiple pieces of log information from the multiple pieces of log information, the first part being the part that is the same in a number of pieces of log information, among the multiple pieces, that is less than or equal to a first threshold, and the second part being the part that differs in a number of pieces of log information, among the multiple pieces, that is less than or equal to a second threshold; replacing the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information; and storing the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information separately in a database. By means of the technical solution provided by the present application, the storage space occupied by the same part in the multiple pieces of log information can be reduced, and storage resources are saved.

Description

Method and device for storing and querying log information

Technical field

This application relates to the storage field, and more specifically, to a method for storing log information, a method for querying log information, an apparatus, and a computer-readable storage medium.

Background
Computer logs are generated during the operation of a computer system by the operating system, middleware, the platform itself, or program components developed by users. They record the running status and usage of the devices and of the system itself, and mainly describe the key operations performed by the system and the errors and exceptions that occur while it runs. By analyzing system logs, users can learn about problems that occur frequently or occasionally during operation, improve operation and maintenance in a targeted manner, and thereby improve the security and efficiency of the system.
In a traditional log information storage solution, the original log information is stored directly in a database. On the one hand, the original logs contain a large amount of repeated information, which occupies most of the storage resources and wastes them. On the other hand, when a user searches the stored log information, the repeated information makes retrieval inefficient, and when the user analyzes a large number of retrieved original logs, the repeated parts make the analysis less timely, which hinders the quick discovery of problems through logs.
Summary of the invention

This application provides a method for storing log information and a method for querying log information, which can reduce the storage space occupied by the repeated parts of multiple pieces of log information, save storage resources, shorten the time users need to retrieve log information, and increase the value of the log information.
According to a first aspect, a method for storing log information is provided. The method includes: identifying a first part and a second part of multiple pieces of log information from the multiple pieces of log information; replacing the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information, where the log information template includes the first part of the multiple pieces of log information; and storing the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information separately in a database.
The first part of the multiple pieces of log information is the part that is the same in a number of pieces of log information, among the multiple pieces, that is less than or equal to a first threshold; the second part of the multiple pieces of log information is the part that differs in a number of pieces of log information, among the multiple pieces, that is less than or equal to a second threshold.
It should be understood that the first threshold may be less than or equal to the total number of the multiple pieces of log information, the second threshold may be less than or equal to that total number, and the first threshold and the second threshold may or may not be equal.
In the foregoing technical solution, the original log information can be processed: among the multiple pieces of log information, the same part and the different part are identified from a number of pieces of log information that is less than or equal to the total number. The part that is the same in the multiple pieces of log information is stored only once, so the reduced log information need not carry the long character sequence contained in the log information template. This reduces the storage space occupied by the log information and the storage resources consumed in the log service system.
In a possible implementation, the method further includes: generating an identifier of the log information template corresponding to the multiple pieces of log information.
In the foregoing technical solution, a unique identifier can be calculated for each log information template, so that when a user later retrieves the log information, the log information can be restored by using the log information template identifier and the user obtains the original log information. The user can not only analyze the original log information to learn about problems that occur frequently or occasionally while a node runs, but also analyze the distribution of the identifiers corresponding to the log information templates to uncover potential system anomalies and raise alarms in advance.
In another possible implementation, when retrieving the stored log information through the log service API, the user can search for keywords in the log information templates.
In the foregoing technical solution, the log information the user needs is searched for by keyword in the log information templates. Because log information templates are not heavily repeated, the required log information can be found quickly, which reduces search time, makes the user's analysis more timely, and helps problems to be discovered quickly through logs.
In another possible implementation, the second part of the multiple pieces of log information and the identifier of the log information template are stored in a first space of the database, and the log information template and the identifier of the log information template are stored in a second space of the database.
In a second aspect, a method for querying log information is provided. The method is applied to a log information query system that includes a database. The database stores a second part of multiple pieces of log information and a log information template corresponding to the multiple pieces of log information. The log information template includes a first part of the multiple pieces of log information. The first part is the part that is identical across a number of the pieces of log information that is less than or equal to a first threshold, and the second part is the part that differs across a number of the pieces of log information that is less than or equal to a second threshold. The method includes:
receiving a query request, where the query request is used to query log information stored in the database; obtaining from the database, according to the query request, the stored second part of the multiple pieces of log information and the log information template corresponding to the multiple pieces of log information; and substituting the second part of the multiple pieces of log information into the corresponding log information template to obtain the corresponding log information, and returning the corresponding log information to the client that issued the query request.
In the above technical solution, the second part of the multiple pieces of log information can be substituted into the corresponding log information template, and the resulting original log information is returned to the user. The user can then analyze the original log information and understand problems that occur frequently or occasionally while a node is running.
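The substitution step of the query method can be sketched as below. This is an illustrative sketch under assumptions: the placeholders are taken to be `%d` and `%s`, the variable values are assumed to be stored in the order in which they appear in the template, and the function name `restore` is hypothetical.

```python
import re

# Placeholders assumed for this sketch: %d for numbers, %s for strings.
PLACEHOLDER = re.compile(r"%[ds]")


def restore(template, values):
    """Fill each placeholder in the template with the next stored value."""
    remaining = iter(values)
    # Each match consumes one value, in left-to-right order.
    return PLACEHOLDER.sub(lambda match: str(next(remaining)), template)


restored = restore("[bulk-thread-%d] get data from queue timeout!", [2])
```

If fewer values than placeholders are supplied, `next` raises `StopIteration`; a production version would likely validate the counts before substituting.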
In a possible implementation, before the query request is received, the method further includes: identifying the first part and the second part of the multiple pieces of log information from the multiple pieces of log information;
replacing the second part of the multiple pieces of log information with placeholder identifiers to form the log information template corresponding to the multiple pieces of log information; and
storing the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information separately in the database.
In another possible implementation, the method further includes: generating an identifier of the log information template corresponding to the multiple pieces of log information.
In another possible implementation, the method further includes: storing the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and storing the log information template and the identifier of the log information template in a second space of the database.
In a third aspect, an apparatus for storing log information is provided. The apparatus includes:
an identification module, configured to identify a first part and a second part of multiple pieces of log information from the multiple pieces of log information, where the first part is the part that is identical across a number of the pieces of log information that is less than or equal to a first threshold, and the second part is the part that differs across a number of the pieces of log information that is less than or equal to a second threshold;
a processing module, configured to replace the second part of the multiple pieces of log information with placeholder identifiers to form a log information template corresponding to the multiple pieces of log information, where the log information template includes the first part of the multiple pieces of log information; and
a storage module, configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information separately in a database.
In a possible implementation, the apparatus further includes: a generating module, configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.
In another possible implementation, the storage module is specifically configured to: store the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and store the log information template and the identifier of the log information template in a second space of the database.
In a fourth aspect, an apparatus for querying log information is provided. The apparatus is applied to a log information query system that includes a database. The database stores a second part of multiple pieces of log information and a log information template corresponding to the multiple pieces of log information. The log information template includes a first part of the multiple pieces of log information. The first part is the part that is identical across a number of the pieces of log information that is less than or equal to a first threshold, and the second part is the part that differs across a number of the pieces of log information that is less than or equal to a second threshold. The apparatus includes:
a receiving module, configured to receive a query request, where the query request is used to query log information stored in the database; and
an obtaining module, configured to obtain from the database, according to the query request, the stored second part of the multiple pieces of log information and the log information template corresponding to the multiple pieces of log information, and further configured to substitute the second part of the multiple pieces of log information into the corresponding log information template to obtain the corresponding log information.
Optionally, the apparatus for querying log information may further include a returning module, configured to return the corresponding log information to the client that issued the query request.
In a possible implementation, the apparatus further includes: an identification module, configured to identify the first part and the second part of the multiple pieces of log information from the multiple pieces of log information;
a processing module, configured to replace the second part of the multiple pieces of log information with placeholder identifiers to form the log information template corresponding to the multiple pieces of log information; and
a storage module, configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information separately in the database.
In another possible implementation, the apparatus further includes: a generating module, configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.
In another possible implementation, the storage module is specifically configured to: store the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and store the log information template and the identifier of the log information template in a second space of the database.
In a fifth aspect, an apparatus for storing log information is provided. The apparatus includes at least one computing node, and each computing node includes a processor and a memory. The processor of the at least one computing node is configured to execute program code stored in the memory, so as to perform the method in the first aspect or in any possible implementation of the first aspect.
In a sixth aspect, an apparatus for querying log information is provided. The apparatus includes at least one computing node, and each computing node includes a processor and a memory. The processor of the at least one computing node is configured to execute program code stored in the memory, so as to perform the method in the second aspect or in any possible implementation of the second aspect.
In a seventh aspect, this application provides a non-transitory, non-volatile computer-readable storage medium. The computer-readable storage medium stores instructions that, when run on at least one computing node, cause the at least one computing node to perform the method in the first aspect or in any possible implementation of the first aspect.
In an eighth aspect, this application provides a non-transitory, non-volatile computer-readable storage medium. The computer-readable storage medium stores instructions that, when run on at least one computing node, cause the at least one computing node to perform the method in the second aspect or in any possible implementation of the second aspect.
In a ninth aspect, this application provides a computer program product containing instructions that, when run on at least one computing node, cause the at least one computing node to perform the method in the first aspect or in any possible implementation of the first aspect.
In a tenth aspect, this application provides a computer program product containing instructions that, when run on at least one computing node, cause the at least one computing node to perform the method in the first aspect or in any possible implementation of the first aspect.
On the basis of the implementations provided in the above aspects, this application may further combine them to provide more implementations.
Description of the Drawings
FIG. 1 is a schematic block diagram of a distributed log service system 100 applied to an embodiment of this application.
FIG. 2 is a schematic flowchart of a method for storing log information provided by an embodiment of this application.
FIG. 3 is a schematic structural diagram of an apparatus 300 for storing log information provided by an embodiment of this application.
FIG. 4 is a schematic structural diagram of an apparatus 400 for storing log information provided by an embodiment of this application.
FIG. 5 is a schematic structural diagram of an apparatus 500 for querying log information provided by an embodiment of this application.
FIG. 6 is a schematic structural diagram of an apparatus 600 for querying log information provided by an embodiment of this application.
Detailed Description
The technical solutions in this application are described below with reference to the drawings.
Computer logs are generated during the operation of a computer system by the operating system, middleware, the platform itself, or program components developed by users. They record the operating status and usage of the equipment and of the system itself. This descriptive information mainly covers the key operations performed by the system and the errors and exceptions that occur while the system is running. By analyzing system logs, a user can understand problems that occur frequently or occasionally while the system is running, improve the operation and maintenance of the system in a targeted manner, and thereby improve the security and efficiency of system operation.
In large-scale distributed scenarios, a dedicated log system or log service is needed to collect and store log information from a large number of managed nodes and to provide retrieval interfaces through which users can filter and analyze the log data they care about. The distributed log system 100 is described in detail below.
FIG. 1 is a schematic block diagram of a distributed log service system 100 applied to an embodiment of this application. As shown in FIG. 1, the system 100 may include a server 110 and at least one node.
The embodiments of this application do not specifically limit the number of nodes. For ease of description, the node 120 and the node 130 are used as examples.
It should be understood that the node 120 and the node 130 may be different virtual machines (VMs) installed on one physical host, or VMs installed on different physical hosts. The node 120 and the node 130 may also be physical hosts.
While at least one node in the log service system 100 is running, its operating system, the platform itself, or program components developed by users may generate corresponding log information. The log information records the key operations performed by the at least one node and the errors and exceptions that occur during operation. The at least one node may save the generated log information in a corresponding log file. By analyzing system logs, a user can understand problems that occur frequently or occasionally while the at least one node is running, and improve its operation and maintenance in a targeted manner. The user can also analyze the distribution of the identifiers corresponding to the log information templates to uncover potential system anomalies and raise an alarm in advance.
A log collection agent 121 and a log collection agent 131 are deployed on the node 120 and the node 130, respectively. The log collection agent 121 in the node 120 may obtain the log information saved in log files under a specified directory of the node 120 and report it to the server 110 through the message middleware 114. Similarly, the log collection agent 131 in the node 130 may obtain the log information saved in log files under a specified directory of the node 130 and report it to the server 110 through the message middleware 114.
It should be noted that the log collection agent module 121 and the log collection agent module 131 may be modules implemented as software programs.
After receiving the log information reported by the node 120 and/or the node 130, the data preprocessing module 113 in the server 110 may preprocess the reported log information and store the processed log information in the distributed database 112 in the server 110.
Specifically, after receiving the log information of at least one node reported through the message middleware 114, the data preprocessing module 113 in the server 110 may add tag information to the log information and then store it in the distributed database 112, so that users can search for the relevant log information by tag information when retrieving it.
It should be understood that the tag information may be carried by the log information itself, or may be obtained by the log collection agent in the at least one node according to the attributes of the node where the log information is located and of the application service running on it. The embodiments of this application do not specifically limit the content of the tag information, which may include but is not limited to: the storage path of the log file where the log information is located, the internet protocol (IP) address of the node where the log information is located, the region information of the public cloud hosting the service to which the log information belongs, information about that service (for example, a service identifier (ID)), information about the component to which the log information belongs (for example, a component ID), a tenant ID, and so on.
The log service application program interface (API) 111 in the server 110 provides a retrieval function to users. A user can interact with the log service system 100 through a browser or a command line tool, using the interface provided by the log service API 111.
Specifically, a user can search for the log information they need through the log service API 111. For example, the user can search by the various tag information of the log information and by log keywords to obtain the log information stored in the distributed database 112. Based on the obtained log information, the user can understand problems that occur frequently or occasionally while a node is running and maintain the operation of the node in a targeted manner, thereby improving the operating efficiency of the node's system.
In traditional log storage solutions, the original log information is stored directly in the database, which means a large amount of duplicated information is stored. On the one hand, once the amount of log information exceeds a certain level, collecting it, transmitting it across nodes, and the database resources it occupies all become major challenges. On the other hand, storing the original log information directly also reduces the efficiency with which it can be used. For example, in a fuzzy search scenario, searching a huge volume of original log information takes a long time. When log information is used for anomaly detection and other related data mining analysis, a large amount of log information must be analyzed in a centralized manner, so real-time performance is poor, which hinders the rapid discovery of problems.
The embodiments of this application provide a method for storing data that processes the original log information and reduces the storage space it occupies by eliminating the large amount of duplicated data in the original data.
It should be noted that FIG. 1 shows one possible scenario to which the embodiments of this application apply. The method provided in the embodiments of this application can also be applied to other scenarios in which a large amount of duplicated unstructured data is stored.
FIG. 2 is a schematic flowchart of a method for storing log information provided by an embodiment of this application. As shown in FIG. 2, the method may include steps 210 to 230, which are described in detail below.
Step 210: Identify the first part and the second part of the multiple pieces of log information from the multiple pieces of log information.
In the embodiments of this application, the first part and the second part of the multiple pieces of log information can be identified from the multiple pieces of log information. The first part may be the part that is identical across a number of the pieces of log information that is less than or equal to a first threshold, and the second part is the part that differs across a number of the pieces of log information that is less than or equal to a second threshold.
It should be understood that the first threshold may be less than or equal to the total number of the multiple pieces of log information, and the second threshold may likewise be less than or equal to that total number.
It should be noted that the first threshold and the second threshold may or may not be equal, which is not specifically limited in the embodiments of this application.
Referring to FIG. 1, in the embodiments of this application, the comparison and identification over a number of pieces of log information less than or equal to the first threshold may be performed either by the log collection agent of the at least one node or by the data preprocessing module 113 in the server 110. As one example, when the processor and/or memory available to the log collection agent 121 in the node 120 is ample relative to the amount of data it processes, the log collection agent 121 may, after obtaining the original log information from the log files in the specified directory, compare a number of pieces of log information less than or equal to the first threshold and identify the first part and the second part. As another example, when the processor and/or memory available to the log collection agent 121 in the node 120 is limited relative to the amount of data it processes, the data preprocessing module 113 in the server 110 may, after receiving the multiple pieces of original log information reported by the log collection agent in the at least one node, compare a number of pieces of log information less than or equal to the first threshold and identify the first part and the second part.
Step 220: Replace the second part of the multiple pieces of log information with placeholder identifiers to form the log information template corresponding to the multiple pieces of log information.
In the embodiments of this application, the second part identified above can be processed. For example, the second part of the multiple pieces of log information can be replaced with placeholder identifiers, so that the multiple pieces of log information correspond to a single log information template. The log information template and the M pieces of original log information are then in a 1:M correspondence, which eliminates the heavily duplicated parts of the multiple pieces of log information. Here, M is a positive integer greater than 1.
In the embodiments of this application, the placeholder identifiers may or may not distinguish the types of the changing parts in the second part of the multiple pieces of log information, which is not specifically limited in this application. As an example, when placeholder identifiers are used to distinguish variable types, %d can be used as the placeholder identifier for a numeric variable and %s as the placeholder identifier for a string variable.
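Steps 210 and 220 can be illustrated with a small sketch. It assumes, for illustration only, that log lines are compared token by token after splitting on whitespace and that all compared lines have the same number of tokens; the application does not prescribe this comparison strategy, and the function name `build_template` is hypothetical.

```python
def build_template(lines):
    """Derive one template and the per-line variable values.

    Tokens identical in every line form the first (constant) part.
    Tokens that differ form the second (variable) part and are replaced
    with %d (all values are decimal integers) or %s (anything else).
    """
    token_rows = [line.split() for line in lines]
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("this sketch only handles lines with equal token counts")

    template_tokens = []
    variables = [[] for _ in lines]
    for column in zip(*token_rows):
        if len(set(column)) == 1:
            # Same token in every line: keep it in the template.
            template_tokens.append(column[0])
            continue
        numeric = all(token.isdecimal() for token in column)
        template_tokens.append("%d" if numeric else "%s")
        for values, token in zip(variables, column):
            values.append(int(token) if numeric else token)
    return " ".join(template_tokens), variables


template, variables = build_template([
    "worker 2 timeout after 30 ms",
    "worker 5 timeout after 45 ms",
])
```

Splitting on whitespace and rejoining with single spaces does not preserve the original spacing, so a real implementation would need a tokenizer that suits the actual log format.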
Step 230: Store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information separately in the database.
In the embodiments of this application, the multiple pieces of log information can be compared to identify their first part and second part. The single log information template corresponding to the multiple pieces of log information is stored in one entity of the distributed database 112, and the changing part of each piece of log information is stored in another entity of the distributed database 112.
In the above technical solution, the original log information can be processed. After the changing and unchanging parts of the multiple pieces of log information are identified, the unchanging part is stored only once, and each piece of log information is reduced to its changing part together with the log information template identifier, the tag information related to that log information, and so on. The reduced log information therefore does not need to carry the long character sequence contained in the template, which reduces the storage space occupied by the log information and saves storage resources in the log service system.
Optionally, in some embodiments, a template identifier can be generated for each log information template, and each piece of log information can be reduced to include: the log information template identifier, variable values, a timestamp, tag information, and so on. When a user retrieves log information and restores it to the original information, the log information template can be obtained through the template identifier included in each piece of log information. The log information can then be restored to its original state according to the log information template and the timestamp and variable values included in the log information.
It should be understood that the timestamp included in the reduced log information may be the one recorded by the log information itself. The tag information may be recorded by the log information itself, or may be obtained by the log collection agent in the at least one node according to the attributes of the node where the log information is located and of the application service running on it. For details, refer to the description of the tag information above, which is not repeated here.
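The reduced log record and its restoration can be sketched as a small data structure. The field names, the `CompactLogEntry` class, and the assumption that the timestamp is written in brackets in front of the restored message are illustrative choices, not details specified by this application.

```python
from dataclasses import dataclass, field


@dataclass
class CompactLogEntry:
    """Reduced form of one log entry (field names are hypothetical)."""
    template_id: str   # identifier of the shared log information template
    variables: list    # values of the changing part, in template order
    timestamp: str     # timestamp recorded by the original log entry
    tags: dict = field(default_factory=dict)  # e.g. node IP, service ID


def restore_entry(entry, templates):
    """Look up the template by its identifier and fill in the variables."""
    template = templates[entry.template_id]
    # Python's % operator understands the %d and %s placeholders directly.
    # A template containing a literal '%' would need escaping first.
    body = template % tuple(entry.variables)
    return f"[{entry.timestamp}] {body}"


templates = {"t1": "[bulk-thread-%d] get data from queue timeout!"}
entry = CompactLogEntry(
    template_id="t1",
    variables=[2],
    timestamp="2018-11-09 13:39:14.696",
    tags={"ip": "10.0.26.102"},
)
restored = restore_entry(entry, templates)
```

The compact entry carries only a short identifier in place of the template text, which is the source of the storage saving described above.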
In the embodiments of this application, an information fingerprint extraction algorithm can be used to extract the features of a piece of information and convert them into a code. This code can serve as the unique template identifier corresponding to each log information template, so that identical log information templates on different nodes have the same identifier. As an example and not a limitation, the unique template identifier corresponding to each log information template can be generated with the message-digest (MD5) algorithm.
It should be understood that the MD5 algorithm is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value and is used to ensure that information is transmitted completely and consistently. The process of generating the unique template identifier for each log information template with the MD5 algorithm is described in detail below with reference to specific embodiments and is not repeated here.
In the embodiments of this application, information fingerprint technology is used to generate the unique template identifier corresponding to each log information template, so that identical log information templates on different nodes have the same identifier. This avoids the extra overhead and complexity of the cross-node communication that would otherwise be needed to assign an identifier to each log information template in a unified manner.
下面结合具体的日志信息内容,对上述过程中描述的对日志信息进行精简的具体实现过程进行详细描述。为了便于描述,下面以从3个原始日志信息中提取出一个日志信息模板为例进行说明。The following describes in detail the specific implementation process of streamlining the log information described in the above process in combination with the specific log information content. For ease of description, the following is an example of extracting a log information template from three original log information.
应理解,下面的例子仅仅是为了帮助本领域技术人员理解本申请实施例,而非要将本申请实施例限于所例示的具体数值或具体场景。本领域技术人员根据所给出的下面的例子,显然可以进行各种等价的修改或变化,这样的修改或变化也落入本申请实施例的范围内。It should be understood that the following examples are only to help those skilled in the art understand the embodiments of the present application, and are not intended to limit the embodiments of the present application to the specific numerical values or specific scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or changes based on the following examples given, and such modifications or changes also fall within the scope of the embodiments of the present application.
Original log information 1: [2018-11-09 13:39:14.696][10.0.26.102][INFO][bulk-thread-2][BulkHandlerRunable:submits 138]get data from queue timeout!
Original log information 2: [2018-11-09 13:39:15.698][10.0.26.102][INFO][bulk-thread-5][BulkHandlerRunable:submits 138]get data from queue timeout!
Original log information 3: [2018-11-09 13:39:16.066][10.0.26.102][INFO][bulk-thread-3][BulkHandlerRunable:submits 138]get data from queue timeout!
Step 1: Identify the changing part and the unchanging part of each of the three pieces of original log information.
In the embodiments of this application, a pattern extraction algorithm may be used to identify and extract the changing part and the unchanging part of each of the three pieces of original log information. As an example and not a limitation, the iterative partitioning log mining (IPLoM) algorithm may be used to identify these parts, and a placeholder identifier may then replace the changing part of each piece of original log information.
After the pattern extraction algorithm of step 1 compares the three pieces of original log information, the identification results are as follows:
Comparing original log information 1 with original log information 2 and 3 gives:
Unchanging part: [10.0.26.102][INFO][bulk-thread-][BulkHandlerRunable:submits 138]get data from queue timeout!
Changing part: 2, [2018-11-09 13:39:14.696]
With the placeholder identifier %d replacing the changing numeric part of original log information 1, and the tm field representing its log time field, original log information 1 can be expressed as:
tm[10.0.26.102][INFO][bulk-thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!
Here, %d=2, tm=[2018-11-09 13:39:14.696].
Comparing original log information 2 with original log information 1 and 3 gives:
Unchanging part: [10.0.26.102][INFO][bulk-thread-][BulkHandlerRunable:submits 138]get data from queue timeout!
Changing part: 5, [2018-11-09 13:39:15.698]
With the placeholder identifier %d replacing the changing part of original log information 2, and the tm field representing its log time field, original log information 2 can be expressed as:
tm[10.0.26.102][INFO][bulk-thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!
Here, %d=5, tm=[2018-11-09 13:39:15.698].
Comparing original log information 3 with original log information 1 and 2 gives:
Unchanging part: [10.0.26.102][INFO][bulk-thread-][BulkHandlerRunable:submits 138]get data from queue timeout!
Changing part: 3, [2018-11-09 13:39:16.066]
With the placeholder identifier %d replacing the changing part of original log information 3, and the tm field representing its log time field, original log information 3 can be expressed as:
tm[10.0.26.102][INFO][bulk-thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!
Here, %d=3, tm=[2018-11-09 13:39:16.066].
In summary, the log information template identified from the three pieces of original log information is: tm[INFO][bulk-thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!
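Step 1 can be illustrated with a minimal Python sketch. It is not an implementation of IPLoM: it only compares the three example lines position by position, which is enough when every line shares one token layout and differs only in numeric tokens. The function names split_head and extract_template are hypothetical, and the sketch assumes that the leading timestamp and node IP fields are split off first (the IP being kept as tag information), so that the result matches the template given in the summary above.

```python
import re

LOGS = [
    "[2018-11-09 13:39:14.696][10.0.26.102][INFO][bulk-thread-2][BulkHandlerRunable:submits 138]get data from queue timeout!",
    "[2018-11-09 13:39:15.698][10.0.26.102][INFO][bulk-thread-5][BulkHandlerRunable:submits 138]get data from queue timeout!",
    "[2018-11-09 13:39:16.066][10.0.26.102][INFO][bulk-thread-3][BulkHandlerRunable:submits 138]get data from queue timeout!",
]

# Assumed layout: "[timestamp][node IP]" followed by the message body.
HEAD = re.compile(r"^(\[[^\]]*\])(\[[^\]]*\])(.*)$")


def split_head(line):
    """Split a line into (timestamp field, node IP, remaining body)."""
    tm, ip, body = HEAD.match(line).groups()
    return tm, ip.strip("[]"), body


def extract_template(lines):
    """Return (template, per-line variable lists) by positional comparison."""
    bodies = [split_head(line)[2] for line in lines]
    # Odd-indexed tokens are digit runs, even-indexed tokens are the text between them.
    token_rows = [re.split(r"(\d+)", body) for body in bodies]
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("lines do not share one token layout")
    parts = []
    variables = [[] for _ in lines]
    for column in zip(*token_rows):
        if len(set(column)) == 1:
            parts.append(column[0])  # unchanging part, copied into the template
            continue
        if not all(token.isdigit() for token in column):
            raise ValueError("only numeric changing parts are handled in this sketch")
        parts.append("%d")  # changing part, replaced by the placeholder identifier
        for values, token in zip(variables, column):
            values.append(int(token))
    return "tm" + "".join(parts), variables


template, variables = extract_template(LOGS)
```

For the three example lines, the constant number 138 stays in the template because it is identical in every line, while the thread number becomes the single %d placeholder.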
Step 2: Generate a unique identifier for the log information template.
In a distributed scenario that includes at least one node, different nodes may hold the same log information template, so each log information template should correspond one-to-one to its identifier regardless of the node. The embodiments of this application may use digital fingerprint technology to achieve this one-to-one correspondence, for example by using the MD5 algorithm to generate the unique identifier of the log information template.
Specifically, the MD5 algorithm is applied to the character string of the above log information template to generate a 128-bit (16-byte) hash value, which can serve as the unique identifier of that log information template.
In the embodiments of this application, a 64-bit (8-byte) hash value may also be selected from the 128-bit (16-byte) hash value generated by the MD5 algorithm and used as the actual unique identifier of the log information template. There are several ways to select a 64-bit hash value from a 128-bit hash value. As one example, the 64 bits in the odd positions of the 128-bit hash value are selected as the 64-bit hash value. As another example, the 64 bits in the even positions are selected. As another example, the middle 64 bits of the 128-bit hash value are selected. As yet another example, the 128-bit hash value is folded and the two halves are added to obtain a 64-bit hash value.
For example, the tid field may be used to represent the identifier of the above log information template.
{tid=5085657271133051000
tm[INFO][bulk-thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!}
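The identifier generation of step 2 can be sketched in Python with the standard hashlib module. The function names are hypothetical, and the sketch makes several assumptions that the text does not fix: the template is encoded as UTF-8, the digest is read as a big-endian integer, folding means adding the two 64-bit halves modulo 2^64, and bit positions are counted from the least significant bit. The resulting value is therefore not claimed to equal the tid shown above.

```python
import hashlib


def md5_digest(template: str) -> bytes:
    """128-bit (16-byte) MD5 digest of the template string (UTF-8 assumed)."""
    return hashlib.md5(template.encode("utf-8")).digest()


def middle_64(digest: bytes) -> int:
    """Select the middle 64 bits, i.e. bytes 4 to 11 of the 16-byte digest."""
    return int.from_bytes(digest[4:12], "big")


def folded_64(digest: bytes) -> int:
    """Fold the digest in half and add the halves, keeping the low 64 bits."""
    high = int.from_bytes(digest[:8], "big")
    low = int.from_bytes(digest[8:], "big")
    return (high + low) % (1 << 64)


def alternate_bits_64(digest: bytes, offset: int) -> int:
    """Select every second bit; offset 0 or 1 picks which interleaved half."""
    value = int.from_bytes(digest, "big")
    result = 0
    for i in range(64):
        result |= ((value >> (2 * i + offset)) & 1) << i
    return result


TEMPLATE = "tm[INFO][bulk-thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!"

# Every node that sees the same template string computes the same identifier,
# so no cross-node coordination is needed to assign it.
tid = folded_64(md5_digest(TEMPLATE))
```

Because the identifier depends only on the template string, two nodes that extract the same template obtain the same tid independently. Truncating to 64 bits shortens the stored key at the cost of a higher collision probability than the full 128-bit digest.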
Step 3: Convert the log time field in the original log information.
It should be understood that step 3 is optional.
To reduce the storage occupied by the character string in the log time field, the log time in the original log information may be converted into a millisecond offset from a reference time. As an example, January 1, 1970 may be taken as the reference time, and the offset in milliseconds of the time recorded in each piece of original log information relative to the reference time is calculated.
For example, the calculated values are tm=1541770754696 for original log information 1, tm=1541770755698 for original log information 2, and tm=1541770756066 for original log information 3.
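The conversion of step 3, and its inverse used later during restoration, can be sketched in Python as follows. The function names are hypothetical. The sketch assumes the logged times are UTC, which is consistent with the tm values in the example above, and it uses integer timedelta arithmetic to avoid floating-point rounding of the millisecond part.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # reference time
STAMP_FORMAT = "[%Y-%m-%d %H:%M:%S.%f]"


def to_epoch_ms(stamp: str) -> int:
    """Convert a bracketed log time into a millisecond offset from the reference time."""
    moment = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_epoch_ms(tm: int) -> str:
    """Restore the bracketed log time from a millisecond offset."""
    moment = EPOCH + timedelta(milliseconds=tm)
    return moment.strftime("[%Y-%m-%d %H:%M:%S.") + f"{tm % 1000:03d}]"


print(to_epoch_ms("[2018-11-09 13:39:14.696]"))  # prints 1541770754696
print(from_epoch_ms(1541770756066))              # prints [2018-11-09 13:39:16.066]
```

A 13-digit integer fits in a 64-bit column, whereas the bracketed string takes 25 characters, which is the storage saving this step aims at.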
Step 4: Represent the changing part of each piece of original log information.
In the embodiments of this application, a field may be added to store the variable values of each piece of original log information. As an example, the mg field represents the variable value list string.
For example, the variable value list string is mg=(2) for original log information 1, mg=(5) for original log information 2, and mg=(3) for original log information 3.
In the embodiments of this application, a field may also be added to store the number of variable values in each piece of original log information. As an example, the VL field represents the number of variable values.
For example, the number of variable values is VL=1 for original log information 1, VL=1 for original log information 2, and VL=1 for original log information 3.
Step 5: Add tag information to each piece of original log information.
In the embodiments of this application, related tag information may be added to each piece of original log information, so that the required log information can be retrieved through the various tags.
For ease of description, the embodiments of this application take as examples of tag information the path of the log file in which the log information is located and the IP address of the node on which the log information is located.
Taking the IP address of the node as the tag information, IP=10.0.26.102 for original log information 1, IP=10.0.26.102 for original log information 2, and IP=10.0.26.102 for original log information 3.
Taking the path of the log file as the tag information, the embodiments of this application may extract an information fingerprint from the path of each log file to form a unique identifier for that path, thereby saving storage resources. The process of extracting the fingerprint and generating the identifier is similar to the calculation of the log information template identifier. For details, refer to the generation of the log information template identifier, which is not repeated here.
As an example, the pid field represents the path of the log file in which the log information is located. For example, pid=864995200973638000 for original log information 1, pid=864995200973638000 for original log information 2, and pid=864995200973638000 for original log information 3.
After steps 1 to 5, the three pieces of original log information are condensed into: timestamp + template identifier + variables + related tag information. The template identifier and the template may be stored in different tables of the distributed database 112.
As an example, the original log information may be split into three parts that are stored separately. For example, Table 1 stores the condensed log information, Table 2 stores the paths of the log files in which the log information is located, and Table 3 stores the template of the three pieces of original log information.
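The three-part split can be pictured with a small in-memory Python sketch. The tables themselves appear only as figures in this document, so the record layout below is an assumption assembled from the fields named in steps 2 to 5 (tm, tid, mg, VL, IP, pid). The class name CondensedLog and the log file path are hypothetical placeholders; the actual path is not given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CondensedLog:
    tm: int   # millisecond offset of the log time from the reference time
    tid: int  # log information template identifier
    mg: str   # variable value list string, e.g. "(2)"
    vl: int   # number of variable values (the VL field)
    ip: str   # tag: IP address of the node
    pid: int  # tag: identifier of the log file path


TID = 5085657271133051000
PID = 864995200973638000
TEMPLATE = "tm[INFO][bulk-thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!"
LOG_PATH = "/var/log/example/bulk.log"  # hypothetical path, not taken from the document

# Table 1: one condensed record per piece of original log information.
table1 = [
    CondensedLog(1541770754696, TID, "(2)", 1, "10.0.26.102", PID),
    CondensedLog(1541770755698, TID, "(5)", 1, "10.0.26.102", PID),
    CondensedLog(1541770756066, TID, "(3)", 1, "10.0.26.102", PID),
]

# Table 2: path identifier -> log file path, stored once for all three records.
table2 = {PID: LOG_PATH}

# Table 3: template identifier -> log information template, also stored once.
table3 = {TID: TEMPLATE}
```

The long template string and the path are each stored once, while every record in Table 1 carries only fixed-size identifiers and its own variable values.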
The content stored in Table 1 is as follows:
The condensed log information 1 can be expressed as:
Figure PCTCN2019107127-appb-000001
Figure PCTCN2019107127-appb-000002
The condensed log information 2 can be expressed as:
Figure PCTCN2019107127-appb-000003
The condensed log information 3 can be expressed as:
Figure PCTCN2019107127-appb-000004
The content stored in Table 2 is as follows:
Figure PCTCN2019107127-appb-000005
The content stored in Table 3 is as follows:
Figure PCTCN2019107127-appb-000006
It can be understood that some or all of the steps in the embodiments of this application may be performed. These steps or operations are only examples, and the embodiments of this application may also perform other operations or variations of the operations. In addition, the steps may be performed in an order different from that presented in the embodiments of this application, and not all of the operations in the embodiments necessarily have to be performed.
Optionally, in some embodiments, when a user retrieves stored log information through the log service API 111, keywords can be searched for in the log information templates, so that the log information the user needs is found quickly and search time is reduced.
Optionally, in some embodiments, the stored condensed log information may also be restored on the log service API 111 side, so that the user can obtain the original log information and, by analysing it, understand problems that occur frequently or occasionally while a node is running.
Specifically, as shown in FIG. 1, the user can enter various tag information and log keywords through the log service API 111 to obtain the required log information from the distributed database 112. The distributed database 112 can obtain the associated log information template and variable value list string mg according to the log information template identifier tid included in the keyword, and then substitute the values of the variable value list string mg into the positions of the placeholder identifiers in the log information template, thereby restoring the original log information.
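The substitution described above can be sketched as a small Python function. The names parse_mg and restore are hypothetical. The sketch assumes that mg lists its values in placeholder order, separated by commas inside parentheses, and that the node IP tag is re-inserted after the timestamp, since the restored lines shown in this document carry the IP while the stored template does not.

```python
def parse_mg(mg: str) -> list:
    """Parse a variable value list string such as "(2)" or "(2,7)"."""
    inner = mg.strip()[1:-1]
    return [int(value) for value in inner.split(",")] if inner else []


def restore(template: str, mg: str, timestamp: str, ip: str) -> str:
    """Rebuild one original log line from its template, variables and tags."""
    values = parse_mg(mg)
    if not template.startswith("tm"):
        raise ValueError("template does not start with the tm field")
    if template.count("%d") != len(values):
        raise ValueError("number of values does not match number of placeholders")
    body = template[2:]  # drop the leading tm marker
    for value in values:
        # Fill the placeholders from left to right, one value each.
        body = body.replace("%d", str(value), 1)
    return f"{timestamp}[{ip}]{body}"


TEMPLATE = "tm[INFO][bulk-thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!"
line = restore(TEMPLATE, "(2)", "[2018-11-09 13:39:14.696]", "10.0.26.102")
```

Checking the placeholder count against the number of stored values corresponds to the role of the VL field: a mismatch signals a record that was paired with the wrong template.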
It should be noted that the distributed database 112 in the embodiments of this application may be a distributed relational database or a non-relational database with a join function. Either kind of database can obtain the associated log information template and variable value list string mg through the log information template identifier tid.
Optionally, in some embodiments, the user can interact with the log service system 100 from a browser through the interface provided by the log service API 111.
Specifically, the user can enter a query request through the browser. The query request may include keywords of the log information to be queried and/or related tag information. According to these keywords and/or tags, the database can obtain the stored log information from its tables and return it to the user. By analysing the log information, the user can understand problems that occur frequently or occasionally while a node is running. By analysing the distribution of the identifiers of the log information templates, the user can also uncover potential system anomalies and raise alarms in advance.
It should be understood that a keyword included in the query request may be any character string in the log information.
As an example, the query request entered by the user asks for the log information of "January 1, 2018". After receiving the query request, the database can determine the corresponding tm values from the queried date "January 1, 2018" and, according to those tm values, obtain the matching log information from the log information stored in Table 1. At the same time, the join function of the database can obtain from Table 3 the log information template corresponding to the template identifier in that log information, substitute the log information stored in Table 1 into the template to generate the original log information, and return the original log information to the user. As another example, the user can enter a character string from a log information template in the query request, and the database can find in Table 3 the log information template that contains it, together with the corresponding template identifier. The join function of the database can then obtain from Table 1 the stored log information that carries that template identifier, substitute it into the log information template to generate the original log information, and return the original log information to the user.
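Both query styles can be sketched with Python's built-in sqlite3 module standing in for the database. This is only an illustration: the distributed database 112 is not specified in the text, and the table names log_entries and log_templates, the column names, and the function names are hypothetical. A queried date is assumed to map to a half-open range of tm values covering that UTC day.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TID = 5085657271133051000
PID = 864995200973638000
TEMPLATE = "tm[INFO][bulk-thread-%d][BulkHandlerRunable:submits 138]get data from queue timeout!"

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE log_entries (tm INTEGER, tid INTEGER, mg TEXT, vl INTEGER, ip TEXT, pid INTEGER);
    CREATE TABLE log_templates (tid INTEGER PRIMARY KEY, template TEXT);
    """
)
conn.execute("INSERT INTO log_templates VALUES (?, ?)", (TID, TEMPLATE))
conn.executemany(
    "INSERT INTO log_entries VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1541770754696, TID, "(2)", 1, "10.0.26.102", PID),
        (1541770755698, TID, "(5)", 1, "10.0.26.102", PID),
        (1541770756066, TID, "(3)", 1, "10.0.26.102", PID),
    ],
)

JOIN_SQL = (
    "SELECT e.tm, e.mg, e.ip, t.template FROM log_entries AS e "
    "JOIN log_templates AS t ON t.tid = e.tid "
)


def query_day(conn, year, month, day):
    """Return joined rows whose tm falls within the given UTC day."""
    start = (datetime(year, month, day, tzinfo=timezone.utc) - EPOCH) // timedelta(milliseconds=1)
    end = start + 86_400_000  # one day in milliseconds
    sql = JOIN_SQL + "WHERE e.tm >= ? AND e.tm < ? ORDER BY e.tm"
    return conn.execute(sql, (start, end)).fetchall()


def query_keyword(conn, keyword):
    """Return joined rows whose template contains the keyword."""
    sql = JOIN_SQL + "WHERE t.template LIKE ? ORDER BY e.tm"
    return conn.execute(sql, ("%" + keyword + "%",)).fetchall()


rows_for_day = query_day(conn, 2018, 11, 9)
rows_for_keyword = query_keyword(conn, "queue timeout")
```

The keyword query filters the small template table, which holds one row per template rather than one row per log line, and the join then pulls in only the entries that reference a matching template. Each returned row carries the template and the mg string, which is what is needed to rebuild the original line.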
The following uses a query for the log information generated on IP address "10.0.26.102" as an example to explain in detail how a user queries log information.
The query request entered by the user through the log service API 111 includes IP=10.0.26.102. According to IP=10.0.26.102, the database 112 can obtain the log information with IP=10.0.26.102 from the log information stored in Table 1, as shown below:
Figure PCTCN2019107127-appb-000007
Figure PCTCN2019107127-appb-000008
The join function of the database 112 can also use tid=5085657271133051000 in the above log information to obtain the corresponding log information template from Table 3, as shown below:
Figure PCTCN2019107127-appb-000009
The database 112 can restore the timestamp originally carried in the log information from the tm value in the log information. For example, tm=1541770754696 is restored to [2018-11-09 13:39:14.696], tm=1541770755698 is restored to [2018-11-09 13:39:15.698], and tm=1541770756066 is restored to [2018-11-09 13:39:16.066].
The database 112 can also substitute the original timestamp and the log information with IP=10.0.26.102 obtained from Table 1 into the log information template to form the original log information. The original log information is then provided to the front-end interface of the log service API 111, which completes the restoration.
The original log information provided to the front-end interface of the log service API 111 is as follows:
Log information 1:
[2018-11-09 13:39:14.696][10.0.26.102][INFO][bulk-thread-2][BulkHandlerRunable:submits 138]get data from queue timeout!
Log information 2:
[2018-11-09 13:39:15.698][10.0.26.102][INFO][bulk-thread-5][BulkHandlerRunable:submits 138]get data from queue timeout!
Log information 3:
[2018-11-09 13:39:16.066][10.0.26.102][INFO][bulk-thread-3][BulkHandlerRunable:submits 138]get data from queue timeout!
The method for storing log information and the method for querying log information provided by the embodiments of this application are described in detail above with reference to FIG. 1 and FIG. 2. The apparatus embodiments of this application are described in detail below with reference to FIG. 3 and FIG. 4. It should be understood that the descriptions of the method embodiments and of the apparatus embodiments correspond to each other, so for parts not described in detail, refer to the preceding method embodiments.
FIG. 3 is a schematic structural diagram of an apparatus 300 for storing log information according to an embodiment of this application. The apparatus 300 includes an identification module 310, a processing module 320, and a storage module 330.
The identification module 310 is configured to identify a first part and a second part of multiple pieces of log information from the multiple pieces of log information, where the first part is the part that is the same in a number of the pieces of log information that is less than or equal to a first threshold, and the second part is the part that differs in a number of the pieces of log information that is less than or equal to a second threshold;
The processing module 320 is configured to replace the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information;
The storage module 330 is configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information separately in a database.
Optionally, in some embodiments, the apparatus 300 further includes:
a generating module 340, configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.
Optionally, in some embodiments, the storage module 330 is specifically configured to: store the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and store the log information template and the identifier of the log information template in a second space of the database.
It should be understood that the apparatus 300 for storing log information in the embodiments of this application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. When the method for storing log information shown in FIG. 2 is implemented in software, the apparatus 300 for storing log information and its modules may also be software modules.
FIG. 4 is a schematic structural diagram of an apparatus 400 for storing log information according to an embodiment of this application. The apparatus 400 includes at least one computing node 410. The computing node 410 may include a processing unit 411 and a communication interface 412. The processing unit 411 executes functions defined by various software programs, for example the function of storing log information. The communication interface 412 is used to communicate with other computing nodes, which may be other physical servers. Specifically, the communication interface 412 may be a network adapter card.
Optionally, the computing node 410 may further include an input/output interface 413. The input/output interface 413 is connected to an input/output device and is used to receive input information and output operation results. The input/output device may be a mouse, a keyboard, a display, an optical drive, or the like. Optionally, the computing node 410 may further include an auxiliary memory 414, generally also called external storage. The storage medium of the auxiliary memory 414 may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, an optical disc), a semiconductor medium (for example, a solid-state drive), or the like.
Optionally, the computing node 410 may further include a bus 415. The processing unit 411, the communication interface 412, the input/output interface 413, and the auxiliary memory 414 may be connected through the bus 415. The bus 415 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus 415 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one line is drawn in FIG. 4, but this does not mean that there is only one bus or one type of bus.
The processing unit 411 may be implemented in various forms. For example, the processing unit 411 may include a processor 4112 and a memory 4111, and the processor 4112 performs the operations of the embodiment shown in FIG. 2 according to program instructions stored in the memory 4111. The processor 4112 may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. Alternatively, the processor 4112 uses one or more integrated circuits to execute related programs to implement the technical solutions provided in the embodiments of this application.
It should be understood that the processor 4112 of the computing node 410 may run at least one of the identification module 310, the processing module 320, and the storage module 330 shown in FIG. 3 through program instructions stored in the memory 4111.
It should also be understood that the memory 4111 or the auxiliary memory 414 of the computing node 410 may also store the database described with reference to FIG. 2.
The above and other operations and/or functions of the units in the apparatus 400 for storing log information implement the corresponding procedures of the method in FIG. 2 and, for brevity, are not described again here.
FIG. 5 is a schematic structural diagram of an apparatus 500 for querying log information according to an embodiment of this application. The apparatus 500 for querying log information includes a receiving module 510, an obtaining module 520, and a returning module 530.
The receiving module 510 is configured to receive a query request, where the query request is used to query log information stored in a database.
The obtaining module 520 is configured to obtain, from the database according to the query request, the stored second part of the multiple pieces of log information and the log information template corresponding to the multiple pieces of log information, and to substitute the second part of the multiple pieces of log information into the log information template to obtain the log information corresponding to the query request.
Optionally, the apparatus may further include the returning module 530, configured to return the corresponding log information to the client that sent the query request.
The returning module 530 and the receiving module 510 may also be implemented by the same module.
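The substitution performed by the obtaining module 520 can be sketched as follows. This is only an illustration under stated assumptions: the placeholder string `<*>` and the function name `fill_template` are hypothetical, since the source does not fix a placeholder syntax or an API.

```python
PLACEHOLDER = "<*>"  # hypothetical placeholder identifier; not specified by the source


def fill_template(template, values):
    """Rebuild one log line by putting the stored second-part values back into the template."""
    parts = template.split(PLACEHOLDER)
    if len(parts) - 1 != len(values):
        raise ValueError("number of values must match number of placeholders")
    out = [parts[0]]
    # Interleave each stored value with the constant text that follows it.
    for value, tail in zip(values, parts[1:]):
        out.append(value)
        out.append(tail)
    return "".join(out)


restored = fill_template("user <*> logged in from <*>", ["alice", "10.0.0.1"])
```

Here `restored` holds the original line again, so only the template and the two variable values need to be kept in the database.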
Optionally, in some embodiments, the apparatus 500 further includes:
an identification module 540, configured to identify the first part and the second part of the multiple pieces of log information from the multiple pieces of log information;
a processing module 550, configured to replace the second part of the multiple pieces of log information with a placeholder identifier to form the log information template corresponding to the multiple pieces of log information; and
a storage module 560, configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information separately in the database.
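The identification and processing steps above can be sketched as follows. This is a simplified illustration, not the method claimed in this application: it assumes that logs are tokenized on whitespace, that all logs in a batch have the same token count, and that a position belongs to the first part only when every log carries the same token there (the first and second thresholds of the application are not modeled). The placeholder `<*>` and the name `build_template` are hypothetical.

```python
PLACEHOLDER = "<*>"  # hypothetical placeholder identifier; not specified by the source


def build_template(logs):
    """Split equal-length, whitespace-tokenized logs into a template and variable parts."""
    rows = [line.split() for line in logs]
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("this sketch assumes non-empty input with equal token counts")
    # A column is constant (first part) when every log carries the same token there.
    constant = [len(set(col)) == 1 for col in zip(*rows)]
    template = " ".join(
        tok if same else PLACEHOLDER for tok, same in zip(rows[0], constant)
    )
    # Second part: for each log, the tokens that sit at the variable positions.
    variables = [
        [tok for tok, same in zip(row, constant) if not same] for row in rows
    ]
    return template, variables


template, variables = build_template([
    "user alice logged in from 10.0.0.1",
    "user bob logged in from 10.0.0.2",
])
```

With these two sample lines, `template` is `user <*> logged in from <*>` and `variables` holds one list of values per log, which is what the storage module would then write to the database.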
Optionally, in some embodiments, the apparatus 500 further includes:
a generating module 570, configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.
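The source does not say how the identifier is generated. One possible scheme, shown here purely as an assumption, derives it from a hash of the template text so that the same template always maps to the same identifier. The function name `template_id` and the 12-character length are hypothetical choices.

```python
import hashlib


def template_id(template):
    """Derive a stable identifier from the template text (one possible scheme)."""
    digest = hashlib.sha1(template.encode("utf-8")).hexdigest()
    return digest[:12]  # shortened for readability; a real system may keep the full digest


tid = template_id("user <*> logged in from <*>")
```

A content-derived identifier avoids a central counter, at the cost of a small collision risk when the digest is truncated.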
Optionally, in some embodiments, the storage module 560 is specifically configured to: store the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and store the log information template and the identifier of the log information template in a second space of the database.
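The two storage spaces can be pictured as two tables that share the template identifier. The sketch below uses SQLite from the Python standard library only for illustration; the application does not name a database product, and the table names, column names, and the JSON encoding of the variable parts are all assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# "Second space": one row per template, keyed by its identifier.
conn.execute("CREATE TABLE templates (template_id TEXT PRIMARY KEY, template TEXT)")
# "First space": one row per log, holding only the variable parts and the identifier.
conn.execute("CREATE TABLE log_entries (template_id TEXT, variable_parts TEXT)")


def store(conn, tid, template, variable_parts):
    """Store the template once and each log's variable parts separately."""
    conn.execute("INSERT OR IGNORE INTO templates VALUES (?, ?)", (tid, template))
    conn.executemany(
        "INSERT INTO log_entries VALUES (?, ?)",
        [(tid, json.dumps(parts)) for parts in variable_parts],
    )


store(
    conn,
    "t1",  # hypothetical identifier
    "user <*> logged in from <*>",
    [["alice", "10.0.0.1"], ["bob", "10.0.0.2"]],
)

# The shared identifier links each entry back to its template.
rows = conn.execute(
    "SELECT t.template, e.variable_parts FROM log_entries e "
    "JOIN templates t ON t.template_id = e.template_id ORDER BY e.rowid"
).fetchall()
```

Because the constant text is stored once per template rather than once per log, the first space grows only with the variable data.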
It should be understood that the apparatus 500 for querying log information in this embodiment of this application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The method for querying log information may also be implemented in software, in which case the apparatus 500 for querying log information and its modules may be software modules.
FIG. 6 is a schematic structural diagram of an apparatus 600 for querying log information according to an embodiment of this application. The apparatus 600 for querying log information includes at least one computing node 610. The computing node 610 may include a processing unit 611 and a communication interface 612. The processing unit 611 is configured to perform functions defined by various software programs, for example, to implement the function of storing log information. The communication interface 612 is used to communicate with other computing nodes; the other devices may be other physical servers. Specifically, the communication interface 612 may be a network adapter card.
Optionally, the computing node 610 may further include an input/output interface 613. The input/output interface 613 is connected to an input/output device and is used to receive input information and output operation results. The input/output device may be a mouse, a keyboard, a display, an optical drive, or the like. Optionally, the computing node 610 may further include an auxiliary memory 614, generally also called external storage. The storage medium of the auxiliary memory 614 may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, an optical disc), a semiconductor medium (for example, a solid-state drive), or the like.
Optionally, the computing node 610 may further include a bus 615. The processing unit 611, the communication interface 612, the input/output interface 613, and the auxiliary memory 614 may be connected through the bus 615. The bus 615 is similar to the bus 415 in function and implementation.
The processing unit 611 may be implemented in several specific forms. For example, the processing unit 611 may include a processor 6112 and a memory 6111, and the processor 6112 performs the operations of the foregoing embodiments according to program instructions stored in the memory 6111. The processor 6112 may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. Alternatively, the processor 6112 may use one or more integrated circuits to execute the relevant programs and thereby implement the technical solutions provided in the embodiments of this application.
It should be understood that the processor 6112 of the computing node 610 may run at least one of the receiving module 510, the obtaining module 520, and the returning module 530 shown in FIG. 5 by means of the program instructions stored in the memory 6111.
It should also be understood that the memory 6111 or the auxiliary memory 614 of the computing node 610 may store the database described with reference to FIG. 2.
The foregoing and other operations and/or functions of the units in the apparatus 600 for querying log information implement the corresponding procedures of the foregoing method for querying log information. For brevity, they are not repeated here.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the procedures or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, that contains one or more sets of usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive (SSD).
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

Claims (18)

  1. A method for storing log information, wherein the method comprises:
    identifying, from multiple pieces of log information, a first part and a second part of the multiple pieces of log information, wherein the first part of the multiple pieces of log information is a part that is the same in a number of the multiple pieces of log information that is less than or equal to a first threshold, and the second part of the multiple pieces of log information is a part that is different in a number of the multiple pieces of log information that is less than or equal to a second threshold;
    replacing the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information, wherein the log information template comprises the first part of the multiple pieces of log information; and
    storing the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information separately in a database.
  2. The method according to claim 1, wherein the method further comprises:
    generating an identifier of the log information template corresponding to the multiple pieces of log information.
  3. The method according to claim 2, wherein the storing the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information separately in a database comprises:
    storing the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and
    storing the log information template and the identifier of the log information template in a second space of the database.
  4. A method for querying log information, wherein the method is applied to a log information query system, the log information query system comprises a database, the database stores a second part of multiple pieces of log information and a log information template corresponding to the multiple pieces of log information, the log information template comprises a first part of the multiple pieces of log information, the first part of the multiple pieces of log information is a part that is the same in a number of the multiple pieces of log information that is less than or equal to a first threshold, the second part of the multiple pieces of log information is a part that is different in a number of the multiple pieces of log information that is less than or equal to a second threshold, and the method comprises:
    receiving a query request, wherein the query request is used to query log information stored in the database;
    obtaining, from the database according to the query request, the stored second part of the multiple pieces of log information and the log information template corresponding to the multiple pieces of log information; and
    substituting the second part of the multiple pieces of log information into the log information template corresponding to the multiple pieces of log information to obtain the corresponding log information.
  5. The method according to claim 4, wherein before the receiving a query request, the method further comprises:
    identifying, from the multiple pieces of log information, the first part of the multiple pieces of log information and the second part of the multiple pieces of log information;
    replacing the second part of the multiple pieces of log information with a placeholder identifier to form the log information template corresponding to the multiple pieces of log information; and
    storing the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information separately in the database.
  6. The method according to claim 5, wherein the method further comprises:
    generating an identifier of the log information template corresponding to the multiple pieces of log information.
  7. The method according to claim 6, wherein the method further comprises:
    storing the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and
    storing the log information template and the identifier of the log information template in a second space of the database.
  8. An apparatus for storing log information, wherein the apparatus comprises:
    an identification module, configured to identify, from multiple pieces of log information, a first part and a second part of the multiple pieces of log information, wherein the first part of the multiple pieces of log information is a part that is the same in a number of the multiple pieces of log information that is less than or equal to a first threshold, and the second part of the multiple pieces of log information is a part that is different in a number of the multiple pieces of log information that is less than or equal to a second threshold;
    a processing module, configured to replace the second part of the multiple pieces of log information with a placeholder identifier to form a log information template corresponding to the multiple pieces of log information, wherein the log information template comprises the first part of the multiple pieces of log information; and
    a storage module, configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information separately in a database.
  9. The apparatus according to claim 8, wherein the apparatus further comprises:
    a generating module, configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.
  10. The apparatus according to claim 9, wherein the storage module is specifically configured to:
    store the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and
    store the log information template and the identifier of the log information template in a second space of the database.
  11. An apparatus for querying log information, wherein the apparatus is applied to a log information query system, the log information query system comprises a database, the database stores a second part of multiple pieces of log information and a log information template corresponding to the multiple pieces of log information, the log information template comprises a first part of the multiple pieces of log information, the first part of the multiple pieces of log information is a part that is the same in a number of the multiple pieces of log information that is less than or equal to a first threshold, the second part of the multiple pieces of log information is a part that is different in a number of the multiple pieces of log information that is less than or equal to a second threshold, and the apparatus comprises:
    a receiving module, configured to receive a query request, wherein the query request is used to query log information stored in the database; and
    an obtaining module, configured to obtain, from the database according to the query request, the stored second part of the multiple pieces of log information and the log information template corresponding to the multiple pieces of log information, and to substitute the second part of the multiple pieces of log information into the log information template corresponding to the multiple pieces of log information to obtain the corresponding log information.
  12. The apparatus according to claim 11, wherein the apparatus further comprises:
    an identification module, configured to identify the first part and the second part of the multiple pieces of log information from the multiple pieces of log information;
    a processing module, configured to replace the second part of the multiple pieces of log information with a placeholder identifier to form the log information template corresponding to the multiple pieces of log information; and
    a storage module, configured to store the log information template corresponding to the multiple pieces of log information and the second part of the multiple pieces of log information separately in the database.
  13. The apparatus according to claim 12, wherein the apparatus further comprises:
    a generating module, configured to generate an identifier of the log information template corresponding to the multiple pieces of log information.
  14. The apparatus according to claim 13, wherein the storage module is specifically configured to:
    store the second part of the multiple pieces of log information and the identifier of the log information template in a first space of the database; and
    store the log information template and the identifier of the log information template in a second space of the database.
  15. An apparatus for storing log information, comprising at least one computing node, wherein each computing node comprises a processor and a memory, and when the apparatus for storing log information runs, the processor runs computer-executable instructions in the memory to perform the method according to any one of claims 1 to 3.
  16. An apparatus for querying log information, comprising at least one computing node, wherein each computing node comprises a processor and a memory, and when the apparatus for storing log information runs, the processor runs computer-executable instructions in the memory to perform the method according to any one of claims 4 to 7.
  17. A non-transitory, non-volatile computer-readable storage medium, comprising a computer program, wherein when the computer program runs on at least one computing node, the at least one computing node is caused to perform the method according to any one of claims 1 to 3.
  18. A non-transitory, non-volatile computer-readable storage medium, comprising a computer program, wherein when the computer program runs on at least one computing node, the at least one computing node is caused to perform the method according to any one of claims 4 to 7.
PCT/CN2019/107127 2019-02-02 2019-09-21 Method and device for storing and querying log information WO2020155651A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910106607.0A CN109885545A (en) 2019-02-02 2019-02-02 Method and apparatus for storing and querying log information
CN201910106607.0 2019-02-02

Publications (1)

Publication Number Publication Date
WO2020155651A1 true WO2020155651A1 (en) 2020-08-06

Family

ID=66927813

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/107127 WO2020155651A1 (en) 2019-02-02 2019-09-21 Method and device for storing and querying log information

Country Status (2)

Country Link
CN (1) CN109885545A (en)
WO (1) WO2020155651A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109885545A (en) * 2019-02-02 2019-06-14 华为技术有限公司 Method and apparatus for storing and querying log information
CN112152823B (en) * 2019-06-26 2022-09-02 北京易真学思教育科技有限公司 Website operation error monitoring method and device and computer storage medium
CN113472555B (en) * 2020-03-30 2022-09-23 华为技术有限公司 Fault detection method, system, device, server and storage medium
CN113420003A (en) * 2021-06-30 2021-09-21 中国航空油料有限责任公司 Method, device, equipment and medium for processing data interaction log
CN116894021A (en) * 2023-05-24 2023-10-17 北京优特捷信息技术有限公司 Log data storage method, query method, device, equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120124047A1 (en) * 2010-11-17 2012-05-17 Eric Hubbard Managing log entries
US20140019458A1 (en) * 2012-07-16 2014-01-16 International Business Machines Corporation Automatically generating a log parser given a sample log
CN105474200A (en) * 2013-04-30 2016-04-06 微软技术许可有限责任公司 Hydration and dehydration with placeholders
CN109885545A (en) * 2019-02-02 2019-06-14 华为技术有限公司 Method and apparatus for storing and querying log information

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7676465B2 (en) * 2006-07-05 2010-03-09 Yahoo! Inc. Techniques for clustering structurally similar web pages based on page features
US20110219046A1 (en) * 2010-03-05 2011-09-08 Igor Nesmyanovich System, method and computer program product for managing data storage and rule-driven communications for a plurality of tenants
CN103544176B (en) * 2012-07-13 2018-08-10 百度在线网络技术(北京)有限公司 Method and apparatus for generating the page structure template corresponding to multiple pages
CN104331487B (en) * 2014-11-13 2019-03-12 上海携程商务有限公司 The processing method and processing device of log
CN108241658B (en) * 2016-12-24 2021-09-07 北京亿阳信通科技有限公司 Log pattern discovery method and system
CN108694206A (en) * 2017-04-11 2018-10-23 富士通株式会社 Information processing method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116644039A (en) * 2023-05-25 2023-08-25 安徽继远软件有限公司 Automatic acquisition and analysis method for online capacity operation log based on big data
CN116644039B (en) * 2023-05-25 2023-12-19 安徽继远软件有限公司 Automatic acquisition and analysis method for online capacity operation log based on big data

Also Published As

Publication number Publication date
CN109885545A (en) 2019-06-14

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19913841; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 19913841; Country of ref document: EP; Kind code of ref document: A1)