CN113760847A

CN113760847A - Log data processing method, device, equipment and storage medium

Info

Publication number: CN113760847A
Application number: CN202110122310.0A
Authority: CN
Inventors: 梁秋实; 王行行; 桂创华
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2021-01-28
Filing date: 2021-01-28
Publication date: 2021-12-07

Abstract

The embodiments of the invention provide a log data processing method, apparatus, device and storage medium. A gateway sends a log to be stored, which carries tag information and timestamp information, to a target data node of a log storage system according to routing information. The target data node writes the received logs into its first storage unit in time order, according to the tag information and timestamp information they carry, and generates an inverted index file from the tag information of those logs. In the log query stage, the inverted index file and the time information are used to locate the target log to be queried. The ordered logs in the first storage unit are then migrated to a second storage unit in the form of data blocks, and the data blocks are stored in the second storage unit in time order. The invention reduces the processing load of log storage, preserves the performance of log storage and query, reduces cost, improves processing efficiency, and is suitable for processing massive volumes of log data.

Description

Log data processing method, device, equipment and storage medium

Technical Field

The embodiments of the invention relate to the field of big data, and in particular to a log data processing method, apparatus, device and storage medium.

Background

Network devices, systems, service programs and the like generate massive numbers of logs during operation. Each log records information about the operation of the device, system or program, so this massive log data must be managed, including its storage and querying.

In the prior art, logs are usually stored and queried in one of two ways. In the first, the log content is tokenized (word-segmented) at storage time, an index is built on the keywords and an index file is generated; at query time, the target log is found from the keywords and the index file. In the second, the log content is neither tokenized nor indexed at storage time; at query time, the logs within a given time period are retrieved and then searched for a substring, yielding the target logs that contain that string.

Existing approaches to log storage and query waste either system performance or storage space during storage and/or querying, are costly, and cannot store and query massive log data with low cost, high efficiency and high performance.

Disclosure of Invention

Embodiments of the present invention provide a log data processing method, apparatus, device and storage medium, so as to reduce the cost of processing massive log data, improve processing efficiency, and improve the performance of log data storage and query.

In a first aspect, an embodiment of the present invention provides a log data processing method, which is applied to any data node of a log storage system including multiple data nodes, where the data node includes a first storage unit and a second storage unit, and the method includes:

receiving a plurality of logs to be stored sent by a gateway according to routing information, wherein the logs to be stored carry tag information and timestamp information, and the routing information is determined by the gateway according to the tag information and the timestamp information of the logs to be stored;

writing the plurality of logs to be stored into the first storage unit in time order according to the tag information and the timestamp information, and generating an inverted index file according to the tag information of the plurality of logs to be stored, wherein the inverted index file is used in a log query stage, together with time information, to locate a target log to be queried;

migrating the plurality of ordered logs to be stored from the first storage unit to the second storage unit in the form of data blocks, and storing the data blocks in the second storage unit in time order.

In a second aspect, an embodiment of the present invention provides a log data processing method, which is applied to a gateway, and the method includes:

receiving a log storage request, wherein the log storage request comprises a log to be stored, and the log to be stored carries tag information and timestamp information;

determining the routing information of the log to be stored according to the tag information and the timestamp information of the log to be stored;

and sending the log to be stored to a corresponding target data node according to the routing information.

In a third aspect, an embodiment of the present invention provides a log data processing method, which is applied to any data node of a log storage system including multiple data nodes, where the data node includes a second storage unit in which multiple logs are stored in the form of data blocks, the data blocks are sorted in time order, and the logs in each data block are sorted by tag information and time, and the method includes:

receiving a log query request sent by a gateway according to routing information, wherein the log query request comprises tag information, keywords and time information of a log to be queried;

determining a corresponding target data block from the second storage unit according to the time information of the log to be queried;

searching for a target log according to the tag information and keywords of the log to be queried and the inverted index file of the target data block, wherein the inverted index file is generated in advance from the tag information of the logs included in the target data block;

and sending the target log to the gateway.
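Because the data blocks in the second storage unit are kept in time order, the step of determining the target data block from the time information can use a binary search over block start times instead of a full scan. The Python sketch below is illustrative only: it assumes each block is described by a hypothetical half-open `(start, end)` time range, that ranges do not overlap, and the name `blocks_in_range` is not taken from the source.

```python
import bisect
from typing import List, Tuple


def blocks_in_range(block_ranges: List[Tuple[int, int]],
                    t_from: int, t_to: int) -> List[int]:
    """Indexes of blocks whose half-open range [start, end) overlaps [t_from, t_to).

    Assumes block_ranges is sorted by start time and non-overlapping, which
    the time-ordered storage of data blocks is taken to provide.
    """
    starts = [start for start, _ in block_ranges]  # could be cached per node
    # The last block starting at or before t_from may still cover t_from.
    lo = max(bisect.bisect_right(starts, t_from) - 1, 0)
    # Blocks starting at or after t_to cannot overlap the query window.
    hi = bisect.bisect_left(starts, t_to)
    return [i for i in range(lo, hi) if block_ranges[i][1] > t_from]
```

For example, with blocks `[(0, 100), (100, 200), (200, 300)]`, a query window of `[150, 250)` selects blocks 1 and 2.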

In a fourth aspect, an embodiment of the present invention provides a log data processing method, which is applied to a gateway, and the method includes:

receiving a log query request, wherein the log query request comprises tag information, keywords and time information of a log to be queried;

determining the routing information of the log to be queried according to the tag information of the log to be queried;

and sending the log query request to a corresponding target data node according to the routing information.

In a fifth aspect, an embodiment of the present invention provides a log data processing apparatus, which is applied to any data node of a log storage system including multiple data nodes, where the data node includes a first storage unit and a second storage unit, and the apparatus includes:

the receiving module is used for receiving a plurality of logs to be stored that are sent by the gateway according to routing information and carry tag information and timestamp information, wherein the routing information is determined by the gateway according to the tag information and the timestamp information of the logs to be stored;

the first storage module is used for writing the logs to be stored into the first storage unit in time order according to the tag information and the timestamp information;

the index module is used for generating an inverted index file according to the tag information of the logs to be stored, wherein the inverted index file is used in a log query stage, together with time information, to locate a target log to be queried;

the second storage module is used for migrating the plurality of ordered logs to be stored from the first storage unit to the second storage unit in the form of data blocks, and storing the data blocks in the second storage unit in time order.

In a sixth aspect, an embodiment of the present invention provides a log data processing apparatus, which is applied to a gateway, and the apparatus includes:

the receiving module is used for receiving a log storage request, wherein the log storage request comprises a log to be stored, and the log to be stored carries tag information and timestamp information;

the routing module is used for determining the routing information of the log to be stored according to the tag information and the timestamp information of the log to be stored;

and the sending module is used for sending the log to be stored to the corresponding target data node according to the routing information.

In a seventh aspect, an embodiment of the present invention provides a log data processing apparatus, which is applied to any data node of a log storage system including multiple data nodes, where the data node includes a second storage unit in which multiple logs are stored in the form of data blocks, the data blocks are sorted in time order, and the logs in each data block are sorted by tag information and time, and the apparatus includes:

the receiving module is used for receiving a log query request sent by the gateway according to the routing information, wherein the log query request comprises tag information, keywords and time information of a log to be queried;

the query module is used for determining a corresponding target data block from the second storage unit according to the time information of the log to be queried, and searching for a target log according to the tag information and keywords of the log to be queried and the inverted index file of the target data block, wherein the inverted index file is generated in advance from the tag information of the logs included in the target data block;

and the sending module is used for sending the target log to the gateway.

In an eighth aspect, an embodiment of the present invention provides a log data processing apparatus, which is applied to a gateway, and the apparatus includes:

the receiving module is used for receiving a log query request, wherein the log query request comprises tag information, keywords and time information of a log to be queried;

the routing module is used for determining the routing information of the log to be queried according to the tag information of the log to be queried;

and the sending module is used for sending the log query request to a corresponding target data node according to the routing information.

In a ninth aspect, an embodiment of the present invention provides a data node, including:

at least one processor;

and a memory, the memory comprising: a first storage unit and a second storage unit;

the memory further stores computer-executable instructions;

the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the method of the first aspect, or the method of the third aspect.

In a tenth aspect, an embodiment of the present invention provides a gateway, including:

at least one processor;

and a memory storing computer-executable instructions;

the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the method of the second aspect, or the method of the fourth aspect.

In an eleventh aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method of the first aspect or the method of the third aspect.

In a twelfth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method of the second aspect or the method of the fourth aspect.

In a thirteenth aspect, an embodiment of the present invention provides a computer program product including computer-executable instructions which, when executed by a processor, implement the method of the first aspect or the method of the third aspect.

In a fourteenth aspect, an embodiment of the present invention provides a computer program product including computer-executable instructions which, when executed by a processor, implement the method of the second aspect or the method of the fourth aspect.

With the log data processing method, apparatus, device and storage medium provided by the embodiments of the invention, after receiving a log storage request, the gateway determines the routing information of the log to be stored and sends the log, which carries tag information and timestamp information, to a target data node of the log storage system according to that routing information. After receiving the plurality of logs to be stored sent by the gateway, the target data node writes them into its first storage unit in time order according to the tag information and timestamp information they carry, and generates an inverted index file from their tag information; in the log query stage, the inverted index file and the timestamp information are used to locate the target log to be queried. The ordered logs in the first storage unit are then migrated to the second storage unit in the form of data blocks and persisted there, with the data blocks stored in time order. This embodiment reduces the processing load of log storage while keeping the extra processing at query time within an acceptable range. Sorting logs in time order and building the inverted index on log tag information preserves storage and query performance, lowers the cost of storage and querying, reduces storage space overhead and improves processing efficiency, making the method suitable for processing massive log data.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

Fig. 1a is a schematic view of an application scenario of a log data processing method in a log storage phase according to an embodiment of the present invention;

Fig. 1b is a schematic view of an application scenario of the log data processing method in a log query phase according to an embodiment of the present invention;

Fig. 2 is a flowchart of a log data processing method according to an embodiment of the present invention;

Fig. 3 is a flowchart of a log data processing method according to another embodiment of the present invention;

Fig. 4 is a flowchart of a log data processing method according to another embodiment of the present invention;

Fig. 5 is a flowchart of a log data processing method according to another embodiment of the present invention;

Fig. 6 is a flowchart of a log data processing method according to another embodiment of the present invention;

Fig. 7 is a block diagram of a log data processing apparatus according to an embodiment of the present invention;

Fig. 8 is a block diagram of a log data processing apparatus according to another embodiment of the present invention;

Fig. 9 is a block diagram of a log data processing apparatus according to another embodiment of the present invention;

Fig. 10 is a block diagram of a log data processing apparatus according to another embodiment of the present invention;

Fig. 11 is a structural diagram of a data node according to an embodiment of the present invention;

Fig. 12 is a structural diagram of a gateway according to an embodiment of the present invention.

With reference to the foregoing drawings, certain embodiments of the disclosure have been shown and are described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

In the prior art, logs are usually stored and queried in one of two ways. In the first, the log content is tokenized at storage time, an index is built on the keywords and an index file is generated; at query time, the target log is found from the keywords and the index file. Systems of this kind include Elasticsearch, ELK, Solr, Xapian, Sphinx and bleve. In the second, the log content is neither tokenized nor indexed at storage time; at query time, the logs within a given time period are retrieved and then searched for a substring to obtain the target logs containing that string. An example is the Loki system from Grafana Labs.

The first approach offers good query performance but sacrifices part of the write performance: at storage time the log content must be tokenized and the keywords indexed, which consumes substantial computing resources and system storage space. This is especially costly because logs are typically collected continuously around the clock and are written far more often than they are read, while business staff query them only occasionally when needed. Log storage therefore becomes expensive, and the large number of index files produced by tokenization wastes disk space.

Although the second approach avoids the drawbacks of the first, it relies on several third-party remote storage systems. For example, the Querier component of Loki must interact with multiple remote stores, such as a KV (key-value) store and object storage, which greatly sacrifices query performance. The second approach thus depends on external third-party remote storage and also suffers from poor query performance.

To solve the above technical problems, an embodiment of the present invention provides a log data processing method. After receiving a log storage request, the gateway determines the routing information of the log to be stored and sends the log to a target data node of the log storage system accordingly. After receiving the plurality of logs to be stored sent by the gateway, the target data node writes them into its first storage unit in time order based on the timestamp information they carry, and generates an inverted index file from their tag information; in the log query stage, the inverted index file and the time information are used to locate the target log to be queried. The ordered logs in the first storage unit are then migrated to the second storage unit in the form of data blocks and persisted there, with the data blocks stored in time order.
With this storage process, the log content need not be tokenized and no large set of index files needs to be built; only an inverted index file based on the tags of the logs to be stored is created. This greatly reduces the processing load and the storage space overhead. In the log query stage, the target log can still be found from the inverted index file and the time information, so query performance is preserved and the additional processing at query time stays within an acceptable range. In addition, log storage and query in the embodiments of the invention do not depend on any third-party remote storage system. The log data processing method provided by the embodiments of the invention can therefore reduce the cost of processing massive log data, improve processing efficiency, and improve the performance of log data storage and query.

The log data processing method of the embodiments of the invention is applied to the application scenarios shown in Figs. 1a and 1b. These scenarios include a log storage system comprising a plurality of data nodes (DataNodes), and the method may run on any of them. Each data node comprises a first storage unit and a second storage unit; the first storage unit may be a storage device such as main memory, and the second storage unit may be a storage device such as a disk.

In the log storage phase, as shown in Fig. 1a, after receiving a log storage request sent by a client 101, a gateway 102 determines the routing information of the log to be stored and sends the log to a target data node of a log storage system 103 accordingly. After receiving the plurality of logs to be stored sent by the gateway 102, the target data node executes the data-node-side log data processing method of the log storage phase in the embodiments of the present invention.

In the log query phase, as shown in Fig. 1b, after receiving a log query request sent by the client 101, the gateway 102 determines the routing information of the log to be queried and sends the log query request to the corresponding target data node accordingly. After receiving the log query request, the target data node executes the data-node-side log data processing method of the log query phase in the embodiments of the present invention, obtains the target log and sends it to the gateway 102, which then returns it to the client 101.

Further, the application scenario may include a predetermined database cluster 104 connected to each gateway 102 and each data node. The predetermined database cluster 104 stores the periodic registration information of each data node, which includes the routing information corresponding to that data node. Each data node may register periodically with the predetermined database cluster 104. The gateway 102 may fetch the routing information of the data nodes from the predetermined database cluster 104 periodically, or on demand when the routing information pre-stored locally by the gateway 102 does not include the routing information of the target data node.
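A minimal sketch of the gateway-side lookup just described: a local route cache that is refreshed periodically and also pulls from the registry on a cache miss. It assumes the registration records can be read as a mapping from data-node ID to address; `RouteCache` and `fetch_from_registry` are hypothetical names, and the real record format in the predetermined database cluster 104 is not specified in the source.

```python
from typing import Callable, Dict, Optional


class RouteCache:
    """Gateway-side copy of data-node routing information (hypothetical sketch)."""

    def __init__(self, fetch_from_registry: Callable[[], Dict[str, str]]) -> None:
        # fetch_from_registry stands in for reading the periodic registration
        # records (node ID -> address) from the database cluster; assumed shape.
        self._fetch = fetch_from_registry
        self._routes: Dict[str, str] = {}

    def refresh(self) -> None:
        """Periodic pull: replace the local copy with the registry's current view."""
        self._routes = dict(self._fetch())

    def lookup(self, node_id: str) -> Optional[str]:
        """Return the node's address, pulling from the registry on a local miss."""
        if node_id not in self._routes:
            self.refresh()
        return self._routes.get(node_id)
```

A timer would call `refresh()` for the periodic case; `lookup()` covers the on-demand case, so a node that registered after the last refresh is still found on first use.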

There may be one or more clients 101, and a client 101 may be developed in any programming language; for example, it may be a Go client or a Java client based on the Netty framework.

Multiple gateways 102 may be deployed. Each gateway 102 can respond, in a load-balanced manner, to log storage requests or log query requests sent by the client 101, and is responsible for obtaining routing information, parsing query expressions and exchanging data with the data nodes. The gateways 102 are peers and can be scaled out as needed to increase computing capacity. The client 101 and the gateway 102 may communicate over a persistent (long-lived) TCP connection. The gateway 102 and the log storage system 103 may also be integrated; for example, each data node may be integrated with one gateway 102, which can communicate with the other data nodes.

The technical solutions of the present invention, and how they solve the above technical problems, are described below through specific embodiments. These embodiments may be combined with one another, and identical or similar concepts or processes may not be repeated in some of them. Embodiments of the present invention are described below with reference to the accompanying drawings.

Fig. 2 is a flowchart of a log data processing method according to an embodiment of the present invention. This embodiment provides a log data processing method for the log storage stage. The method is executed by any data node of a log storage system comprising a plurality of data nodes, each data node comprising a first storage unit and a second storage unit, and specifically comprises the following steps:

S201, receiving a plurality of logs to be stored sent by a gateway according to routing information, wherein the logs to be stored carry tag information and timestamp information, and the routing information is determined by the gateway according to the tag information and the timestamp information of the logs to be stored.

In this embodiment, when a network device, system, service program or the like generates a log to be stored during operation, it may send a log storage request to a gateway through a client. The log storage request includes the log to be stored, which carries tag information and timestamp information. The tag information includes, but is not limited to, the log source (e.g., the network device, system or service program from which the log originates), objects, events and locations related to the log, and some keywords in the log, and it can serve as one of the retrieval conditions in subsequent log queries. For example, if a log relates to a transaction between company A and company B, its tags may include the name of company A, the name of company B, the transaction progress, information on the personnel involved, and so on. The gateway determines the routing information of the log to be stored according to its tag information and timestamp information, and sends the log to the corresponding target data node. For example, different data nodes may exist in different time periods and may correspond to different tag information, so the gateway can determine the routing information of the target data node, such as its IP address, from the timestamp information and the tag information of the log to be stored. The gateway may of course also determine the routing information from the tag information by other methods, which are not limited herein.
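One possible way for a gateway to map a log's tag and timestamp information to a data node is sketched below. It assumes, hypothetically, a routing table listing the data nodes active in each fixed time window, plus a stable hash of the tag set to choose among them. The window length `EPOCH_SECONDS`, the SHA-1 hash and the table layout are illustrative assumptions, not the scheme the embodiment specifies.

```python
import hashlib
from typing import Dict, List

EPOCH_SECONDS = 3600  # assumed time-window length; the source does not fix one

# Hypothetical routing table: window start (Unix seconds) -> addresses of the
# data nodes serving logs whose timestamps fall in that window.
RoutingTable = Dict[int, List[str]]


def window_start(timestamp: int) -> int:
    """Start of the fixed time window containing `timestamp`."""
    return timestamp - timestamp % EPOCH_SECONDS


def route(tags: Dict[str, str], timestamp: int, table: RoutingTable) -> str:
    """Choose a target data node from a log's tag set and timestamp."""
    nodes = table[window_start(timestamp)]
    # Sort the tags so that the same tag set always yields the same key.
    key = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]
```

With `table = {0: ["10.0.0.1:9000", "10.0.0.2:9000"], 3600: ["10.0.0.3:9000"]}`, a log stamped 4000 always lands on `10.0.0.3:9000`, and logs with identical tags in window 0 always reach the same node regardless of tag order. Keeping one tag set on one node per window keeps each node's inverted index compact; the trade-off is that a very hot tag set concentrates on a single node.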

S202, writing the plurality of logs to be stored into the first storage unit according to the tag information and the timestamp information, and generating an inverted index file according to the tag information of the plurality of logs to be stored, wherein the inverted index file is used in a log query stage, together with the timestamp information, to locate a target log to be queried.

In this embodiment, after receiving the logs to be stored from the gateway, the data node stores them in the first storage unit, where the logs are sorted in time order. The first storage unit may be a storage device such as main memory: the logs are completely sorted by time in memory before being persisted to the second storage unit, so logs to be stored may be written out of order.
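The sketch below models the first storage unit as an in-memory list kept sorted by timestamp, so a log that arrives late is inserted at its correct position instead of being appended. `MemBuffer`, its `write`/`drain` methods and the arrival counter that breaks timestamp ties are assumptions made for illustration.

```python
import bisect
import itertools
from typing import Dict, List, Tuple

Entry = Tuple[int, int, Dict[str, str], str]  # (timestamp, arrival seq, tags, content)


class MemBuffer:
    """First storage unit modelled as an in-memory buffer kept in time order."""

    def __init__(self) -> None:
        self._seq = itertools.count()  # arrival counter; breaks timestamp ties
        self._entries: List[Entry] = []

    def write(self, timestamp: int, tags: Dict[str, str], content: str) -> None:
        # (timestamp, seq) is unique, so tuple comparison never reaches the dict.
        # insort places late-arriving logs at their time position, not at the end.
        bisect.insort(self._entries, (timestamp, next(self._seq), tags, content))

    def drain(self) -> List[Tuple[int, Dict[str, str], str]]:
        """Return the time-ordered logs (e.g. for migration) and empty the buffer."""
        ordered = [(ts, tags, content) for ts, _, tags, content in self._entries]
        self._entries.clear()
        return ordered
```

Inserting into a Python list costs O(n) per write; a production memory buffer would more likely use a skip list or balanced tree, but the ordering behaviour would be the same.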

In this embodiment, each of the logs to be stored in the first storage unit has one or more tags; different logs may share some tags and differ in others. An inverted index file can therefore be generated from the tag information of the logs to be stored in the first storage unit, mapping each tag to the logs that carry it. For example, the logs with tag A are log a and log b, the logs with tag B are log b and log c, and so on.
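A minimal sketch of building the tag-based inverted index for the example above, assuming each log is identified by a string ID and carries a list of tag strings. `build_tag_index` is a hypothetical name, and writing the result out as a file is omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_tag_index(logs: Iterable[Tuple[str, Iterable[str]]]) -> Dict[str, List[str]]:
    """Inverted index: tag -> sorted IDs of the logs that carry that tag.

    Only tags are indexed; the log content itself is never tokenized.
    """
    postings: Dict[str, Set[str]] = defaultdict(set)
    for log_id, tags in logs:
        for tag in tags:
            postings[tag].add(log_id)
    return {tag: sorted(ids) for tag, ids in postings.items()}
```

For the example in the text, `build_tag_index([("a", ["A"]), ("b", ["A", "B"]), ("c", ["B"])])` returns `{"A": ["a", "b"], "B": ["b", "c"]}`.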

In this embodiment, the inverted index file for the logs to be stored in the first storage unit is built only from their tag information. Although the sorted logs are later migrated from the first storage unit to the second storage unit, the inverted index file usually remains unchanged, and in a subsequent query the target log can be located quickly from the inverted index file and the time information. This reduces log processing cost and improves storage and query performance. For example, a query request includes the tag information, time information, keywords and the like of the log to be queried. The target data block corresponding to the time information is determined in the second storage unit, and at least one candidate log matching the tag information of the log to be queried is looked up in the inverted index file of the target data block. If there is only one candidate log, it can be taken directly as the target log; if there are several, a substring search for the keywords of the log to be queried is performed on the candidate logs, and the candidates containing the keywords are taken as the target logs.
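The query flow just described (time filter, then tag lookup in the inverted index, then substring search when several candidates remain) can be sketched as follows. `DataBlock` and `query` are hypothetical names. The sketch assumes each block carries its own tag-to-log-ID index and filters by time only at block granularity, and it follows the text in returning a single candidate without a keyword check.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set


@dataclass
class DataBlock:
    start: int                  # inclusive start of the block's time range
    end: int                    # exclusive end of the block's time range
    logs: Dict[str, str]        # log ID -> raw log content
    index: Dict[str, Set[str]]  # tag -> IDs of the logs in this block with that tag


def query(blocks: Sequence[DataBlock], tags: Sequence[str], keyword: str,
          t_from: int, t_to: int) -> List[str]:
    """Return matching log contents from blocks overlapping [t_from, t_to)."""
    results: List[str] = []
    for block in blocks:
        # 1. Time: skip blocks that do not overlap [t_from, t_to).
        #    (Per-log timestamp filtering inside a block is omitted here.)
        if block.end <= t_from or block.start >= t_to:
            continue
        # 2. Tags: intersect the posting sets of every requested tag.
        postings = [block.index.get(tag, set()) for tag in tags]
        candidates = set.intersection(*postings) if postings else set(block.logs)
        # 3. Keyword: substring search only when several candidates remain,
        #    following the description; a single candidate is returned as-is.
        if len(candidates) > 1:
            candidates = {i for i in candidates if keyword in block.logs[i]}
        results.extend(block.logs[i] for i in sorted(candidates))
    return results
```

Only the few logs that survive the tag filter are ever scanned for the keyword, which is why skipping full-text indexing at write time keeps query cost within an acceptable range.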

With this storage process, the log content need not be tokenized and no large set of index files is built; only an inverted index file based on the tags of the logs to be stored is created, which greatly reduces the processing load. In the log query stage, the target log can still be found from the inverted index file and the time information, so query performance is preserved. In addition, log storage and query in this embodiment do not depend on a third-party remote storage system.

S203, migrating the sorted logs to be stored in the first storage unit to the second storage unit in the form of data blocks, each data block in the second storage unit being stored in time order.

In this embodiment, once the logs to be stored have been sorted in the first storage unit, they may be migrated to the second storage unit in data blocks for persistent storage. Each data block may contain the logs whose timestamps fall within a certain time interval, and the data blocks in the second storage unit are themselves kept in time order. In other words, the logs are stored in the second storage unit as data blocks; the data blocks are ordered by time, and within each data block the logs are ordered by tag information and then by time. For example, a data block may be divided into several series according to tag information, the series may be ordered by the initial letters of their tag information, and the logs within each series are ordered by time. Optionally, the second storage unit may be a storage device such as a magnetic disk, so that the log data is stored persistently.
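A minimal Python sketch of the ordering inside one data block, assuming each log's tag information is encoded as a sorted tuple of (label, value) pairs. The `Log` type and `build_block_layout` are hypothetical names, and plain tuple comparison stands in for ordering series by the initial letters of their tag information:

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Log:
    tags: tuple  # hypothetical encoding: sorted tuple of (label, value) pairs
    ts_ms: int   # timestamp in milliseconds
    line: str    # raw log content, stored without word segmentation


def build_block_layout(logs):
    """Group logs into series by tag; order series by tag and logs in a series by time."""
    ordered = sorted(logs, key=lambda log: (log.tags, log.ts_ms))
    return [(tags, list(group)) for tags, group in groupby(ordered, key=lambda log: log.tags)]
```

Sorting once by the composite key (tags, timestamp) makes every series contiguous, so `groupby` can split them in a single pass.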

In the log data processing method provided by this embodiment, after receiving a log storage request, the gateway determines the routing information of the log to be stored, which carries tag information and timestamp information, and sends the log to the target data node of the log storage system according to that routing information. After receiving a plurality of logs to be stored from the gateway, the target data node writes them into its first storage unit in time order according to their tag information and timestamp information, and generates an inverted index file from their tag information; in the log query stage, this inverted index file is used together with the time information to locate the target log to be queried. The sorted logs in the first storage unit are then migrated to the second storage unit as data blocks and stored persistently there, with the data blocks kept in time order. This embodiment reduces the processing load of the storage process while keeping the additional work in the query process within an acceptable range. Ordering the logs by time and building the inverted index only on tag information preserves storage and query performance, lowers the cost of storage and query, reduces storage space overhead and improves processing efficiency, making the method suitable for processing massive volumes of log data.

On the basis of any of the above embodiments, the first storage unit is divided into different series according to the tag information; each series is divided by time into storage units of several levels, and each level is obtained by dividing the storage units of the level above into a finer time granularity.

For example, the first storage unit may be a block that is divided into several series according to tag information, i.e. different tag information corresponds to different series; each series is divided into chunks of 5 minutes each, and each chunk may in turn be divided into 1000 time windows of 300 milliseconds each (5 × 60 × 1000 ms / 300 ms = 1000). The division of the first storage unit is of course not limited to this example. Further, optionally, the logs to be stored within a time window may be kept in an ordered linked list.

On the basis of the foregoing embodiment, writing the plurality of logs to be stored into the first storage unit in time order according to the tag information and the timestamp information in S202 includes:

for any log to be stored, determining the target series according to the tag information of the log; performing rounding (integer division) and remainder operations on the timestamp of the log to determine the time interval at the finest time granularity into which the timestamp falls, and determining the corresponding lowest-level storage unit from that interval; and storing the log in the lowest-level storage unit of the target series by means of an ordered linked list.

In this embodiment, for any log to be stored, rounding and remainder operations are performed on its timestamp to determine the storage unit at each level in which the log belongs, down to the lowest level. For example, the series to which the log belongs is first determined from its tag information; within that series, the timestamp is rounded to determine the chunk, then the time window of that chunk in which the timestamp falls is determined, and the log is stored in that time window.
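The following Python sketch shows how rounding and remainder locate the lowest-level storage unit, using the example granularity above (5-minute chunks, 300 ms windows). The `HeadBlock` class is hypothetical, and a Python list kept sorted with `bisect.insort` stands in for the ordered linked list:

```python
import bisect
from collections import defaultdict

CHUNK_MS = 5 * 60 * 1000                   # 5-minute chunks (example value from the text)
WINDOW_MS = 300                            # 300 ms time windows (example value from the text)
WINDOWS_PER_CHUNK = CHUNK_MS // WINDOW_MS  # 1000 windows per chunk


class HeadBlock:
    """Hypothetical first storage unit: series -> chunk -> window -> sorted logs."""

    def __init__(self):
        # series key -> chunk index -> window index -> list of (ts_ms, line), kept sorted
        self.series = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    @staticmethod
    def locate(ts_ms):
        chunk_idx = ts_ms // CHUNK_MS                 # rounding selects the chunk
        window_idx = (ts_ms % CHUNK_MS) // WINDOW_MS  # remainder selects the window
        return chunk_idx, window_idx

    def write(self, tags, ts_ms, line):
        chunk_idx, window_idx = self.locate(ts_ms)
        window = self.series[tags][chunk_idx][window_idx]
        # Ordered insertion keeps out-of-order arrivals sorted inside the window.
        bisect.insort(window, (ts_ms, line))
```

Only the timestamp arithmetic and one ordered insertion are needed per write; the log content itself is never tokenized.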

On the basis of any of the foregoing embodiments, as shown in fig. 3, migrating the sorted logs to be stored in the first storage unit to the second storage unit in the form of data blocks in S203 includes:

S301, dividing the logs to be stored in the first storage unit into at least one data block to be migrated according to a predetermined time interval;

S302, judging whether the second storage unit contains a stored data block in the same time interval as the data block to be migrated;

S303, if such a stored data block exists, merging the data block to be migrated with the stored data block in the same time interval, storing the merged data block in the second storage unit, and ordering it by time with the other stored data blocks;

S304, if no such stored data block exists, storing the data block to be migrated in the second storage unit, and ordering it by time with the other stored data blocks.

In this embodiment, the logs in the first storage unit are sorted by time, and the logs in the second storage unit are stored as data blocks that are ordered by time, with the logs in each data block ordered by tag information and time. Because logs may be written out of order, storing all the logs of the first storage unit as a single data block could produce a data block whose time interval overlaps that of a data block already in the second storage unit. For example, if the timestamps of the logs in the first storage unit span 10:50-12:15 and a data block already in the second storage unit covers 10:00-11:00, storing the first storage unit as one data block would make the two intervals overlap, so that the data blocks in the second storage unit, and the logs in the database, would no longer be in chronological order.

To avoid this problem, in this embodiment the logs in the first storage unit are divided into at least one data block to be migrated according to a predetermined time interval, for example one hour. In the above example, where the timestamps of the logs in the first storage unit span 10:50-12:15, the logs can be divided into three data blocks to be migrated, covering 10:50-11:00, 11:00-12:00 and 12:00-12:15. It is then judged whether the second storage unit contains a stored data block in the same time interval as each data block to be migrated. For example, if a stored data block in the second storage unit covers 10:00-11:00, the data block to be migrated covering 10:50-11:00 is merged with it; the merged data block covers 10:00-11:00, is stored in the second storage unit, and remains in time order with the other stored data blocks, so that if another stored data block covers 9:00-10:00, the merged 10:00-11:00 data block is placed after it. If the second storage unit contains no stored data block in the same time interval as a data block to be migrated, that data block is stored in the second storage unit and ordered by time with the other stored data blocks; for example, the data blocks to be migrated covering 10:50-11:00, 11:00-12:00 and 12:00-12:15 are placed in sequence after the stored data block covering 9:00-10:00.
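Steps S301 to S304 can be sketched in Python as follows. This is a simplification under stated assumptions: each log is reduced to its timestamp, intervals are aligned to multiples of the predetermined interval, and the second storage unit is modelled as a time-ordered list of `(interval_start, timestamps)` pairs; `split_by_interval` and `migrate` are hypothetical names:

```python
from bisect import insort

HOUR_MS = 60 * 60 * 1000  # predetermined interval: one hour (example value)


def split_by_interval(sorted_ts, interval_ms=HOUR_MS):
    """S301: group sorted timestamps into {interval_start: [ts, ...]}."""
    groups = {}
    for ts in sorted_ts:
        start = ts - ts % interval_ms  # align to the interval boundary
        groups.setdefault(start, []).append(ts)
    return groups


def migrate(stored, sorted_ts, interval_ms=HOUR_MS):
    """Merge or insert each data block to be migrated into the time-ordered `stored` list."""
    for start, new_ts in split_by_interval(sorted_ts, interval_ms).items():
        for i, (s, old_ts) in enumerate(stored):
            if s == start:                                # S302: same interval exists
                stored[i] = (s, sorted(old_ts + new_ts))  # S303: merge, keep time order
                break
        else:
            insort(stored, (start, new_ts))               # S304: insert in time order
    return stored
```

Because every block covers exactly one aligned interval, no two blocks in the second storage unit can overlap, which is the property the splitting step is meant to guarantee.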

It should be noted that, because the first storage unit is divided by time into storage units of several levels, each level being obtained by dividing the level above into a finer time granularity, this hierarchy may be preserved in the data block during migration; that is, the data block contains sub-data blocks of several levels corresponding respectively to the levels of storage units in the first storage unit. For example, analogous to the above example of the first storage unit, a data block may be a block divided into chunks of 5 minutes each.

Further, in S202, the inverted index file may specifically be generated from the tag information of the logs contained in the storage units at the various levels of the first storage unit. For example, for a given block, an inverted index file may be generated over the tag information of the logs in each series of the block; inverted index files may also be generated over the tag information of the logs in each chunk of each series, and further over the tag information of the logs in each time window, so that logs can be located quickly at query time.
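A sketch of a tag-only inverted index, assuming the tag information of each series is a dict of label/value pairs; `build_tag_index` and `lookup` are hypothetical names. The same construction could be applied at series, chunk or time-window level, and no word segmentation of the log content is involved:

```python
from collections import defaultdict


def build_tag_index(series_tags):
    """series_tags: {series_id: {label: value}} -> {(label, value): sorted series ids}."""
    postings = defaultdict(set)
    for sid, labels in series_tags.items():
        for pair in labels.items():
            postings[pair].add(sid)
    return {pair: sorted(ids) for pair, ids in postings.items()}


def lookup(index, matchers):
    """Intersect the posting lists of all required (label, value) pairs.

    In this sketch an empty matcher dict returns no series.
    """
    result = None
    for pair in matchers.items():
        ids = set(index.get(pair, ()))
        result = ids if result is None else result & ids
    return sorted(result or ())
```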

On the basis of the above embodiment, a data block in the second storage unit may contain the following three kinds of data:

1) metadata, which records the start and end times of the log data in the block;

2) an inverted index file, used to quickly find the series that satisfy the query conditions;

3) chunk data files, which contain the individual log entries and may be compressed with a compression algorithm such as LZ4.
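The three parts of a persisted data block can be sketched as below. This is an illustrative assumption, not the patented on-disk format: JSON is used for serialization, and the standard library's `zlib` stands in for LZ4 (whose Python binding is a third-party package). `PersistedBlock` and `pack_block` are hypothetical names:

```python
import json
import zlib
from dataclasses import dataclass


@dataclass
class PersistedBlock:
    min_ts: int    # 1) metadata: earliest timestamp in the block
    max_ts: int    #    metadata: latest timestamp in the block
    index: dict    # 2) inverted index: "label=value" -> series ids
    chunks: bytes  # 3) compressed chunk data


def pack_block(series):
    """series: {series_id: (labels_dict, [(ts, line), ...])} -> PersistedBlock."""
    all_ts = [ts for _, logs in series.values() for ts, _ in logs]
    index = {}
    for sid, (labels, _) in sorted(series.items()):
        for k, v in labels.items():
            index.setdefault(f"{k}={v}", []).append(sid)
    payload = json.dumps({str(sid): logs for sid, (_, logs) in series.items()}).encode()
    # zlib stands in for LZ4 so that the sketch needs only the standard library.
    return PersistedBlock(min(all_ts), max(all_ts), index, zlib.compress(payload))
```

At query time the metadata alone decides whether a block is relevant, so the compressed chunk data only needs to be decompressed for blocks that pass the time and tag filters.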

In the above embodiment, merging the data block to be migrated with the stored data block in the same time interval in S303 may specifically include:

reading, with cursor iterators, each log to be stored in the data block to be migrated and each stored log in the stored data block of the same time interval, and reordering them by their timestamps to obtain the merged data block.

In this embodiment, the cursor iterators traverse in order the logs to be stored in the data block to be migrated and the stored logs in the stored data block of the same time interval, re-sorting them by timestamp so that all logs in the merged data block remain in time order.
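In Python, the standard library's `heapq.merge` behaves like the cursor-based merge described above: it keeps one iterator per time-sorted input and repeatedly yields the earliest head. A minimal sketch, assuming logs are `(timestamp, line)` tuples and `merge_blocks` is a hypothetical name:

```python
import heapq


def merge_blocks(to_migrate, stored):
    """Merge two time-sorted lists of (ts, line) into one time-sorted list."""
    # heapq.merge advances one cursor per input and always yields the smaller head,
    # so the result stays in time order without re-sorting the whole block.
    return list(heapq.merge(iter(to_migrate), iter(stored), key=lambda log: log[0]))
```

Since both inputs are already sorted, the merge costs time linear in the total number of logs.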

On the basis of any of the above embodiments, the moment at which S203 migrates the sorted logs to be stored in the first storage unit to the second storage unit in the form of data blocks may be determined as follows:

migrating the sorted logs to be stored in the first storage unit to the second storage unit in the form of data blocks at predetermined time intervals; and/or

if the storage space occupied by the sorted logs to be stored in the first storage unit exceeds a preset threshold, migrating them to the second storage unit in the form of data blocks.

In this embodiment, the sorted logs in the first storage unit may be migrated to the second storage unit in data blocks at predetermined intervals, for example every 15 minutes; or they may be migrated in data blocks as soon as the storage space they occupy in the first storage unit is found to exceed the preset threshold. Of course, it may also be checked at predetermined intervals whether the storage space occupied by the sorted logs in the first storage unit exceeds the preset threshold; if it does, the logs are migrated to the second storage unit in data blocks, and otherwise the first storage unit continues to perform S202.
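The trigger conditions can be expressed as a small predicate. The 15-minute default follows the example above; the byte threshold and the `policy` names are hypothetical:

```python
def should_migrate(elapsed_s, head_bytes, policy="either",
                   interval_s=15 * 60, threshold_bytes=512 * 1024 * 1024):
    """Decide whether the first storage unit should be migrated to the second one.

    elapsed_s: seconds since the last migration; head_bytes: space used by sorted logs.
    """
    due = elapsed_s >= interval_s           # periodic trigger
    too_big = head_bytes > threshold_bytes  # size trigger
    if policy == "time":
        return due
    if policy == "size":
        return too_big
    if policy == "either":                  # "and/or": whichever fires first
        return due or too_big
    if policy == "both":                    # check the size only at each period
        return due and too_big
    raise ValueError(f"unknown policy: {policy}")
```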

Fig. 4 is a flowchart of a log data processing method according to an embodiment of the present invention. This embodiment provides a log data processing method for the log storage stage, executed by the gateway, which specifically includes the following steps:

S401, receiving a log storage request, the log storage request including a log to be stored that carries tag information and timestamp information.

In this embodiment, when a network device, system, service program or the like generates a log to be stored during operation, a log storage request containing the log may be sent to the gateway through a client, the log to be stored carrying tag information and timestamp information.

S402, determining the routing information of the log to be stored according to its tag information and timestamp information.

In this embodiment, the gateway may determine the routing information of the log to be stored from its tag information and send the log to the corresponding target data node accordingly. For example, different target data nodes may correspond to different tag information; the gateway determines the target data node from the tag information of the log to be stored and then obtains its routing information, such as its IP address and port. The gateway may of course also determine the routing information from the tag information of the log to be stored by other methods, which are not limited herein.

S403, sending the log to be stored to the corresponding target data node according to the routing information.

In this embodiment, after determining the routing information of the log to be stored, the gateway may send the log to the corresponding target data node, which then executes the data-node-side method: writing the logs to be stored into the first storage unit in time order according to the timestamp information; migrating the sorted logs in the first storage unit to the second storage unit in data blocks, each data block in the second storage unit being stored in time order; and generating an inverted index file from the tag information of the logs in the data block, which is used together with the time information to locate the target log to be queried in the log query stage. For the specific processes and technical effects, refer to the above embodiments; details are not repeated here.

In addition, there may be a plurality of gateways, which respond to log storage requests from clients in a load-balanced manner, determine the routing information of the logs to be stored, and send them to the target data nodes. The gateways are peers of one another, so more gateways can be added when more computing capacity is needed.

On the basis of the foregoing embodiment, determining in S402 the routing information of the log to be stored according to its tag information and timestamp information includes:

acquiring, according to the timestamp information, the data node list in effect at the moment corresponding to the timestamp, the data node list containing the routing information of a plurality of data nodes;

computing a hash value from the tag information of the log to be stored; taking the hash value modulo the number of data nodes in the data node list, and determining the routing information of the log to be stored from the modulo result and the data node list.

In this embodiment, the available data nodes may differ from time to time, i.e. the data node list may differ at different moments, so the data node list in effect at the moment corresponding to the timestamp information is determined from that timestamp. The data node list may include all available data nodes and their routing information, such as IP addresses and/or port information. A hash value is then computed from the tag information of the log to be stored and taken modulo the number of data nodes in the data node list, that is, the hash value is divided by the number of data nodes and the remainder is kept. Based on this remainder, one data node in the data node list is determined as the target data node, and its routing information, such as its IP address and port information, is obtained.
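A minimal sketch of time-versioned hash routing. The node-list history, the canonical tag string and the choice of CRC32 are all assumptions (the text does not name a hash function); CRC32 is used here because Python's built-in `hash()` of a `str` is randomized per process and would route the same tags differently on different gateways:

```python
import bisect
import zlib

# Hypothetical node-list history: (effective_from_ts, [routing info, ...]), sorted by time.
NODE_LISTS = [
    (0, ["10.0.0.1:9000", "10.0.0.2:9000"]),
    (1000, ["10.0.0.1:9000", "10.0.0.2:9000", "10.0.0.3:9000"]),
]


def node_list_at(ts):
    """Return the data node list in effect at timestamp ts (assumes ts >= first entry)."""
    starts = [start for start, _ in NODE_LISTS]
    i = bisect.bisect_right(starts, ts) - 1
    return NODE_LISTS[i][1]


def route(tags, ts):
    """Pick the target node: stable hash of the tag string modulo the node count."""
    nodes = node_list_at(ts)
    h = zlib.crc32(tags.encode("utf-8"))
    return nodes[h % len(nodes)]
```

Selecting the node list by the log's own timestamp means that a log written before a node joined is still routed with the list that was valid at that time.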

After determining the routing information of the log to be stored, the method may further include:

determining a target data node to which the log to be stored is to be stored according to the routing information of the log to be stored;

acquiring the target data node information from the data node information pre-stored locally in the gateway; or

if the target data node information is not among the data node information pre-stored locally in the gateway, acquiring it from a predetermined database cluster, which stores periodic registration information of the data nodes, the periodic registration information including the data node information.

In this embodiment, to improve responsiveness to log storage requests, the gateway may pre-store information on each data node locally. After determining the target data node, the gateway looks up the target data node information among the locally pre-stored data node information. The locally pre-stored data node information may be obtained from the predetermined database cluster; optionally, the gateway may fetch it from the predetermined database cluster periodically, or fetch it when the target data node information is not among the locally pre-stored data node information.

The predetermined database cluster may be connected to each gateway and each data node and stores the periodic registration information of each data node, which includes that data node's information. That is, each data node connects to the predetermined database cluster and periodically registers with it, recording its own state information there.

The predetermined database cluster may be an etcd cluster, which keeps multiple replicas of its data and can therefore ensure the reliability of the routing information; of course, the predetermined database cluster may also be another kind of database cluster, which is not enumerated here.
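The local-cache-with-fallback lookup can be sketched as follows. `NodeDirectory` is hypothetical, and `registry_get` is a hypothetical callable standing in for a read from the predetermined database cluster; no real etcd client API is used or implied:

```python
class NodeDirectory:
    """Gateway-side cache of data node information with fallback to a registry."""

    def __init__(self, registry_get):
        self._cache = {}
        self._registry_get = registry_get  # hypothetical: node_id -> info or None

    def get(self, node_id):
        info = self._cache.get(node_id)
        if info is None:                     # not pre-stored locally
            info = self._registry_get(node_id)
            if info is not None:
                self._cache[node_id] = info  # keep it for later requests
        return info

    def refresh(self, all_nodes):
        """Periodic refresh: replace the local copy with a registry snapshot."""
        self._cache = dict(all_nodes)
```

Most requests are then served from memory, and the registry is only consulted on a cache miss or during the periodic refresh.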

On the basis of the above embodiment, a plurality of threads may be created in advance between the gateway and any data node to form a thread pool, each thread having a connection channel (pipeline) to the data node. The threads include data writing threads and data reading threads, and the gateway includes a data writing queue and a data reading queue. The data writing queue may select an idle data writing thread from the thread pool and, through it, send the log to be stored to the corresponding target data node according to the routing information; the data reading queue may select an idle data reading thread from the thread pool and, through it, send the log query request to the corresponding target data node according to the routing information.

To ensure that the gateway can handle log storage requests at high throughput, once the gateway has processed and forwarded a log storage request (such as a socket request), it can immediately process the next request without waiting for the response, so log storage requests are consumed quickly. The data writing threads can keep processing log storage requests without interference from other factors, and response messages can be returned to the client through the data reading threads. In addition, because the workload has far more writes than reads, more data writing threads than data reading threads may be configured, which avoids wasting computing resources and keeps resource use reasonable.
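A sketch of separate write and read pools sized for a write-heavy load, using the standard library's `ThreadPoolExecutor`. `GatewayPools` is hypothetical, and `send` is a hypothetical callable `(node, payload) -> response` standing in for a thread's pipeline to a data node:

```python
from concurrent.futures import ThreadPoolExecutor


class GatewayPools:
    """Separate worker pools for writes and reads, with more write workers."""

    def __init__(self, send, write_workers=8, read_workers=2):
        self._send = send
        self._writers = ThreadPoolExecutor(max_workers=write_workers, thread_name_prefix="write")
        self._readers = ThreadPoolExecutor(max_workers=read_workers, thread_name_prefix="read")

    def submit_write(self, node, log):
        # Returns a future immediately, so the caller can take the next request
        # without waiting for the data node's response.
        return self._writers.submit(self._send, node, log)

    def submit_query(self, node, query):
        return self._readers.submit(self._send, node, query)

    def close(self):
        self._writers.shutdown(wait=True)
        self._readers.shutdown(wait=True)
```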

Fig. 5 is a flowchart of a log data processing method according to an embodiment of the present invention. This embodiment provides a log data processing method for the log query stage corresponding to the foregoing embodiments. It is executed by any data node of a log storage system comprising a plurality of data nodes. The data node includes a second storage unit in which a plurality of logs are stored as data blocks; the data blocks are ordered by time, and the logs in each data block are ordered by tag information and time. The method specifically includes the following steps:

S501, receiving a log query request sent by the gateway according to the routing information, the log query request including the tag information, keywords and time information of the log to be queried.

In this embodiment, when logs need to be queried, the client may send a log query request to the gateway containing the tag information, keywords and time information of the log to be queried. This information may of course also be sent as a query expression, which the gateway parses to obtain the tag information, keywords and time information. The keywords are keywords of the log content, and the time information may be a time interval. The gateway then determines the routing information of the log to be queried from its tag information and sends the log query request to the corresponding target data node accordingly. For example, different target data nodes may correspond to different tag information; the gateway determines the target data node from the tag information of the log to be queried and then obtains its routing information, such as its IP address. The gateway may of course also determine the routing information of the log to be queried from its tag information by other methods, which are not limited herein.

S502, determining a corresponding target data block from the second storage unit according to the time information of the log to be queried.

In this embodiment, because the data blocks in the second storage unit are ordered by time and the logs in each data block are ordered by tag information and time, each data block corresponds to a time interval, namely the span between the earliest and latest timestamps of the logs it contains. Therefore, the target data blocks that contain, or intersect, the time information of the log to be queried can be determined from the second storage unit. For example, if data block 1 in the second storage unit covers 10:00-11:00 and data block 2 covers 11:00-12:00, a query for 10:50-11:00 selects data block 1 as the target data block, while a query for 10:50-11:10 selects both data block 1 and data block 2.
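A sketch of the block selection, assuming each data block's metadata is read as a half-open interval [start, end) and times are expressed in minutes since midnight. With half-open intervals, the query 10:50-11:00 touches only data block 1, matching the example above. `target_blocks` is a hypothetical name:

```python
def target_blocks(blocks, q_start, q_end):
    """blocks: time-ordered list of (start, end, block_id), each a half-open [start, end).

    Returns the ids of blocks that intersect the half-open query [q_start, q_end).
    """
    # Two half-open intervals intersect exactly when each starts before the other ends.
    return [bid for start, end, bid in blocks if start < q_end and q_start < end]
```

For a large number of blocks, the time ordering would also allow a binary search for the first candidate instead of this linear scan.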

Further, if a data block contains sub-data blocks of several levels, for example a block divided into chunks of 5 minutes each, sub-data blocks of finer granularity, such as one or more chunks, can be determined from the time information of the log to be queried.

S503, searching for the target log according to the tag information and keywords of the log to be queried and the inverted index file of the target data block, the inverted index file being generated in advance from the tag information of the logs included in the target data block.

In this embodiment, after the target data block is determined, its inverted index file is obtained, and the target log is queried based on the tag information and keywords of the log to be queried and that inverted index file. The inverted index file is created in advance; for the specific creation process, refer to the above embodiments.

S504, the target log is sent to the gateway.

In this embodiment, after acquiring the target log, the data node may send the target log to the gateway, so that the gateway sends the target log to the client, so as to respond to the log query request of the client.

Through the above query process, the target log can be found quickly and efficiently, which ensures query performance, reduces the cost of the query process and improves processing efficiency without relying on a third-party remote storage system, making the method suitable for processing massive volumes of logs.

On the basis of the foregoing embodiment, searching for the target log in S503 according to the tag information and keywords of the log to be queried and the inverted index file of the target data block may specifically include:

S5031, searching, according to the inverted index file of the target data block, for at least one candidate log matching the tag information of the log to be queried;

S5032, searching the at least one candidate log for the keywords of the log to be queried, and determining the candidate logs that contain the keywords as the target logs.

In this embodiment, because the inverted index file is generated in advance from the tag information of the logs in the target data block, and several logs may share the same tag information, the candidate logs carrying the tags of the log to be queried can be retrieved from the inverted index file within the range of the target data block. If there are several candidate logs, they may be further searched by the keywords of the log to be queried, for example by full-text retrieval, so that the candidate logs containing the keywords are found and determined to be the queried target logs. If there is only one candidate log, it can be directly determined to be the queried target log; optionally, full-text retrieval by the keywords can still be performed to verify that the target log really contains them.
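The two-stage search of S5031 and S5032 can be sketched as below, assuming the inverted index maps `(label, value)` pairs to sets of series ids and a plain substring test stands in for full-text retrieval. `find_target_logs` is a hypothetical name, and this sketch always applies the keyword check, i.e. the optional verification step for a single candidate:

```python
def find_target_logs(index, series_logs, matchers, keyword):
    """index: {(label, value): set of series ids}; series_logs: {series_id: [line, ...]}."""
    candidate_ids = None
    for pair in matchers.items():  # S5031: tag match through the inverted index
        ids = index.get(pair, set())
        candidate_ids = set(ids) if candidate_ids is None else candidate_ids & ids
    candidates = [line for sid in sorted(candidate_ids or ()) for line in series_logs[sid]]
    return [line for line in candidates if keyword in line]  # S5032: keyword scan
```

Only the logs of the matching series are scanned, which is why building the index on tags alone keeps the query cost acceptable.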

It should be noted that, since the first storage unit is not involved in the log query stage, the data node in this embodiment is not required to include a first storage unit.

Fig. 6 is a flowchart of a log data processing method according to an embodiment of the present invention. This embodiment provides a log data processing method for the log query stage corresponding to the foregoing embodiment; it is executed by the gateway and specifically includes the following steps:

S601, receiving a log query request, the log query request including the tag information, keywords and time information of the log to be queried.

In this embodiment, when logs need to be queried, a log query request containing the tag information, keywords and time information of the log to be queried may be sent to the gateway through the client. This information may also be sent as a query expression, which the gateway parses to obtain the tag information, keywords and time information. The keywords are keywords of the log content, and the time information may be a time interval.

S602, determining the routing information of the log to be queried according to the label information of the log to be queried.

In this embodiment, the gateway may determine the routing information of the log to be queried according to the tag information of the log to be queried, and send the log query request to the corresponding target data node according to the routing information. For example, different target data nodes may correspond to different label information, and the gateway may determine a corresponding target data node according to the label information of the log to be queried, and then obtain routing information of the target data node, such as an IP address; of course, the gateway also determines the routing information of the log to be stored according to the label information of the log to be queried by other methods, which may not be limited herein.

S603, sending the log query request to the corresponding target data node according to the routing information.

In this embodiment, after determining the routing information of the log to be queried, the gateway may send the log query request to the corresponding target data node according to the routing information, so that the target data node executes the data-node-side method: it determines the corresponding target data block from the second storage unit according to the time information of the log to be queried; searches for the target log according to the tag information and keyword of the log to be queried and the inverted index file of the target data block, where the inverted index file is generated in advance from the tag information of the logs included in the target data block; and sends the target log to the gateway. For the specific process and technical effects, reference may be made to the above embodiments, which are not repeated here.

On the basis of the above embodiment, step S602 of determining the routing information of the log to be queried according to its tag information includes:

determining whether target tag information exists in the tag information of the log to be queried;

if the target tag information exists in the tag information of the log to be queried, determining the data node corresponding to the target tag information as the target data node, acquiring the routing information corresponding to that target data node, and using it as the routing information of the log to be queried;

if no target tag information exists in the tag information of the log to be queried, determining all data nodes as target data nodes, acquiring the routing information corresponding to each target data node, and using it as the routing information of the log to be queried.

In this embodiment, some data nodes correspond to specific target tag information. For example, if a data node X stores the log data of a specific service program X, data node X corresponds to the name tag of service program X. When that name tag is present in the tag information of the log to be queried, data node X is directly determined as the target data node, and its routing information is used as the routing information of the log to be queried. If no target tag information is present in the tag information of the log to be queried, all data nodes are determined as target data nodes; that is, the log to be queried is searched for on all data nodes.
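The target-tag routing rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the mapping `target_tag_to_node`, the node names and the `host:port` addresses are hypothetical, and tags are modeled as a plain key/value dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical: a (tag key, tag value) pair that pins logs to one dedicated data node.
TargetTagMap = Dict[Tuple[str, str], str]


def route_query(query_tags: Dict[str, str],
                target_tag_to_node: TargetTagMap,
                node_addresses: Dict[str, str]) -> List[str]:
    """Return the routing information (addresses) of the data nodes to query."""
    # Check whether any tag of the query is a target tag.
    for key, value in query_tags.items():
        node = target_tag_to_node.get((key, value))
        if node is not None:
            # Target tag found: only its dedicated data node needs to be queried.
            return [node_addresses[node]]
    # No target tag: the query fans out to every data node.
    return list(node_addresses.values())
```

With `target_tag_to_node = {("service", "order-svc"): "X"}`, a query tagged `service=order-svc` goes only to node X, while a query carrying only `level=error` is sent to every node.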

In other optional embodiments, the gateway may instead determine the routing information of the log to be queried by computing a hash value from its tag information, determining the target data node corresponding to the log to be queried according to the hash value, acquiring the routing information of that target data node, and using it as the routing information of the log to be queried.

On the basis of any of the above embodiments, a plurality of threads may be pre-established between the gateway and each data node to form a thread pool, with a connection channel (pipeline) between each thread and its data node. The threads include data writing threads and data reading threads, and the gateway includes a data writing queue and a data reading queue. The data writing queue may select an idle data writing thread from the thread pool, and that thread sends the log to be stored to the corresponding target data node according to the routing information; likewise, the data reading queue may select an idle data reading thread from the thread pool, and that thread sends the log query request to the corresponding target data node according to the routing information.

In addition, because log writes are far more frequent than reads, more data writing threads than data reading threads may be configured, which avoids wasting computing resources and keeps resource utilization reasonable.
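The write/read split described above can be sketched with Python's standard `queue` and `threading` modules. This is an assumption-laden illustration: instead of a queue explicitly picking an idle thread, each worker blocks on its own queue so that whichever thread is idle takes the next task, and the `send` callable is a hypothetical stand-in for a thread's pipeline to a data node. The default 8:2 writer/reader ratio is illustrative only.

```python
import queue
import threading
from typing import Callable, List


class GatewayDispatcher:
    """Sketch: separate write and read queues, each served by its own worker threads."""

    def __init__(self, send: Callable[[str, object], None],
                 n_writers: int = 8, n_readers: int = 2) -> None:
        # Writes dominate log traffic, so more writer threads than reader threads.
        self._send = send  # hypothetical: delivers a payload to a data node address
        self._n_writers, self._n_readers = n_writers, n_readers
        self.write_queue: queue.Queue = queue.Queue()
        self.read_queue: queue.Queue = queue.Queue()
        self._threads: List[threading.Thread] = []
        for _ in range(n_writers):
            self._start(self.write_queue)
        for _ in range(n_readers):
            self._start(self.read_queue)

    def _start(self, q: queue.Queue) -> None:
        t = threading.Thread(target=self._worker, args=(q,), daemon=True)
        t.start()
        self._threads.append(t)

    def _worker(self, q: queue.Queue) -> None:
        # An idle worker blocks in get(); whichever worker is free takes the next task.
        while True:
            task = q.get()
            if task is None:  # shutdown sentinel
                return
            address, payload = task
            self._send(address, payload)

    def submit_write(self, address: str, log: object) -> None:
        self.write_queue.put((address, log))

    def submit_read(self, address: str, query: object) -> None:
        self.read_queue.put((address, query))

    def close(self) -> None:
        # One sentinel per worker; FIFO order means every earlier task is taken first.
        for _ in range(self._n_writers):
            self.write_queue.put(None)
        for _ in range(self._n_readers):
            self.read_queue.put(None)
        for t in self._threads:
            t.join()
```

Keeping the two queues separate means a burst of writes cannot starve queries of threads, and the ratio between the two pools can be tuned independently.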

Fig. 7 is a block diagram of a log data processing apparatus according to an embodiment of the present invention. The log data processing apparatus provided in this embodiment may execute the processing flow of the data-node-side log data processing method in the log storage phase, and is applied to any data node of a log storage system including a plurality of data nodes, where the data node includes a first storage unit and a second storage unit. As shown in Fig. 7, the log data processing apparatus 700 includes a receiving module 701, a first storage module 702, a second storage module 703, and an indexing module 704.

A receiving module 701, configured to receive multiple logs to be stored, which are sent by a gateway according to routing information, where the logs to be stored carry tag information and timestamp information; the routing information is determined by the gateway according to the label information and the timestamp information of the log to be stored;

a first storage module 702, configured to write the multiple logs to be stored into the first storage unit according to the tag information and the timestamp information in a time sequence;

an index module 704, configured to generate an inverted index file according to the tag information of the multiple logs to be stored, where the inverted index file is used to locate a target log to be queried according to the inverted index file and time information in a log querying stage;

a second storage module 703, configured to migrate the ordered logs to be stored in the first storage unit to the second storage unit in the form of data blocks, and to store the data blocks in the second storage unit in time order.

On the basis of any of the above embodiments, the first storage unit is divided into multiple levels of storage units by time, where each level is obtained by dividing the level above it at a finer time granularity;

the first storage module 702, when writing the plurality of logs to be stored into the first storage unit according to the tag information and the timestamp information, is configured to:

for each log to be stored, determine a target sequence according to the tag information of the log;

apply integer division and modulo (remainder) operations to the timestamp of the log to determine the time interval, at the finest time granularity, in which the timestamp falls, and determine the corresponding lowest-level storage unit from that interval; and

store the log in that lowest-level storage unit of the target sequence using an ordered linked list.
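A minimal sketch of this write path, assuming three hypothetical levels of 1 hour, 10 minutes and 1 minute, integer timestamps in seconds, and a sequence identified by the sorted set of tag key/value pairs. The names `GRANULARITIES`, `bucket_path` and `FirstStorageUnit` are illustrative and do not come from the source.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical level granularities in seconds: 1 hour -> 10 minutes -> 1 minute.
GRANULARITIES = [3600, 600, 60]


@dataclass
class _Node:
    ts: int
    line: str
    next: Optional["_Node"] = None


class OrderedLinkedList:
    """Singly linked list kept sorted by timestamp."""

    def __init__(self) -> None:
        self.head: Optional[_Node] = None

    def insert(self, ts: int, line: str) -> None:
        node = _Node(ts, line)
        if self.head is None or ts < self.head.ts:
            node.next, self.head = self.head, node
            return
        cur = self.head
        # Advance past every node with an equal or earlier timestamp (stable order).
        while cur.next is not None and cur.next.ts <= ts:
            cur = cur.next
        node.next, cur.next = cur.next, node

    def __iter__(self) -> Iterator[Tuple[int, str]]:
        cur = self.head
        while cur is not None:
            yield cur.ts, cur.line
            cur = cur.next


def series_key(tags: Dict[str, str]) -> Tuple[Tuple[str, str], ...]:
    # The same tag set always maps to the same sequence, regardless of key order.
    return tuple(sorted(tags.items()))


def bucket_path(ts: int) -> Tuple[int, ...]:
    """Integer division / modulo per level -> index path to the lowest-level unit."""
    path = [ts // GRANULARITIES[0]]
    for coarse, fine in zip(GRANULARITIES, GRANULARITIES[1:]):
        path.append((ts % coarse) // fine)
    return tuple(path)


class FirstStorageUnit:
    def __init__(self) -> None:
        # Key: (sequence, level-0 index, level-1 index, level-2 index).
        self.units: Dict[tuple, OrderedLinkedList] = {}

    def write(self, tags: Dict[str, str], ts: int, line: str) -> None:
        key = (series_key(tags),) + bucket_path(ts)
        self.units.setdefault(key, OrderedLinkedList()).insert(ts, line)
```

For example, a timestamp of 7325 s falls in hour 2 (7325 // 3600), in the 0th ten-minute slot of that hour (125 // 600) and in minute 2 of that slot (125 // 60), so `bucket_path(7325)` is `(2, 0, 2)`.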

On the basis of any of the above embodiments, when migrating the plurality of ordered logs to be stored in the first storage unit to the second storage unit in a data block manner, the second storage module 703 is configured to:

divide the ordered logs to be stored in the first storage unit into at least one data block to be migrated according to a preset time interval;

determine whether the second storage unit already contains a stored data block covering the same time interval as the data block to be migrated;

if such a stored data block exists, merge the data block to be migrated with it, store the merged data block in the second storage unit, and order the merged data block among the other stored data blocks by time;

if no such stored data block exists, store the data block to be migrated in the second storage unit and order it among the other stored data blocks by time.
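The migration steps can be sketched as follows, assuming a hypothetical preset block interval of two hours, integer timestamps in seconds, and logs reduced to `(timestamp, text)` pairs. Here `heapq.merge` merges two already-sorted runs and `bisect.insort` keeps block start times in chronological order; the class and field names are illustrative.

```python
import bisect
import heapq
from typing import Dict, List, Tuple

Log = Tuple[int, str]  # (timestamp in seconds, log text); tags omitted for brevity

BLOCK_SECONDS = 7200  # hypothetical preset time interval of one data block


class SecondStorageUnit:
    def __init__(self) -> None:
        self.starts: List[int] = []  # block start times, kept in chronological order
        self.blocks: Dict[int, List[Log]] = {}

    def migrate(self, ordered_logs: List[Log]) -> None:
        """Move timestamp-ordered logs from the first storage unit into data blocks."""
        # Step 1: split the ordered logs into blocks by the preset interval.
        pending: Dict[int, List[Log]] = {}
        for log in ordered_logs:
            start = log[0] - log[0] % BLOCK_SECONDS
            pending.setdefault(start, []).append(log)
        for start, logs in pending.items():
            if start in self.blocks:
                # Step 2a: a stored block covers the same interval; merge the sorted runs.
                self.blocks[start] = list(
                    heapq.merge(self.blocks[start], logs, key=lambda entry: entry[0]))
            else:
                # Step 2b: no such block; insert it so block order stays chronological.
                bisect.insort(self.starts, start)
                self.blocks[start] = logs
```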

On the basis of any of the foregoing embodiments, when merging the data block to be migrated and the stored data block in the same time interval, the second storage module 703 is configured to:

read each log of the data block to be migrated and each stored log of the stored data block in the same time interval using cursor iterators; and

reorder the logs according to their timestamp information to obtain the merged data block.
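The cursor-iterator merge can be illustrated with one Python iterator per input block, each acting as a cursor that is advanced only when its current log is emitted. This assumes both inputs are already ordered by timestamp and that logs are `(timestamp, text)` pairs; on equal timestamps the stored log is emitted first, which is an arbitrary choice made here.

```python
from typing import Iterable, List, Tuple

Log = Tuple[int, str]  # (timestamp, log text)

_END = object()  # marks an exhausted cursor


def merge_blocks(migrating: Iterable[Log], stored: Iterable[Log]) -> List[Log]:
    """Merge two timestamp-ordered runs using one cursor (iterator) per run."""
    a, b = iter(migrating), iter(stored)
    x, y = next(a, _END), next(b, _END)
    merged: List[Log] = []
    while x is not _END and y is not _END:
        # Emit the earlier log and advance only the cursor it came from.
        if x[0] < y[0]:
            merged.append(x)
            x = next(a, _END)
        else:
            merged.append(y)
            y = next(b, _END)
    # One run is exhausted: drain whatever remains of the other.
    while x is not _END:
        merged.append(x)
        x = next(a, _END)
    while y is not _END:
        merged.append(y)
        y = next(b, _END)
    return merged
```

Because each input is read sequentially, only the two current logs need to be held at a time, and the merge is linear in the combined size of the blocks.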

On the basis of any of the above embodiments, the data block includes multiple levels of sub data blocks, corresponding respectively to the multiple levels of storage units of the first storage unit;

when generating the inverted index file according to the tag information of the plurality of logs to be stored, the indexing module 704 is configured to:

generate the inverted index file according to the tag information of the logs to be stored in the multiple levels of storage units of the first storage unit.

The log data processing apparatus provided in the embodiment of the present invention may be specifically configured to execute the log data processing method embodiments shown in Figs. 2 and 3; the specific functions are not repeated here.
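As a rough illustration of an inverted index over tag information, the sketch below maps each `key=value` tag term to the positions of the logs carrying it. It assumes logs are given in storage order as tag dictionaries; writing the index to a file, and any per-level structure, are omitted. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_inverted_index(logs: List[Dict[str, str]]) -> Dict[str, List[int]]:
    """Map each 'key=value' tag term to the positions of the logs that carry it.

    `logs` holds each log's tags in storage order, so positions are appended in
    increasing order and every posting list comes out sorted.
    """
    index: Dict[str, List[int]] = defaultdict(list)
    for position, tags in enumerate(logs):
        for key, value in tags.items():
            index[f"{key}={value}"].append(position)
    return dict(index)
```

Sorted posting lists make it cheap to intersect several tag terms at query time; persisting the result as an index file (for example with `json.dump`) is left out of the sketch.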

Fig. 8 is a block diagram of a log data processing apparatus according to an embodiment of the present invention. The log data processing apparatus provided in this embodiment may execute the processing flow provided by the gateway-side log data processing method embodiment in the log storage phase, as shown in fig. 8, the log data processing apparatus 800 includes a receiving module 801, a routing module 802, and a sending module 803.

A receiving module 801, configured to receive a log storage request, where the log storage request includes a log to be stored, and the log to be stored carries tag information and timestamp information;

the routing module 802 is configured to determine routing information of the log to be stored according to the tag information and the timestamp information of the log to be stored;

a sending module 803, configured to send the log to be stored to a corresponding target data node according to the routing information.

On the basis of any of the above embodiments, when determining the routing information of the log to be stored according to the tag information and the timestamp information of the log to be stored, the routing module 802 is configured to:

obtain a hash value from the tag information of the log to be stored; and

take the hash value modulo the number of data nodes in the data node list, and determine the routing information of the log to be stored from the modulo result and the data node list.
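A minimal sketch of this hash-and-modulo routing. It assumes tags are a key/value dictionary and uses `zlib.crc32` over a canonical, key-sorted string as the hash, because Python's built-in `hash()` for strings is randomized per process and would route the same tags differently after a restart; the source does not specify the hash function. Only the tag information is hashed here, matching the steps above, and the function name is hypothetical.

```python
import zlib
from typing import Dict, List


def route_log(tags: Dict[str, str], data_nodes: List[str]) -> str:
    """Pick a data node address by hashing the tag set modulo the node count."""
    # Canonical form so the same tag set always hashes the same way.
    canonical = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    h = zlib.crc32(canonical.encode("utf-8"))
    # The modulo result indexes into the data node list.
    return data_nodes[h % len(data_nodes)]
```

All logs with the same tag set land on the same node, which keeps each sequence local to one data node. Note that a plain modulo remaps most tag sets whenever the length of the node list changes.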

On the basis of any of the above embodiments, after determining the routing information of the log to be stored, the routing module 802 is further configured to:

On the basis of any of the foregoing embodiments, when sending the log to be stored to a corresponding target data node according to the routing information, the sending module 803 is configured to:

select an idle data writing thread from a thread pool, and send the log to be stored to the corresponding target data node according to the routing information through that data writing thread.

The log data processing apparatus provided in the embodiment of the present invention may be specifically configured to execute the embodiment of the log data processing method provided in fig. 4, and specific functions are not described herein again.

Fig. 9 is a block diagram of a log data processing apparatus according to an embodiment of the present invention. The log data processing apparatus provided in this embodiment may execute the processing flow of the data-node-side log data processing method in the log query phase, and is applied to any data node of a log storage system including a plurality of data nodes. The data node includes a second storage unit in which a plurality of logs are stored in the form of data blocks; the data blocks are sorted in time order, and the logs within each data block are sorted by tag information and time. As shown in Fig. 9, the log data processing apparatus 900 includes a receiving module 901, a querying module 902, and a sending module 903.

A receiving module 901, configured to receive a log query request sent by a gateway according to routing information, where the log query request includes tag information, a keyword, and time information of a log to be queried;

a querying module 902, configured to determine the corresponding target data block from the second storage unit according to the time information of the log to be queried, and to search for the target log according to the tag information and keyword of the log to be queried and the inverted index file of the target data block, where the inverted index file is generated in advance from the tag information of the logs included in the target data block;

a sending module 903, configured to send the target log to the gateway.

On the basis of any of the above embodiments, when searching for a target log according to the tag information and the keyword of the log to be queried and the inverted index file of the target data block, the querying module 902 is configured to:

search the inverted index file of the target data block for at least one candidate log matching the tag information of the log to be queried; and

search the at least one candidate log for the keyword of the log to be queried, and determine each candidate log containing the keyword as a target log.
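The query path can be sketched end to end: select the target data blocks from the time interval, narrow each block to candidate logs via its inverted index, then filter the candidates by keyword. This sketch assumes contiguous, non-overlapping blocks ordered by start time, logs stored as `(timestamp, tags, text)` tuples, an inverted index keyed by `key=value` terms, and that a candidate must match every queried tag; `Block` and `query` are hypothetical names.

```python
import bisect
from typing import Dict, List, Tuple

LogEntry = Tuple[int, Dict[str, str], str]  # (timestamp, tags, log text)


class Block:
    """A data block: logs ordered by time plus an inverted index over their tags."""

    def __init__(self, start: int, logs: List[LogEntry]) -> None:
        self.start = start
        self.logs = logs
        self.index: Dict[str, List[int]] = {}
        for pos, (_ts, tags, _text) in enumerate(logs):
            for key, value in tags.items():
                self.index.setdefault(f"{key}={value}", []).append(pos)


def query(blocks: List[Block], t0: int, t1: int,
          tags: Dict[str, str], keyword: str) -> List[str]:
    """Return texts of logs in [t0, t1] matching all `tags` and containing `keyword`."""
    starts = [b.start for b in blocks]
    # Step 1: the first target block is the one whose interval contains t0.
    i = max(bisect.bisect_right(starts, t0) - 1, 0)
    results: List[str] = []
    for block in blocks[i:]:
        if block.start > t1:
            break  # blocks are time-ordered, so no later block can match
        # Step 2: candidate logs = intersection of the posting lists of all query tags.
        postings = [set(block.index.get(f"{k}={v}", [])) for k, v in tags.items()]
        candidates = set.intersection(*postings) if postings else set(range(len(block.logs)))
        # Step 3: keep candidates inside the time range whose text contains the keyword.
        for pos in sorted(candidates):
            ts, _tags, text = block.logs[pos]
            if t0 <= ts <= t1 and keyword in text:
                results.append(text)
    return results
```

The keyword scan runs only over candidates that survived the index lookup, which is where the inverted index saves work compared with scanning every log in the time range.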

The log data processing apparatus provided in the embodiment of the present invention may be specifically configured to execute the embodiment of the log data processing method provided in fig. 5, and specific functions are not described herein again.

Fig. 10 is a block diagram of a log data processing apparatus according to an embodiment of the present invention. The log data processing apparatus provided in this embodiment may execute the processing flow provided by the gateway-side log data processing method embodiment at the log query stage, as shown in fig. 10, the log data processing apparatus 1000 includes a receiving module 1001, a routing module 1002, and a sending module 1003.

A receiving module 1001, configured to receive a log query request, where the log query request includes tag information, a keyword, and time information of a log to be queried;

the routing module 1002 is configured to determine routing information of the log to be queried according to the tag information of the log to be queried;

a sending module 1003, configured to send the log query request to a corresponding target data node according to the routing information.

On the basis of any of the foregoing embodiments, when sending the log query request to a corresponding target data node according to the routing information, the sending module 1003 is configured to:

select an idle data reading thread from a thread pool, and send the log query request to the corresponding target data node according to the routing information through that data reading thread.

The log data processing apparatus provided in the embodiment of the present invention may be specifically configured to execute the embodiment of the log data processing method provided in fig. 6, and specific functions are not described herein again.

Fig. 11 is a schematic structural diagram of a data node according to an embodiment of the present invention. The data node provided in this embodiment may execute the processing flow of the data-node-side log data processing method embodiments, including the flow of the log storage phase and/or the log query phase; the data node may be a database, a server, or another computer device. As shown in Fig. 11, the data node 1100 includes at least one memory 1101 and a processor 1102. The memory 1101 may include a first storage unit 11011 and a second storage unit 11012, and may also store computer-executable instructions, which may be held in the first storage unit, the second storage unit, and/or other storage units. The computer program is stored in the memory 1101 and is configured to be executed by the processor 1102 to perform the data-node-side log data processing flow of the log storage phase and/or the log query phase in the above embodiments. In addition, the data node 1100 may have a communication interface 1103 for receiving control instructions.

The data node in the embodiment shown in fig. 11 may be used to execute the technical solution of the log data processing flow in the log storage stage and/or the log query stage of the data node side in the above embodiments, and the implementation principle and the technical effect are similar, and are not described here again.

In addition, the present embodiment also provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the log data processing method in the log storage phase and/or the log query phase of the data node side described in the foregoing embodiment.

Fig. 12 is a schematic structural diagram of a gateway according to an embodiment of the present invention. The gateway provided in this embodiment may execute the processing flow of the gateway-side log data processing method embodiments, including the flow of the log storage phase and/or the log query phase. As shown in Fig. 12, the gateway 1200 includes a memory 1201 and a processor 1202; the memory 1201 stores a computer program configured to be executed by the processor 1202 to perform the gateway-side log data processing method of the log storage phase and/or the log query phase described in the above embodiments. In addition, the gateway 1200 may have a communication interface 1203 for receiving control instructions.

The gateway in the embodiment shown in Fig. 12 may be used to execute the technical solution of the gateway-side log data processing method embodiments in the log storage phase and/or the log query phase; the implementation principle and technical effects are similar and are not repeated here.

In addition, the present embodiment also provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the log data processing method in the log storage phase and/or the log query phase of the gateway side described in the above embodiments.

In addition, an embodiment of the present invention provides a computer program product, which includes a computer executable instruction, and when the computer executable instruction is executed by a processor, the method for processing log data in a log storage phase and/or a log query phase on a data node side in the foregoing embodiment is implemented.

In addition, an embodiment of the present invention provides a computer program product, which includes a computer executable instruction, and when the computer executable instruction is executed by a processor, the method for processing log data in a log storage phase and/or a log query phase on a gateway side in the foregoing embodiment is implemented.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.

The above embodiments are only used for illustrating the technical solutions of the embodiments of the present invention, and are not limited thereto; although embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A log data processing method, applied to any data node of a log storage system comprising a plurality of data nodes, wherein each data node comprises a first storage unit and a second storage unit, the method comprising the following steps:

2. The method of claim 1, wherein the first storage units are divided into different sequences according to the tag information, each sequence is divided into a plurality of levels of storage units according to time, and the next level of storage units are obtained by performing finer time granularity division on the previous level of storage units;

the writing the plurality of logs to be stored into the first storage unit according to the tag information and the timestamp information and according to a time sequence comprises:

3. The method of claim 2, wherein the migrating the ordered plurality of logs to be stored in the first storage unit to the second storage unit in data blocks comprises:

4. The method of claim 3, wherein merging the data block to be migrated with the stored data block in the same time interval comprises:

5. The method according to any one of claims 1 to 4, wherein the migrating the ordered plurality of logs to be stored in the first storage unit to the second storage unit in a data block form comprises:

6. The method according to any one of claims 2 to 4, wherein the data block comprises a plurality of levels of sub data blocks, respectively corresponding to a plurality of levels of storage units of the first storage unit; and/or

the generating of the inverted index file according to the tag information of the plurality of logs to be stored includes:

7. A log data processing method, applied to a gateway, the method comprising the following steps:

8. The method according to claim 7, wherein the determining the routing information of the log to be stored according to the tag information and the timestamp information of the log to be stored comprises:

9. The method of claim 8, wherein after determining the routing information of the log to be stored, the method further comprises:

10. The method according to any one of claims 7 to 9, wherein the sending the log to be stored to the corresponding target data node according to the routing information comprises:

11. A log data processing method is applied to any data node of a log storage system comprising a plurality of data nodes, wherein each data node comprises a second storage unit, a plurality of logs are stored in the second storage unit in the form of data blocks, each data block is sorted according to a time sequence, and the plurality of logs in each data block are sorted according to tag information and the time sequence, and the method comprises the following steps:

and sending the target log to the gateway.

12. The method of claim 11, wherein the searching for the target log according to the tag information and the keyword of the log to be queried and the inverted index file of the target data block comprises:

13. A log data processing method, applied to a gateway, the method comprising the following steps:

14. The method of claim 13, wherein sending the log query request to the corresponding target data node according to the routing information comprises:

15. A log data processing device applied to any data node of a log storage system comprising a plurality of data nodes, wherein each data node comprises a first storage unit and a second storage unit, the device comprises:

16. A log data processing apparatus applied to a gateway, the apparatus comprising:

17. A log data processing apparatus, applied to any data node of a log storage system including a plurality of data nodes, the data node including a second storage unit, the second storage unit storing a plurality of logs in data blocks, wherein the data blocks are sorted according to a time sequence, and the plurality of logs in each data block are sorted according to tag information and the time sequence, the apparatus comprising:

and the sending module is used for sending the target log to the gateway.

18. A log data processing apparatus applied to a gateway, the apparatus comprising:

19. A data node, comprising:

at least one processor;

the memory further stores computer-executable instructions;

execution of computer-executable instructions stored by the memory by the at least one processor causes the at least one processor to perform the method of any one of claims 1-6, or the method of claim 11 or 12.

20. A gateway, comprising:

at least one processor;

and a memory storing computer execution instructions;

execution of computer-executable instructions stored by the memory by the at least one processor causes the at least one processor to perform the method of any one of claims 7-10, or the method of claim 13 or 14.

21. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1-6, or the method of claim 11 or 12.

22. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 7-10, or the method of claim 13 or 14.