CN112199463A

CN112199463A - Data query method, device and equipment

Info

Publication number: CN112199463A
Application number: CN202011131867.2A
Authority: CN
Inventors: 韩敏
Original assignee: New H3C Security Technologies Co Ltd
Current assignee: New H3C Security Technologies Co Ltd
Priority date: 2020-10-21
Filing date: 2020-10-21
Publication date: 2021-01-08

Abstract

The application provides a data query method, a data query device and data query equipment. The method comprises: after receiving a query request, parsing a keyword from the query request; determining, based on the query request, a query condition type matching the keyword, the query condition type being an inverted query type or a forward query type; if the query condition type is the inverted query type, querying an inverted index table in a fusion index table by the keyword to obtain a first data structure matching the keyword; and if the query condition type is the forward query type, querying a forward index table in the fusion index table by the keyword to obtain a second data structure matching the keyword. With this technical scheme, massive log data can be queried and a query response returned quickly; by fusing the inverted index table and the forward index table, data queries of different query condition types are supported and query speed is improved.

Description

Data query method, device and equipment

Technical Field

The present application relates to the field of communications technologies, and in particular, to a data query method, apparatus, and device.

Background

Log data is data that records the process events generated by an IT system. While keeping services running, an IT system continuously generates a large amount of log data; by inspecting the log data, one can learn which user performed what specific operation, at what time, and on which device or application system.

The log data may be sourced from servers, storage devices, network devices, operating systems, middleware, databases, business systems, and the like. The log data can be divided into status logs and application system logs, and the status logs include CPU (Central Processing Unit) use status, memory use status, temperature information, disk capacity, flow information, behavior analysis information, and the like. The application system log includes log data of an operating system, log data of a database, log data of middleware, and the like. If the log data is classified according to the log format, the log data may be classified into text-type log data, system-type log data, SNMP (Simple Network Management Protocol) type log data, and database-type log data.

In more and more security analysis scenarios, log data needs to be queried and analyzed. However, the volume of log data is growing geometrically, and TB-level or even PB-level log data may be generated every day. At present there is no effective scheme for querying such massive log data.

Disclosure of Invention

The application provides a data query method, which comprises the following steps:

after receiving a query request, parsing a keyword from the query request;

determining a query condition type matching the keyword based on the query request, the query condition type being an inverted query type or a forward query type;

if the query condition type is the inverted query type, querying an inverted index table in a fusion index table by the keyword to obtain a first data structure matching the keyword;

and if the query condition type is the forward query type, querying a forward index table in the fusion index table by the keyword to obtain a second data structure matching the keyword.

In a possible implementation, if the keywords include a first keyword and a second keyword, the query condition type matching the first keyword is the inverted query type, and the query condition type matching the second keyword is the forward query type, the method further includes:

querying the inverted index table in the fusion index table by the first keyword to obtain a first data structure matching the first keyword, and querying the forward index table in the fusion index table by the second keyword to obtain a second data structure matching the second keyword.

Illustratively, determining the query condition type matching the keyword based on the query request includes: if the query request is used for a full-text query, a real-time data query, a grouping aggregation query, or a word segmentation summary query, determining that the query condition type is the inverted query type; or,

if the query request is used for a multi-field query, a long-time query, a large-flow query, a complex table join query, or a paging query, determining that the query condition type is the forward query type.
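The mapping from query purpose to query condition type described above can be sketched as a simple lookup. This is only an illustrative sketch: the purpose labels (`full_text`, `paging`, and so on), the `QueryConditionType` enum, and the `classify` function are hypothetical names, since the source lists the purposes only in prose.

```python
from enum import Enum


class QueryConditionType(Enum):
    INVERTED = "inverted"
    FORWARD = "forward"


# Hypothetical labels for the query purposes listed in the text.
INVERTED_PURPOSES = {"full_text", "real_time", "group_aggregation", "segmentation_summary"}
FORWARD_PURPOSES = {"multi_field", "long_time", "large_flow", "complex_join", "paging"}


def classify(purpose: str) -> QueryConditionType:
    """Return the query condition type for a given query purpose."""
    if purpose in INVERTED_PURPOSES:
        return QueryConditionType.INVERTED
    if purpose in FORWARD_PURPOSES:
        return QueryConditionType.FORWARD
    raise ValueError(f"unknown query purpose: {purpose!r}")
```

How the purpose is derived from an actual query request is left open here, as the source does not specify it.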

Illustratively, the fusion index table further includes a global identifier, and after the inverted index table in the fusion index table is queried by the keyword to obtain the first data structure matching the keyword, the method further includes: querying a full index table by the global identifier to obtain a full index matching the keyword;

likewise, after the forward index table in the fusion index table is queried by the keyword to obtain the second data structure matching the keyword, the method further includes: querying the full index table by the global identifier to obtain a full index matching the keyword.
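The two-stage lookup described above (index table first, then full index table via the global identifier) might look like the following sketch. The table contents, the `global_id` key, and the function name `lookup_full_index` are assumptions made for illustration; the source does not fix a concrete layout.

```python
# Hypothetical in-memory tables; identifiers and field values are illustrative only.
inverted_index_table = {"attack": {"global_id": "id-1", "count": 2}}
forward_index_table = {"10.10.10.1": {"global_id": "id-1", "dest_ip": "20.20.20.1"}}
full_index_table = {
    "id-1": {"src_ip": "10.10.10.1", "dest_ip": "20.20.20.1", "attack_name": "active attack"},
}


def lookup_full_index(index_table, keyword):
    """Stage 1: find the entry for the keyword. Stage 2: follow its global identifier."""
    entry = index_table.get(keyword)
    if entry is None:
        return None
    return full_index_table.get(entry["global_id"])
```

Both index tables resolve to the same full index entry here because both entries carry the global identifier of the same log data.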

Illustratively, the method further comprises: acquiring log data;

determining an inverted query participle and a first data structure corresponding to the inverted query participle based on the log data, and recording a mapping relationship between the inverted query participle and the first data structure in an inverted index table;

determining a forward query participle and a second data structure corresponding to the forward query participle based on the log data, and recording a mapping relationship between the forward query participle and the second data structure in a forward index table;

and fusing the inverted index table and the forward index table to obtain a fusion index table.

Illustratively, after the log data is obtained, the method further includes: allocating a global identifier to the log data, wherein the global identifier is unique and the fusion index table includes the global identifier;

determining a full index based on the log data, wherein the full index includes the inverted query participle and the forward query participle, and recording a mapping relationship between the global identifier and the full index in a full index table.

Illustratively, determining the inverted query participle and the first data structure corresponding to the inverted query participle based on the log data includes:

determining a participle type of the inverted query participle and a data type of the first data structure;

determining the inverted query participle corresponding to the participle type based on the log data;

and determining the first data structure corresponding to the data type based on the log data.

The application provides a data query device, the device including:

a parsing module, configured to parse a keyword from a query request after the query request is received;

a determining module, configured to determine a query condition type matching the keyword based on the query request, the query condition type being an inverted query type or a forward query type;

a query module, configured to: if the query condition type is the inverted query type, query an inverted index table in a fusion index table by the keyword to obtain a first data structure matching the keyword; and if the query condition type is the forward query type, query a forward index table in the fusion index table by the keyword to obtain a second data structure matching the keyword.

The present application provides a data query device, comprising: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor;

the processor is configured to execute machine executable instructions to perform the steps of:

after receiving a query request, analyzing keywords from the query request;

The application provides a machine-readable storage medium, on which computer instructions are stored, and when the computer instructions are executed by a processor, the data query method is realized.

Based on the above technical scheme, in the embodiments of the application, a fusion index table that includes an inverted index table and a forward index table can be established in a security analysis scenario. After a query request is received, if the query condition type is the inverted query type, the inverted index table in the fusion index table is queried to obtain a first data structure; if the query condition type is the forward query type, the forward index table in the fusion index table is queried to obtain a second data structure. The fusion index table can be flexibly customized according to the service scenario, so that massive log data can be queried and a query response returned quickly. By fusing the inverted index table and the forward index table, data queries of different query condition types are supported and query speed is improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments of the present application or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art according to the drawings of the embodiments of the present application.

FIG. 1 is a schematic diagram of establishing a fusion index table according to one embodiment of the present application;

FIG. 2A is a diagram of a participle type and a data type in one embodiment of the present application;

FIG. 2B is a diagram illustrating a fused index table according to an embodiment of the present application;

FIG. 2C is a schematic diagram of a ClickHouse database in one embodiment of the present application;

FIG. 2D is a schematic diagram illustrating the processing of log data in one embodiment of the present application;

FIG. 3 is a flow diagram of a data query method in one embodiment of the present application;

FIG. 4 is a schematic diagram of a query process in one embodiment of the present application;

FIG. 5 is a block diagram of a data query device according to an embodiment of the present application;

FIG. 6 is a block diagram of a data query device according to an embodiment of the present application.

Detailed Description

The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Moreover, depending on the context, the word "if" as used herein may be interpreted as "at the time of …", "when …", or "in response to a determination".

In view of the above problems, in this embodiment a fusion index table may be established. The fusion index table includes an inverted index table and a forward index table and is obtained by fusing the two, so that data queries of different query condition types are supported and query speed is improved. The fusion index table can be flexibly customized according to the service scenario, so that massive log data can be queried and a query response returned quickly.

In this embodiment of the present application, a fusion index table may first be established, and data queries are then implemented based on the established fusion index table. Referring to FIG. 1, which is a schematic diagram of establishing the fusion index table, the method may include:

Step 101, obtaining log data.

Illustratively, log data is data that records the process events generated by an IT system; by inspecting the log data, it is possible to know which user performed what specific operation, at what time, and on which device or application system. The log data may be sourced from servers, storage devices, network devices, operating systems, middleware, databases, business systems, and the like. The log data can be divided into two categories, status logs and application system logs. The status logs include CPU (Central Processing Unit) use status, memory use status, temperature information, disk capacity, flow information, behavior analysis information, and the like. The application system logs include log data of an operating system, log data of a database, log data of middleware, and the like. If the log data is classified according to the log format, it can be classified into text-type log data, system-type log data, SNMP-type log data, and database-type log data.

This embodiment does not limit how the log data is acquired, as long as the log data can be obtained. The following is an example of log data; the content of the log data is not limited.

{
    "_index":"log_20190104",
    "_type":"log_flow",
    "doc_id":"AWgUqnLERXj2HjI5G01I",
    "app_name":"Web browsing (HTTP)",
    "count_name":"Chinese",
    "src_ip":"10.10.10.1",
    "dest_ip":"20.20.20.1",
    "app_type":"network protocol",
    "attack_name":"active attack",
    "attach_type":"stiff wood worm",
    "…":"…",
}

Step 102, determining an inverted query participle and a first data structure corresponding to the inverted query participle based on the log data, and recording a mapping relationship between the inverted query participle and the first data structure in an inverted index table.

In one possible implementation, for step 102, the inverted query participle and the corresponding first data structure may be determined based on the log data in the following manner:

Step 1021, determining a participle type of the inverted query participle and a data type of the first data structure.

For example, before step 101, the participle type of the inverted query participle and the data type of the first data structure corresponding to the participle type may be configured; the configuration manner is not limited.

For example, a user may flexibly configure, according to the service scenario, the participle type of the inverted query participle (for example, a high-frequency field that needs to be searched and/or analyzed and summarized in real time) and the data type corresponding to the participle type (the data type is not limited and can be flexibly configured according to the service scenario). Alternatively, the device may collect statistics on the high-frequency fields that need to be searched and/or analyzed and summarized in real time, determine such a high-frequency field as the participle type of the inverted query participle, and determine the data type corresponding to the participle type. The manner of determining the data type is not limited; for example, an important field other than the participle type may be determined as the data type.

For example, the participle type may be "attack", and the data type corresponding to the participle type may be "number of attacks", indicating the number of times the participle "attack" appears in the log data, and/or "attack position", indicating the position where the participle "attack" appears in the log data, and so on. Of course, the above is only an example of the participle type and the data type, which are not limited thereto.

Referring to FIG. 2A, which shows an example of the participle types and data types for inverted query participles, a participle type a with its corresponding data type a1 and data type a2 may be configured first, and a participle type b with its corresponding data type b1 and data type b2 may be configured; this is not limited.

Based on the above configuration, in step 1021 the participle type of the inverted query participle and the data type of the first data structure may be determined. For example, participle type a (e.g., attack) is determined as the participle type of the inverted query participle, and data type a1 (e.g., number of attacks) and data type a2 (e.g., attack position) are determined as the data types of the first data structure. Alternatively, participle type b is determined as the participle type of the inverted query participle, and data type b1 and data type b2 are determined as the data types of the first data structure.

Step 1022, determining the inverted query participle corresponding to the participle type based on the log data.

For example, for each piece of log data, after the log data is obtained, the inverted query participle corresponding to the participle type may be parsed from it. For example, if the participle type is "attack", all participles matching the participle type "attack" are parsed from the log data and determined as the inverted query participles corresponding to the participle type "attack".

Step 1023, determining the first data structure corresponding to the data type based on the log data.

For example, after the log data is obtained, a first data structure corresponding to the data type may be parsed from the log data, where the first data structure is a data structure corresponding to the inverted query participle, and for convenience of distinction, the data structure corresponding to the inverted query participle is referred to as the first data structure.

For example, if the data types are "number of attacks" and "attack position", the number of times the inverted query participle appears in the log data and the positions at which it appears are parsed from the log data; the first data structure thus includes the number of times the inverted query participle appears in the log data and the positions at which it appears.

In summary, based on steps 1021 to 1023, the inverted query participle and the first data structure can be determined based on the log data, and the mapping relationship between the inverted query participle and the first data structure can then be recorded in the pre-maintained inverted index table. Table 1 shows an example of the inverted index table, which records the mapping relationship between inverted query participles and first data structures; the table is not limited to this form.

TABLE 1

Inverted query participle	First data structure
Participle 11	Data 11
Participle 12	Data 12
…	…

In Table 1, participle 11 represents an inverted query participle, such as a participle matching the participle type "attack", and data 11 represents a first data structure, such as data matching "number of attacks" (the number of times the inverted query participle appears in the log data) and data matching "attack position" (the position in the log data where the inverted query participle appears, from which the participle can be located), and so on.

In one possible implementation, the inverted index table can be flexibly designed according to the service scenario: by configuring the participle type of the inverted query participle and the data type corresponding to the participle type, the inverted index table includes the inverted query participles corresponding to the configured participle type and the first data structures corresponding to the configured data type.
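As a rough sketch of steps 1021 to 1023, the following builds a small configuration-driven inverted index table. Several things are assumptions for illustration rather than details from the source: the function name `build_inverted_index`, whitespace tokenization, treating a word as matching a participle type when it contains that type as a substring, and measuring "position" as a word offset.

```python
def build_inverted_index(logs, config):
    """Build {participle: {log_id: first_data_structure}}.

    logs:   {log_id: raw log text}
    config: {participle type: [data types to record]}, where the supported
            data types in this sketch are "count" and "positions".
    """
    table = {}
    for log_id, text in logs.items():
        words = text.lower().split()  # assumed tokenization: split on whitespace
        for participle_type, data_types in config.items():
            hits = {}
            for position, word in enumerate(words):
                if participle_type in word:  # assumed matching rule: substring
                    hits.setdefault(word, []).append(position)
            for word, positions in hits.items():
                available = {"count": len(positions), "positions": positions}
                # Keep only the data types that were configured for this type.
                table.setdefault(word, {})[log_id] = {k: available[k] for k in data_types}
    return table


logs = {"log-1": "active attack then attack", "log-2": "normal traffic"}
index = build_inverted_index(logs, {"attack": ["count", "positions"]})
```

Changing the configuration, for example to `{"attack": ["count"]}`, changes what the first data structure contains without touching the build logic, which is the kind of flexibility the paragraph above describes.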

Step 103, determining a forward query participle and a second data structure corresponding to the forward query participle based on the log data, and recording a mapping relationship between the forward query participle and the second data structure in a forward index table.

In one possible implementation, for step 103, the forward query participle and the corresponding second data structure may be determined based on the log data in the following manner:

Step 1031, determining a participle type of the forward query participle and a data type of the second data structure.

For example, before step 101, the participle type of the forward query participle and the data type of the second data structure corresponding to the participle type may be configured; the configuration manner is not limited. For example, a user may flexibly configure, according to the service scenario, the participle type of the forward query participle (for example, a high-frequency field that requires offline analysis and/or mass data analysis) and the data type corresponding to the participle type (the data type is not limited and can be flexibly configured according to the service scenario). Alternatively, the device may collect statistics on the high-frequency fields that require offline analysis and/or mass data analysis, determine such a high-frequency field as the participle type of the forward query participle, and determine the data type corresponding to the participle type; the manner of determining the data type is not limited.

Based on the above configuration, in step 1031 the participle type of the forward query participle and the data type of the second data structure may be determined; neither the participle type nor the data type is limited.

Step 1032, determining the forward query participle corresponding to the participle type based on the log data.

For example, after the log data is obtained, the forward query participle corresponding to the participle type is parsed from the log data. For example, if the participle type is m, all participles matching participle type m are parsed from the log data and determined as the forward query participles corresponding to participle type m.

Step 1033, determining the second data structure corresponding to the data type based on the log data.

For example, after the log data is obtained, the second data structure corresponding to the data type may be parsed from the log data. The second data structure is the data structure corresponding to the forward query participle and is so named for ease of distinction.

In summary, based on steps 1031 to 1033, the forward query participle and the second data structure can be determined based on the log data, and the mapping relationship between the forward query participle and the second data structure can then be recorded in the pre-maintained forward index table. Table 2 shows an example of the forward index table, which records the mapping relationship between forward query participles and second data structures; the table is not limited to this form.

TABLE 2

Forward query participle	Second data structure
Participle 21	Data 21
Participle 22	Data 22
…	…

In one possible implementation, the forward index table can be flexibly designed according to the service scenario: by configuring the participle type of the forward query participle and the data type corresponding to the participle type, the forward index table includes the forward query participles corresponding to the configured participle type and the second data structures corresponding to the configured data type.

Step 104, fusing the inverted index table and the forward index table to obtain a fusion index table.

For example, for each piece of log data, after the log data is acquired, an inverted index table and a forward index table may be established based on the log data and then fused, and the fused data is updated into the fusion index table to obtain an updated fusion index table.

For example, the fusion index table may include an inverted index table and a forward index table. The forward index table is a table established in the forward index manner: a forward index uses the document identifier as the key and records the number of times each keyword appears in the document; during a search, the keyword information of each document in the forward index table is scanned until all documents containing the query keyword are found. The inverted index table is a table established in the inverted index manner: an inverted index looks up records by attribute value, and each entry in the inverted index table contains an attribute value and the addresses of the records having that attribute value. Because the position of a record is determined from the attribute value, rather than the attribute value being determined from the record, it is called an inverted index.
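The contrast between the two index styles can be illustrated with a generic, textbook-style sketch. It is not the specific table layout of this application; the document texts and function names are made up for illustration.

```python
from collections import Counter, defaultdict

docs = {
    "doc-1": "active attack detected",
    "doc-2": "web browsing http",
    "doc-3": "attack blocked attack logged",
}

# Forward index: document identifier -> occurrence count of each keyword.
forward_index = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}

# Inverted index: keyword -> identifiers of the documents that contain it.
inverted_index = defaultdict(set)
for doc_id, counts in forward_index.items():
    for word in counts:
        inverted_index[word].add(doc_id)


def search_forward(keyword):
    """Scan every document's entry to find all documents containing the keyword."""
    return sorted(d for d, counts in forward_index.items() if counts[keyword] > 0)


def search_inverted(keyword):
    """Look the keyword up directly, without scanning documents."""
    return sorted(inverted_index.get(keyword, set()))
```

Both searches return the same documents; the difference is that the forward search touches every document entry, while the inverted search is a single key lookup.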

For example, the structure of the fusion index table may be designed so that a one-to-one correspondence between the inverted index and the forward index is realized through the fusion index table. Table 3 shows an example of the fusion index table: the content of one row in the fusion index table is the inverted index table and the forward index table corresponding to the same log data. The fusion index table is thus obtained by fusing the inverted index table and the forward index table.

TABLE 3

Of course, Tables 1, 2 and 3 present the inverted index table, the forward index table and the fusion index table in tabular form only as an example; in practical applications, other structures may be used to represent them, which is not limited.

Referring to FIG. 2B, the index fields of the inverted index table and the index fields of the forward index table may be merged into a composite table, and this composite table is the fusion index table; that is, the fusion index table includes the inverted index table and the forward index table. The manner of merging the inverted index table and the forward index table is not limited.

In a possible implementation, for each piece of log data, after the log data is obtained, a global identifier may further be allocated to the log data. The global identifier is unique; for example, it may be a UUID (Universally Unique Identifier). The global identifier is not limited as long as it is unique, that is, the global identifiers corresponding to different log data are different.

Illustratively, when the mapping relationship between the inverted query participle and the first data structure is recorded in the inverted index table, the mapping relationship may further include the global identifier of the log data, that is, Table 1 further includes the global identifier of the log data. When the mapping relationship between the forward query participle and the second data structure is recorded in the forward index table, the mapping relationship may further include the global identifier of the log data, that is, Table 2 further includes the global identifier of the log data. When the inverted index table and the forward index table are fused, the fusion index table may further include the global identifier of the log data, as shown in Table 4, which is an example of the fusion index table and is not limited thereto.

TABLE 4

In a possible implementation manner, for each log data, after the log data is obtained, a full index (such as a full log field) may be further determined based on the log data, that is, based on all log fields available in the log data, the full index may include the inverted query participle and the forward query participle, the full index may further include other participles besides the inverted query participle and the forward query participle, and a type of the full index is not limited. Then, a mapping relationship between the global identifier of the log data and the full index may be recorded in a full index table (also referred to as a data storage table), that is, the full index table takes the global identifier of the log data as a key and the full index as a query result.
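A minimal sketch of the ingestion side follows: each piece of log data receives a UUID as its global identifier, one row is added to a fusion index table, and all fields are stored in a full index table keyed by that identifier. The row layout, the `ingest` function, and the choice of which fields count as inverted or forward index fields are assumptions for illustration only.

```python
import uuid

fusion_index_table = []  # one row per piece of log data
full_index_table = {}    # global identifier -> all fields of the log data


def ingest(log, inverted_fields, forward_fields):
    """Allocate a global identifier and record the log in both tables."""
    global_id = str(uuid.uuid4())  # unique per piece of log data
    fusion_index_table.append({
        "global_id": global_id,
        "inverted": {f: log[f] for f in inverted_fields if f in log},
        "forward": {f: log[f] for f in forward_fields if f in log},
    })
    full_index_table[global_id] = dict(log)  # full index: every available field
    return global_id


log = {"src_ip": "10.10.10.1", "dest_ip": "20.20.20.1", "attack_name": "active attack"}
gid = ingest(log, inverted_fields=["attack_name"], forward_fields=["src_ip", "dest_ip"])
```

Because the fusion row and the full index entry share the same global identifier, a hit in either part of the fusion row can be resolved to the complete log fields.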

In a possible embodiment, for each piece of log data, after the log data is obtained, the log data may further be recorded in a database. This process is not limited and depends on the type of the database.

For example, taking a ClickHouse database as an example (other types of databases are implemented similarly), the full index is an index defined by the ClickHouse database, that is, the full index is established in the ClickHouse database, and the log data (the full log data) is recorded in the ClickHouse database. The ClickHouse database supports establishing the full index by date: the log data is partitioned and indexed by day, the full index and the log data are updated, and the log data can be queried while it is being written, as shown in FIG. 2C.

The ClickHouse database is a database component for searching and a multidimensional data storage and retrieval tool for data-warehouse scenarios; it aims to solve the query performance problem of massive multidimensional data through targeted design. The ClickHouse database uses a merge tree (MergeTree) to manage the full index and the log data; the merge tree is a data indexing and organizing technique oriented to columnar storage.

Illustratively, data in the ClickHouse database is indexed by date and consists of many partitions; each day's log data is stored in a separate partition, and each partition has an upper bound and a lower bound. When log data is inserted, a new temporary sorted partition is created, while small files are continually merged in the background, merging several small partitions into one large global partition. During insertion, log data belonging to different days is separated into different pieces, and different pieces cannot be recombined. For each partition, an index file is generated, and the primary-key index value of each specific row of data is stored in the leaf nodes of the index, so that log data can be located quickly.
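The partition-and-merge behaviour described above can be modelled in a few lines of plain Python. This is only a toy model of the idea (per-day partitions, sorted pieces created on insert, merging within a day only); it does not use or reproduce the ClickHouse API, and the names `partitions`, `insert_batch`, and `merge_parts` are hypothetical.

```python
import heapq
from collections import defaultdict
from datetime import date

# day -> list of sorted pieces; each piece is a sorted list of (primary_key, payload)
partitions = defaultdict(list)


def insert_batch(rows):
    """rows: iterable of (day, primary_key, payload). Creates one new sorted piece per day."""
    by_day = defaultdict(list)
    for day, key, payload in rows:
        by_day[day].append((key, payload))
    for day, piece in by_day.items():
        partitions[day].append(sorted(piece))


def merge_parts(day):
    """Merge all pieces of one day into a single sorted piece; days are never mixed."""
    partitions[day] = [list(heapq.merge(*partitions[day]))]


d4, d5 = date(2019, 1, 4), date(2019, 1, 5)
insert_batch([(d4, 2, "b"), (d5, 1, "x")])
insert_batch([(d4, 1, "a")])
pieces_before_merge = len(partitions[d4])
merge_parts(d4)
```

In this model a lookup for a given day only has to consider that day's pieces, and after merging it can search within a single sorted list.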

In summary, in this embodiment, referring to fig. 2D, log data may be obtained, and the log data may be adapted, that is, whether the log data is of a specified type (may be configured according to a service requirement, and represents the log data that needs to be queried by using a fusion index table, such as a traffic type and a vulnerability attack type).

If not, the log data is not processed by the technical scheme of this embodiment, and the processing is not limited; for example, only the forward index table is established for the log data, and the reverse index table and the fusion index table are not established.

If so, the log data needs to be processed by the technical scheme of this embodiment; for example, a reverse index table, a forward index table, and a fusion index table need to be established for the log data. Referring to fig. 2D, the reverse index table and the forward index table are first established based on the log data. Then, the fusion index table is established based on the reverse index table and the forward index table, and the log data is stored in a database.
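The index-building flow above can be sketched roughly as follows. This is only an illustration under stated assumptions: the type names in `SPECIFIED_TYPES`, the log fields (`type`, `src_ip`, `message`), the choice of participles, and the use of lists of global identifiers as the data structures are all hypothetical and are not taken from the tables of this application.

```python
from itertools import count

SPECIFIED_TYPES = {"traffic", "vulnerability_attack"}  # configurable (assumed names)


def build_tables(logs):
    """Return (fusion, database) for logs of a specified type."""
    ids = count(1)
    inverted, forward, database = {}, {}, {}
    for log in logs:
        if log["type"] not in SPECIFIED_TYPES:
            continue  # handled outside this scheme
        gid = next(ids)  # unique global identifier
        database[gid] = log
        # Inverted table: participle -> ids of the logs that contain it.
        for word in log["message"].split():
            inverted.setdefault(word, []).append(gid)
        # Forward table: field-based participle -> ids of matching logs.
        forward.setdefault("src_ip=" + log["src_ip"], []).append(gid)
    fusion = {"inverted": inverted, "forward": forward}
    return fusion, database


logs = [
    {"type": "traffic", "src_ip": "10.0.0.1", "message": "port scan blocked"},
    {"type": "audit", "src_ip": "10.0.0.2", "message": "config changed"},
    {"type": "traffic", "src_ip": "10.0.0.1", "message": "scan repeated"},
]
fusion, database = build_tables(logs)
```

In this sketch the log of type `audit` is skipped by the adaptation check, so only the two traffic logs receive global identifiers and appear in the fused tables.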

Based on the reverse index table, the forward index table and the fusion index table, an embodiment of the present application provides a data query method, as shown in fig. 3, which is a schematic diagram of the data query method, and the method may include:

Step 301, after receiving a query request, parsing out keywords from the query request.

For example, a query request sent by a client may carry a keyword, that is, the keyword to be queried, which indicates that the client needs to query the content corresponding to the keyword. Therefore, after the query request is received, the keyword may be parsed from the query request; the parsing process is not limited.

Step 302, determining the type of the query condition matched with the keyword based on the query request; the query condition type may be a reverse query type or a forward query type.

Illustratively, the inverted query type is a type for which the inverted index table needs to be queried: for this type, querying the inverted index table performs better than querying the forward index table. The forward query type is a type for which the forward index table needs to be queried: for this type, querying the forward index table performs better than querying the inverted index table.

In one possible embodiment, the inverted query type may be configured according to business requirements and is not limited. For example, the inverted query type may include but is not limited to: full-text query, real-time data query, grouping aggregation query, word segmentation summary query, and the like. On this basis, if the query request is used to implement a full-text query, a real-time data query, a grouping aggregation query, or a word segmentation summary query, the query condition type matching the keyword can be determined to be the inverted query type.

For example, the full-text query may be configured in advance as an inverted query type. When the user needs to perform a full-text query, the query request carries content related to the full-text query; the content of the query request is not limited. After the query request is received, it can be determined, based on the query request, that the query request is used to implement a full-text query (a full-text query queries an attribute value across the entire log data, for example the number of times attribute value A appears in the entire log data), and therefore the query condition type can be determined to be the inverted query type.

For another example, the real-time data query may be configured in advance as an inverted query type. When the user needs to perform a real-time data query, the query request carries content related to the real-time data query; the content of the query request is not limited. After the query request is received, it can be determined, based on the query request, that the query request is used to implement a real-time data query (a real-time data query queries recent data in the log data rather than massive data, for example only the log data of one minute rather than the complete log data of one day), and therefore the query condition type can be determined to be the inverted query type.

For another example, the grouping aggregation query may be configured in advance as an inverted query type. When the user needs to perform a grouping aggregation query, the query request carries content related to the grouping aggregation query; the content of the query request is not limited. After the query request is received, it can be determined, based on the query request, that the query request is used to implement a grouping aggregation query (a grouping aggregation query groups and aggregates log data: grouping divides the log data into a plurality of groups according to a specific condition, and aggregation performs some operation on the log data in each group and integrates the results), and therefore the query condition type can be determined to be the inverted query type.
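As an illustration of grouping and aggregation only, the following sketch groups log records by one field and computes a count and a sum for each group. The function name `group_aggregate` and the fields `src_ip` and `bytes` are assumptions introduced for this example; the actual grouping condition and aggregation operation are not limited by this application.

```python
from collections import defaultdict


def group_aggregate(logs, group_field, value_field):
    """Group logs by one field, then aggregate another field per group."""
    groups = defaultdict(list)
    for log in logs:
        groups[log[group_field]].append(log[value_field])  # grouping
    # Aggregation: one count and one total per group.
    return {key: {"count": len(vals), "total": sum(vals)}
            for key, vals in groups.items()}


logs = [
    {"src_ip": "10.0.0.1", "bytes": 120},
    {"src_ip": "10.0.0.2", "bytes": 80},
    {"src_ip": "10.0.0.1", "bytes": 30},
]
result = group_aggregate(logs, "src_ip", "bytes")
```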

For another example, the word segmentation summary query may be configured in advance as an inverted query type. When the user needs to perform a word segmentation summary query, the query request carries content related to the word segmentation summary query; the content of the query request is not limited. After the query request is received, it can be determined, based on the query request, that the query request is used to implement a word segmentation summary query (a word segmentation summary query segments and summarizes log data: segmentation divides the log data into a plurality of participles, and summarization collects statistics for each participle, such as the number of times and the positions at which the participle appears in the log data), and therefore the query condition type can be determined to be the inverted query type.
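The segmentation and summary described above can be sketched as follows. Whitespace splitting stands in for a real word segmenter, and the function name `participle_summary` and the (line, offset) position format are assumptions made for this example.

```python
def participle_summary(messages):
    """Count each participle and record where it appears."""
    summary = {}
    for line_no, message in enumerate(messages):
        # Segmentation: a plain whitespace split is assumed here.
        for offset, word in enumerate(message.split()):
            entry = summary.setdefault(word, {"count": 0, "positions": []})
            entry["count"] += 1
            entry["positions"].append((line_no, offset))
    return summary


summary = participle_summary(["login failed admin", "admin login ok"])
```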

Of course, the above are just a few examples of the inverted query types, and there is no limitation on the inverted query types.

In one possible implementation, the forward query type may be configured according to business requirements and is not limited. For example, the forward query type may include but is not limited to: multi-field query, long-time query, large-flow query, complex table connection query, paging query, and the like. Based on this, if the query request is used to implement a multi-field query (for example, the number of fields to be queried is greater than a number threshold), a long-time query (for example, the time period covered by the data to be queried is longer than a duration threshold, such as querying one year of data), a large-flow query (for example, the flow of the data to be queried is greater than a flow threshold), a complex table connection query, or a paging query, it may be determined that the query condition type matching the keyword is the forward query type.

For example, the multi-field query may be configured in advance as a forward query type. When the user needs to perform a multi-field query, the query request carries content related to the multi-field query; the content of the query request is not limited. After the query request is received, it can be determined, based on the query request, that the query request is used to implement a multi-field query (a multi-field query queries a plurality of fields of the log data, where the number of fields to be queried is greater than a number threshold), and therefore the query condition type can be determined to be the forward query type.

For another example, the long-time query may be configured in advance as a forward query type. When the user needs to perform a long-time query, the query request carries content related to the long-time query; the content of the query request is not limited. After the query request is received, it can be determined, based on the query request, that the query request is used to implement a long-time query, and therefore the query condition type can be determined to be the forward query type.

For another example, the large-flow query may be configured in advance as a forward query type. When the user needs to perform a large-flow query, the query request carries content related to the large-flow query; the content of the query request is not limited. After the query request is received, it can be determined, based on the query request, that the query request is used to implement a large-flow query, and therefore the query condition type can be determined to be the forward query type.

For another example, the complex table connection query may be configured in advance as a forward query type. When the user needs to perform a complex table connection query, the query request carries content related to the complex table connection query; the content of the query request is not limited. After the query request is received, it can be determined, based on the query request, that the query request is used to implement a complex table connection query (a complex table connection query is an association query across multiple tables, that is, a multi-table join query), and therefore the query condition type can be determined to be the forward query type.

For another example, the paging query may be configured in advance as a forward query type. When the user needs to perform a paging query, the query request carries content related to the paging query; the content of the query request is not limited. After the query request is received, it can be determined, based on the query request, that the query request is used to implement a paging query (a paging query applies when the amount of data to be queried is large, so that all of the data cannot be displayed on one page and needs to be displayed across a plurality of pages), and therefore the query condition type can be determined to be the forward query type.
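The idea of a paging query can be shown with a short sketch in which a result set that does not fit on one page is split into fixed-size pages. The helper names `page_count` and `get_page`, the 1-based page numbering, and the page size are assumptions for illustration.

```python
import math


def page_count(total_rows, page_size):
    """Number of pages needed to display all rows."""
    return math.ceil(total_rows / page_size)


def get_page(rows, page, page_size):
    """Return the rows of one page; pages are numbered from 1."""
    start = (page - 1) * page_size
    return rows[start:start + page_size]


rows = list(range(1, 26))          # 25 result rows
pages = page_count(len(rows), 10)  # 3 pages of at most 10 rows
last_page = get_page(rows, pages, 10)
```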

Of course, the above are just a few examples of the forward query type, and the forward query type is not limited thereto.

In summary, the content of the query request may be analyzed to obtain a query condition type matching the keyword, where the query condition type may be an inverted query type or a forward query type.
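One way to picture the determination in step 302 is the sketch below. The `QueryRequest` fields, the kind labels, and the two threshold values are hypothetical; how a real query request encodes this information is not limited by this application. The configured kind is checked first, and the thresholds are only a fallback.

```python
from dataclasses import dataclass, field

# Configurable sets of query kinds (labels are assumed for illustration).
INVERTED_KINDS = {"full_text", "real_time", "group_aggregate", "participle_summary"}
FORWARD_KINDS = {"multi_field", "long_time", "large_flow", "table_join", "paging"}
FIELD_THRESHOLD = 5            # assumed number threshold
DURATION_THRESHOLD_DAYS = 30   # assumed duration threshold


@dataclass
class QueryRequest:
    keyword: str
    kind: str = ""                               # hypothetical label of the query
    fields: list = field(default_factory=list)   # fields to be queried
    span_days: int = 0                           # time period to be queried


def query_condition_type(request):
    """Return "inverted" or "forward" for one query request."""
    if request.kind in INVERTED_KINDS:
        return "inverted"
    if request.kind in FORWARD_KINDS:
        return "forward"
    # No configured kind: fall back to the thresholds.
    if len(request.fields) > FIELD_THRESHOLD:
        return "forward"   # multi-field query
    if request.span_days > DURATION_THRESHOLD_DAYS:
        return "forward"   # long-time query
    raise ValueError("query condition type cannot be determined")
```

For example, `query_condition_type(QueryRequest("a", kind="full_text"))` yields the inverted type, while a request covering 365 days with no configured kind yields the forward type.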

Step 303, if the query condition type is an inverted query type, querying an inverted index table in the fusion index table through the keyword to obtain a first data structure matched with the keyword.

Illustratively, when a user needs to perform a full-text query, a real-time data query, a grouping aggregation query, or a word segmentation summary query, the keyword carried in the query request may be a keyword of the inverted query type, such as an inverted query participle. Therefore, after the keyword is parsed from the query request, the inverted index table in the fusion index table may be queried through the keyword to obtain a first data structure matching the keyword.

For example, referring to table 4, the inverted index table in the fusion index table may be queried through the keyword (e.g., participle 11), and the first data structure matching the keyword is data 11.

Step 304, if the query condition type is a forward query type, querying a forward index table in the fusion index table through the keyword to obtain a second data structure matched with the keyword.

Illustratively, when a user needs to perform a multi-field query, a long-time query, a large-flow query, a complex table connection query, or a paging query, the keyword carried in the query request may be a keyword of the forward query type, such as a forward query participle. Therefore, after the keyword is parsed from the query request, the forward index table in the fusion index table may be queried through the keyword to obtain a second data structure matching the keyword. For example, referring to table 4, the forward index table in the fusion index table may be queried through the keyword (e.g., participle 21), and the second data structure matching the keyword is data 21.

In one possible implementation, if the user needs to perform a full-text query (or a real-time data query, a grouping aggregation query, or a word segmentation summary query) and also needs to perform a multi-field query (or a long-time query, a large-flow query, a complex table connection query, or a paging query), the keywords carried by the query request include a first keyword and a second keyword. For example, the query request includes two parts of content: the first part is used to implement the full-text query and includes the first keyword, and the second part is used to implement the multi-field query and includes the second keyword. The query condition type matching the first keyword is determined based on the first part of the query request, that is, it is the inverted query type (the query condition type corresponding to the full-text query); the query condition type matching the second keyword is determined based on the second part of the query request, that is, it is the forward query type (the query condition type corresponding to the multi-field query).

Based on the above, because the query condition type matched with the first keyword is the inverted query type, the inverted index table in the fusion index table is queried through the first keyword, so that the first data structure matched with the first keyword is obtained. And because the query condition type matched with the second keyword is a forward query type, the forward index table in the fusion index table is queried through the second keyword to obtain a second data structure matched with the second keyword.
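Steps 303 and 304, including the case of a request that carries a first and a second keyword, can be sketched as a simple dispatch. The dictionary layout of the fusion index table and the function names are assumptions; only the sample entries (participle 11 mapping to data 11, participle 21 mapping to data 21) follow the examples given for table 4.

```python
# Fusion index table holding both sub-tables (layout assumed for illustration).
FUSION = {
    "inverted": {"participle 11": "data 11"},
    "forward": {"participle 21": "data 21"},
}


def query_fusion(fusion, keyword, condition_type):
    """Query the sub-table selected by the query condition type."""
    if condition_type == "inverted":
        return fusion["inverted"].get(keyword)   # first data structure
    if condition_type == "forward":
        return fusion["forward"].get(keyword)    # second data structure
    raise ValueError("unknown query condition type: " + condition_type)


def run_query(fusion, keyword_parts):
    """keyword_parts: list of (keyword, condition_type), one per request part."""
    return [query_fusion(fusion, kw, ctype) for kw, ctype in keyword_parts]


results = run_query(FUSION, [("participle 11", "inverted"),
                             ("participle 21", "forward")])
```

Each keyword is resolved against its own sub-table, so a request that mixes an inverted-type part and a forward-type part is answered from the same fusion index table.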

In a possible implementation manner, if the user needs to perform a grouping aggregation query and also needs to perform a word segmentation summary query, the keywords carried by the query request include a first keyword and a second keyword. For example, the query request includes two parts of content: the first part is used to implement the grouping aggregation query and includes the first keyword, and the second part is used to implement the word segmentation summary query and includes the second keyword. The query condition type matching the first keyword is determined to be the inverted query type based on the first part of the query request, and the query condition type matching the second keyword is determined to be the inverted query type based on the second part of the query request. Based on this, the inverted index table in the fusion index table is queried through the first keyword to obtain a first data structure matching the first keyword, and the inverted index table in the fusion index table is queried through the second keyword to obtain a first data structure matching the second keyword.

In a possible implementation manner, if the user needs to perform a complex table connection query and also needs to perform a paging query, the keywords carried by the query request include a first keyword and a second keyword. For example, the query request includes two parts of content: the first part is used to implement the complex table connection query and includes the first keyword, and the second part is used to implement the paging query and includes the second keyword. The query condition type matching the first keyword is determined to be the forward query type based on the first part of the query request, and the query condition type matching the second keyword is determined to be the forward query type based on the second part of the query request. Based on this, the forward index table in the fusion index table is queried through the first keyword to obtain a second data structure matching the first keyword, and the forward index table in the fusion index table is queried through the second keyword to obtain a second data structure matching the second keyword.

In a possible implementation manner, the fusion index table further includes a global identifier. For step 303, when the inverted index table in the fusion index table is queried through the keyword to obtain the first data structure matching the keyword, the global identifier matching the keyword may also be queried from the fusion index table, as shown in table 4. Further, after the global identifier matching the keyword is obtained, the full index table may be queried through the global identifier to obtain the full index matching the keyword. For example, since the full index table may include a mapping relationship between the global identifier and the full index, the full index table may be queried through the global identifier to obtain the full index matching the keyword.

Illustratively, since the database (e.g., clickhouse database) includes log data, the log data matching the keyword may also be obtained by querying the database, which is not limited in this process.

In a possible implementation manner, the fusion index table further includes a global identifier. For step 304, when the forward index table in the fusion index table is queried through the keyword to obtain the second data structure matching the keyword, the global identifier matching the keyword may also be queried from the fusion index table, as shown in table 4. Further, after the global identifier matching the keyword is obtained, the full index table may be queried through the global identifier to obtain the full index matching the keyword. For example, since the full index table may include a mapping relationship between the global identifier and the full index, the full index table may be queried through the global identifier to obtain the full index matching the keyword.

Referring to FIG. 4, upon receiving a query request, the query request may be analyzed to determine a query condition type. For example, if the query request is used for realizing query of real-time data or word segmentation summary ranking, the query condition type is an inverted query type, and a first data structure matched with the keywords is obtained by querying an inverted index table in the fusion index table, so that quick query is realized. For another example, if the query request is used to implement complex table connection or paging query, the query condition type is a forward query type, and a second data structure matched with the keyword is obtained by querying a forward index table in the fusion index table, thereby implementing fast query.

After the reverse index table or the forward index table in the fusion index table is queried, the full index table can be queried through the global identifier to obtain the full index matching the keyword. In addition, a database (such as a clickhouse database) can also be queried to obtain the log data matching the keyword.
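The chain of lookups above (fusion index table, then global identifier, then full index table and database) can be sketched as follows. All table contents and the entry layout are hypothetical; the only properties taken from the description are that a fusion index entry carries a global identifier, that the full index table maps the global identifier to a full index containing inverted and forward query participles, and that the database holds the log data.

```python
# Hypothetical contents; each fusion entry carries a global identifier.
fusion_index = {
    "inverted": {"participle 11": {"data": "data 11", "global_id": 1}},
    "forward": {"participle 21": {"data": "data 21", "global_id": 2}},
}
full_index_table = {  # global identifier -> full index
    1: {"inverted": ["participle 11"], "forward": ["participle 12"]},
    2: {"inverted": ["participle 22"], "forward": ["participle 21"]},
}
database = {1: "raw log line 1", 2: "raw log line 2"}  # stands in for the log store


def lookup(keyword, condition_type):
    """Resolve a keyword to its data structure, full index and log data."""
    entry = fusion_index[condition_type].get(keyword)
    if entry is None:
        return None
    gid = entry["global_id"]
    return {
        "data": entry["data"],
        "full_index": full_index_table[gid],
        "log": database[gid],
    }


hit = lookup("participle 11", "inverted")
```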

Illustratively, the above execution sequence is only an example given for convenience of description; in practical applications, the execution sequence of the steps may be changed and is not limited. Moreover, in other embodiments, the steps of the respective methods do not have to be performed in the order shown and described herein, and the methods may include more or fewer steps than those described herein. A single step described in this specification may be broken down into multiple steps in other embodiments, and multiple steps described in this specification may be combined into a single step in other embodiments.

Based on the above technical solution, in the embodiment of the present application, a fusion index table may be established in a security analysis scenario, where the fusion index table includes a reverse index table and a forward index table. After a query request is received, if the query condition type is the reverse query type, the reverse index table in the fusion index table is queried to obtain a first data structure; if the query condition type is the forward query type, the forward index table in the fusion index table is queried to obtain a second data structure. In this manner, the fusion index table can be flexibly customized according to the service scenario, so that massive log data can be queried quickly and a query response can be returned quickly. By fusing the reverse index table and the forward index table, data queries of different query condition types are supported and query speed is increased. The scheme can meet the storage and query requirements of big data, supports flexible index customization, supports specified-field and joint-index queries, is suitable for multi-table join queries, and supports queries under complex SQL conditions.

Based on the same application concept as the above method, an embodiment of the present application provides a data query apparatus, as shown in fig. 5, which is a schematic structural diagram of the data query apparatus, and the apparatus may include:

the parsing module 51 is configured to parse a keyword from a query request after receiving the query request;

a determining module 52, configured to determine a query condition type matching the keyword based on the query request; the query condition type is a reverse query type or a forward query type;

the query module 53 is configured to, if the query condition type is an inverted query type, query an inverted index table in a fusion index table through the keyword to obtain a first data structure matched with the keyword; and if the query condition type is a forward query type, query a forward index table in the fusion index table through the keyword to obtain a second data structure matched with the keyword.

In a possible implementation manner, if the keywords include a first keyword and a second keyword, the query condition type matched with the first keyword is a reverse query type, and the query condition type matched with the second keyword is a forward query type, based on this, the query module 53 is further configured to query the reverse index table in the fusion index table through the first keyword to obtain a first data structure matched with the first keyword, and query the forward index table in the fusion index table through the second keyword to obtain a second data structure matched with the second keyword.

For example, when determining the query condition type matching the keyword based on the query request, the determining module 52 is specifically configured to: if the query request is used to implement a full-text query, a real-time data query, a grouping aggregation query, or a word segmentation summary query, determine that the query condition type is the inverted query type; or, if the query request is used to implement a multi-field query, a long-time query, a large-flow query, a complex table connection query, or a paging query, determine that the query condition type is the forward query type.

Illustratively, the fused index table further includes a global identifier, and the querying module 53 is further configured to query the full index table through the global identifier to obtain a full index matched with the keyword.

In a possible embodiment, the device may further comprise (not shown in the figures):

the acquisition module is used for acquiring log data; the recording module is used for determining inverted query participles and a first data structure corresponding to the inverted query participles based on the log data, and recording the mapping relation between the inverted query participles and the first data structure in an inverted index table; and for determining forward query participles and a second data structure corresponding to the forward query participles based on the log data, and recording the mapping relation between the forward query participles and the second data structure in a forward index table; and the fusion module is used for fusing the inverted index table and the forward index table to obtain a fusion index table.

Illustratively, the acquisition module is further configured to allocate a global identifier to the log data, where the global identifier is unique, and the fusion index table includes the global identifier; the recording module is further configured to determine a full index based on the log data, where the full index includes inverted query participles and forward query participles, and to record a mapping relationship between the global identifier and the full index in a full index table.

For example, when the recording module determines, based on the log data, the inverted query participles and the first data structure corresponding to the inverted query participles, the recording module is specifically configured to: determine a participle type of the inverted query participles and a data type of the first data structure; determine the inverted query participles corresponding to the participle type based on the log data; and determine the first data structure corresponding to the data type based on the log data.

Based on the same application concept as the method, in the embodiment of the present application, a data query device is provided, as shown in fig. 6, where the data query device includes: a processor 61 and a machine-readable storage medium 62, the machine-readable storage medium 62 storing machine-executable instructions executable by the processor 61; the processor 61 is configured to execute machine executable instructions to perform the following steps:

after receiving a query request, analyzing keywords from the query request;

Based on the same application concept as the method, embodiments of the present application further provide a machine-readable storage medium, where a plurality of computer instructions are stored on the machine-readable storage medium, and when the computer instructions are executed by a processor, the data query method disclosed in the above example of the present application can be implemented.

The machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A method for data query, the method comprising:

after receiving a query request, analyzing keywords from the query request;

2. The method of claim 1, wherein if the keywords comprise a first keyword and a second keyword, the query condition type matching the first keyword is a reverse query type, and the query condition type matching the second keyword is a forward query type, the method further comprising:

3. The method of claim 1,

the determining the type of the query condition matched with the keyword based on the query request comprises:

if the query request is used for realizing full-text query, real-time data query, grouping aggregation query or word segmentation summary query, determining the query condition type as an inverted query type; or,

4. The method according to any one of claims 1 to 3,

the fused index table further includes a global identifier, and after the inverted index table in the fused index table is queried through the keyword to obtain a first data structure matched with the keyword, the method further includes: querying a full index table through the global identification to obtain a full index matched with the keyword;

5. The method according to any one of claims 1-3, further comprising:

acquiring log data;

6. The method of claim 5,

after the log data is obtained, the method further comprises: distributing a global identification for the log data, wherein the global identification has uniqueness, and the fusion index table comprises the global identification;

7. The method of claim 5, wherein determining inverted query tokens and a first data structure corresponding to the inverted query tokens based on the log data comprises:

8. A data query apparatus, characterized in that the apparatus comprises:

9. A data query device, comprising: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor;

after receiving a query request, analyzing keywords from the query request;

10. A machine-readable storage medium, characterized in that

the machine-readable storage medium has stored thereon computer instructions which, when executed by a processor, implement the method steps of any of claims 1-7.