WO2014101445A1

WO2014101445A1 - Data query method and system

Info

Publication number: WO2014101445A1
Application number: PCT/CN2013/082130
Authority: WO
Inventors: 谢永方
Original assignee: 华为技术有限公司
Priority date: 2012-12-24
Filing date: 2013-08-23
Publication date: 2014-07-03
Also published as: CN103064933B; CN103064933A

Abstract

Provided are a data query method and system. The method comprises: receiving an input query request, where the query request carries a field to be queried and a query word in the field; querying a centralized index table corresponding to the field to obtain a collector identifier corresponding to the query word; generating, according to the query request, a query command carrying the field and the query word, and sending the query command to the data collector corresponding to the collector identifier, so that the data collector queries its local index table corresponding to the field carried in the query command for data matching the query word carried in the query command; and receiving the data returned by the data collector, forming a query result of the query request according to the received data, and outputting the result. The present invention can improve the processing speed of data queries, reduce the system resources occupied on the data collector, and decrease the load on the data query server.

Description

This application claims priority to Chinese Patent Application No. 201210566137.4, filed on December 24, 2012 and entitled "Data query method and system", the entire contents of which are incorporated herein by reference.

Technical field

The present invention relates to the field of computer network technologies, and in particular, to a data query method, a data query server, a data collector, and a data query system.

Background

In an era of highly developed Internet use, data collection and query systems are widely used. Various information technology (IT) systems, network devices, and security devices generate large volumes of logs and similar data, and much of this log data needs to be archived for long periods and used for various audits and queries.

Massive data collection and query systems use one of two architectures: distributed storage or centralized storage. In either case, the system must store and query massive log data quickly.

An existing distributed data query system includes a data query server and multiple data collectors. The data collectors are responsible for collecting (receiving, formatting, and merging), storing, and indexing logs, and the data query server is the unified entry for log queries. When a specified log is to be queried, the data query server sends a query command to all data collectors and summarizes the query results of all data collectors into the final query result. If there are many data collectors, the logs sought by a given query exist on only a few of them, and query operations are very frequent, this solution increases the burden on all data collectors, including their power consumption and central processing unit (CPU) usage. In addition to serving queries, a data collector must also receive data and store it in its database, so frequent queries degrade the collection performance of the data collector and reduce the overall processing capability of the system.

In another distributed data query system, the original log data is stored centrally. Each data collector is only responsible for collecting (receiving, formatting, and merging) and reporting logs. The log content is not saved locally after being processed by the data collector but is reported to the data query server for storage. After receiving a log reported by a data collector, the data query server stores it in a database and builds an index. When a log query is needed, it is executed directly against the database of the data query server. With centralized storage, log queries run only in the database of the data query server and do not affect the data collectors. However, because the log data is stored in the database of the data query server, the data collectors must report a large amount of log data. This greatly increases the load on the data query server and also consumes bandwidth between the data collectors and the data query server, which limits the number of data collectors that one data query server can support. The processing capability of the entire system therefore cannot be very high.

Summary of the invention

The present invention provides a data query method, a data query server, a data collector, and a data query system, which can improve the processing speed of data queries, reduce the system resources occupied on the data collector and the load on the data query server, and improve the processing capability of the entire system.

In order to achieve the above object, a first aspect of the present invention provides a data query method, where the method includes:

Receiving an input query request, where the query request carries a field to be queried and a query word in the field;

Querying the centralized index table corresponding to the field to obtain the collector identifier corresponding to the query word, where the centralized index table stores a correspondence between the query words in the field and collector identifiers;

Generating, according to the query request, a query command carrying the field and the query word, and sending the query command to the data collector corresponding to the collector identifier, so that the data collector queries its local index table corresponding to the field carried in the query command to obtain data that matches the query word carried in the query command;

Receiving the data returned by the data collector, forming a query result of the query request according to the received data, and outputting the result.

With reference to the first aspect, in a first possible implementation manner of the first aspect, before the querying the centralized index table corresponding to the field to obtain the collector identifier corresponding to the query word, the method further includes: establishing, for the field, a centralized index table corresponding to the field.

The establishing a centralized index table corresponding to the field includes:

Receiving a report index table of the field sent by each data collector, where the report index table includes the query words corresponding to the field in the data collector that sends the report index table;

Storing, in the centralized index table of the field, the correspondence between the identifier of the data collector and the query words of the field in the report index table reported by that data collector.

With reference to the first aspect, in a second possible implementation manner of the first aspect, the querying the centralized index table corresponding to the field to obtain the collector identifier corresponding to the query word includes:

If the query request carries at least two fields to be queried, obtain query terms of each field in the query request, and record a logical relationship between the query words of the fields;

Querying, from the centralized index table corresponding to each field, a collector identifier corresponding to the query word of each field;

Filtering, according to the logical relationship between the query words of the fields, the collector identifiers that satisfy the logical relationship from the collector identifiers obtained by the query.

In a second aspect, the present invention further provides a data query method, where the method includes:

Receiving a query command sent by a data query server, where the query command includes a field to be queried and a query word in the field, both carried in a query request received by the data query server;

Querying, from a local index table corresponding to the field, a storage location of data matching the query word in the query command, where the local index table stores a correspondence between the query words in the field and the storage locations of the data;

Acquiring the data according to the storage location of the data, and sending the data to the data query server.

With reference to the second aspect, in a first possible implementation manner of the second aspect, before the querying, from the local index table corresponding to the field, the storage location of the data that matches the query word in the query command, the method further includes:

Establishing a local index table corresponding to the field for the field;

The establishing a local index table corresponding to the field includes:

Obtaining data in the current data collector and a storage location of the data, where the data includes content of at least one field;

For each field, using the content of the data in the field as a query word of the data, establishing a mapping relationship between the query word and the storage location, and forming a local index table of the field in the current data collector.

With reference to the first possible implementation manner of the second aspect, in a second possible implementation manner of the second aspect, after the correspondence between the query word of the data and the storage location of the data is stored in the local index table of the field in the current data collector, the method further includes:

Extracting the query words from the local index table of the field, and performing deduplication processing on the query words to form a report index table of the field of the current data collector;

Sending the report index table of the field to the data query server, so that the data query server establishes a centralized index table corresponding to the field.

With reference to the second aspect, in a third possible implementation manner of the second aspect, the querying, from the local index table corresponding to the field, the storage location of the data that matches the query word in the query command includes:

If the query command carries at least two fields to be queried, obtaining the query word of each field in the query command, and recording the logical relationship between the query words of the fields;

Querying, from the local index table corresponding to each field, the storage locations of data matching the query word of each field in the query command;

Filtering, according to the logical relationship between the query words of the fields, the storage locations of data that satisfy the logical relationship from the storage locations obtained by the query.

In a third aspect, the present invention further provides a data query server, where the data query server includes:

a first receiving unit, configured to receive an input query request, where the query request carries a field to be queried and a query word in the field;

a first query unit, configured to query the centralized index table corresponding to the field to obtain the collector identifier corresponding to the query word carried in the query request received by the first receiving unit, where the centralized index table stores a correspondence between the query words in the field and collector identifiers;

a first processing unit, configured to generate, according to the query request, a query command that carries the field and the query word, and send the query command to the data collector corresponding to the collector identifier obtained by the first query unit, so that the data collector queries its local index table corresponding to the field carried in the query command to obtain data that matches the query word carried in the query command;

a first output unit, configured to receive the data returned by the data collector, form a query result of the query request according to the received data, and output the result.

With reference to the third aspect, in a first possible implementation manner of the third aspect, the data query server further includes:

a first index unit, configured to establish, for the field, a centralized index table corresponding to the field, where the first index unit includes:

a first receiving subunit, configured to receive a report index table of the field sent by each data collector, where the report index table includes the query words corresponding to the field in the data collector that sends the report index table;

a first index subunit, configured to store, in the centralized index table of the field, a correspondence between the identifier of the data collector and the query words of the field in the report index table reported by that data collector.

With reference to the third aspect, in a second possible implementation manner of the third aspect, the first query unit includes:

a first parsing subunit, configured to: if the query request received by the first receiving unit carries at least two fields to be queried, obtain the query word of each field in the query request, and record the logical relationship between the query words of the fields;

a first query subunit, configured to query, from the centralized index table corresponding to each field, a collector identifier corresponding to the query word of each field acquired by the first parsing subunit;

a first filtering subunit, configured to filter, according to the logical relationship between the query words of the fields recorded by the first parsing subunit, the collector identifiers that satisfy the logical relationship from the collector identifiers obtained by the first query subunit.

In a fourth aspect, the present invention further provides a data collector, where the data collector includes:

a second receiving unit, configured to receive a query command sent by a data query server, where the query command carries a field to be queried and a query word in the field;

a second query unit, configured to query, from a local index table corresponding to the field, a storage location of data that matches the query word in the query command received by the second receiving unit, where the local index table stores a correspondence between the query words in the field and the storage locations of the data;

a second processing unit, configured to acquire the data according to the storage location of the data obtained by the second query unit, and send the data to the data query server.

In conjunction with the fourth aspect, in a first possible implementation manner of the fourth aspect, the data collector further includes:

a second indexing unit, configured to establish, for the field, a local index table corresponding to the field, where the second indexing unit includes:

an obtaining subunit, configured to acquire data in the current data collector and a storage location of the data, where the data includes content of at least one field;

a second index subunit, configured to use, for each field of the data acquired by the obtaining subunit, the content of the data in the field as a query word of the data, and store, in the local index table of the field in the data collector, a correspondence between the query word of the data and the storage location of the data.

With reference to the first possible implementation of the fourth aspect, in a second possible implementation manner of the fourth aspect, the second indexing unit further includes:

a third index subunit, configured to extract the query words from the local index table of the field obtained by the second index subunit, and perform deduplication processing on the query words to form the report index table of the field of the current data collector;

a sending subunit, configured to send the report index table of the field formed by the third index subunit to the data query server, so that the data query server establishes a centralized index table of the field.

In conjunction with the fourth aspect, in a third possible implementation manner of the fourth aspect, the second query unit includes:

a second parsing subunit, configured to: if the query command received by the second receiving unit carries at least two fields to be queried, obtain the query word of each field in the query command, and record the logical relationship between the query words of the fields;

a second query subunit, configured to query, from the local index table corresponding to each field, the storage locations of data that match the query word of each field in the query command obtained by the second parsing subunit;

a second filtering subunit, configured to filter, according to the logical relationship between the query words of the fields recorded by the second parsing subunit, the storage locations of data that satisfy the logical relationship from the storage locations obtained by the second query subunit.

In a fifth aspect, the present invention further provides a data query system, where the system includes: the data query server provided by the third aspect, and the data collector provided by the fourth aspect.

The data query method, data query server, data collector, and data query system provided by the embodiments of the present invention establish a local index table in the data collector and a centralized index table in the data query server. This effectively reduces the system resources occupied on the data collector, so that the data collector has more resources available for collection, thereby improving the overall processing capability of the system and the processing speed of data queries.

Brief description of drawings

FIG. 1 is a structural diagram of a data query system according to an embodiment of the present invention;

FIG. 2 is a signaling diagram of an index establishment process according to Embodiment 1 of the present invention;

FIG. 3 is a flowchart of a data query method according to Embodiment 1 of the present invention;

FIG. 4 is a flowchart of still another data query method according to Embodiment 1 of the present invention;

FIG. 5 is a schematic diagram of a data query system according to Embodiment 2 of the present invention;

FIG. 6 is a schematic diagram of a data query server and a data collector provided by Embodiment 2 of the present invention;

FIG. 7 is a schematic diagram of a data query server according to Embodiment 3 of the present invention;

FIG. 8 is a schematic diagram of a data collector provided by Embodiment 3 of the present invention.

Detailed description of the preferred embodiments

The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and embodiments.

FIG. 1 is a structural diagram of a data query system according to an embodiment of the present invention. As shown in FIG. 1, the present invention adopts a distributed architecture, including a data query server 10 and multiple data collectors 20. The data collector 20 is responsible for collecting (including receiving, formatting, and merging), storing, and indexing data such as the massive logs reported by the log source 30. The data query server 10 is the unified entry for data queries.

The data query method provided by the present invention can be used for quick query of massive data. In the following embodiments, the log data is taken as an example for description.

Embodiment 1

Before querying the log data, the data stored in the system needs to be indexed in advance, usually during data storage, for the system to query the data according to the established index table.

In this embodiment, a local index table is established in the data collector and a centralized index table is established in the data query server. The local index table stores the index of the log data in the current data collector. Its function is: given a query condition, it locates the specific storage location of every log in the local data that meets the condition. The centralized index table stores, for each field, an index from query words to collector identifiers. Its function is: given a query condition, it identifies which data collectors may store the data to be queried, by providing the identification information of those data collectors.

FIG. 2 is a signaling diagram of the index establishment process provided in this embodiment. As shown in FIG. 2, the method includes the following steps:

Step S101: A data collector acquires data in the current data collector and a storage location of the data.

Optionally, the data stored in the data collector is the original log data reported by the log source. After the log source reports the raw log data to the data collector, the data collector also needs to build a local index for the original log data.

The data collector formats and merges the original log data, processing it into the form of records in a log table (that is, one row per record). Each log table may have multiple fields. As shown in Table 1 below, the log table includes fields such as field 1 and field 2, and the serial number indicates the storage location of the data.

Table 1

Step S102: For each field, the data collector uses the content of the data in the field as a query word of the data and establishes a mapping relationship between the query word and the storage location, forming the local index table of the field in the current data collector.

In the local index of the data collector, a local index table is established for each field of the records in the log table. Each local index table corresponds to one field of the specified log table and contains, for each content value of that field, the storage location of the matching data in the log table. The content of the field is used as the query word for the corresponding data. The mapping relationship between the query word and the storage location may be, but is not limited to being, expressed in the form of a table. Table 2 below shows the local index table of field 1 in the data collector:

Table 2

Field 1 content | Storage location
aaa | 2

The specific location of the data on the data collector can be quickly found by the local index table of the corresponding field. For example, if you want the specific location of a content in field 1, you can quickly find the corresponding location based on the local index table of field 1.
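The local index construction of Step S102 can be sketched in Python as follows. This is a minimal illustration rather than the implementation described by the source: the function name `build_local_indexes`, the field names `field1` and `field2`, the dictionary layout, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict

def build_local_indexes(log_table):
    """Build one local index table per field.

    log_table maps a storage location (serial number) to a record,
    and a record maps a field name to that field's content.
    Returns {field: {query word: [storage locations]}}.
    """
    indexes = defaultdict(lambda: defaultdict(list))
    for location, record in sorted(log_table.items()):
        for field, content in record.items():
            # The field content itself serves as the query word.
            indexes[field][content].append(location)
    # Return plain dicts so that later lookups cannot create empty entries.
    return {field: dict(words) for field, words in indexes.items()}

# Hypothetical log table; the values are illustrative only.
log_table = {
    1: {"field1": "bbb", "field2": "ddd"},
    2: {"field1": "aaa", "field2": "ccc"},
    3: {"field1": "aaa", "field2": "eee"},
}
local_indexes = build_local_indexes(log_table)
print(local_indexes["field1"])  # {'bbb': [1], 'aaa': [2, 3]}
```

Keeping a separate mapping per field means a query on field 1 never has to touch the index of field 2.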

Step S103: The data collector extracts the query word from the local index table of the field, and performs deduplication processing on the query word to form a report index table of the field of the current data collector.

The data collector extracts the report index table from the local index table of the field. The report index table does not contain the original log location for each index entry; it contains only the content of the corresponding field of each piece of data, that is, the query words. Usually, before reporting, the query words are de-duplicated so that the reported field contents contain no duplicates. For example, if field 1 in the table has two aaa entries and one bbb entry, then after de-duplication the report index table contains only one aaa and one bbb, as shown in Table 3 below.

Table 3

Field 1 content
aaa
bbb

When new entries are added to the local index table, the query words extracted from them are compared with the report index table that has already been reported. Query words that are already present are not added again; only new, non-duplicate query words are added to the report index table.
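The de-duplication in Step S103, together with the comparison against what was already reported, can be sketched as below. The function name `build_report_index`, the row layout, and the sample data are assumptions made for illustration; the source does not specify how the comparison is implemented.

```python
def build_report_index(local_index_rows, already_reported=()):
    """Return the query words of one field that still need reporting.

    local_index_rows is an iterable of (query word, storage location)
    pairs. Storage locations are dropped, duplicates are removed, and
    words that were reported earlier are skipped.
    """
    seen = set(already_reported)
    report = []
    for word, _location in local_index_rows:
        if word not in seen:
            seen.add(word)
            report.append(word)
    return report

# Field 1 holds two "aaa" entries and one "bbb" entry (locations are illustrative).
rows = [("aaa", 1), ("aaa", 2), ("bbb", 3)]
first_report = build_report_index(rows)
print(first_report)  # ['aaa', 'bbb']

# Later, two more rows arrive; only the unseen word is reported.
later_rows = rows + [("bbb", 4), ("ccc", 5)]
incremental = build_report_index(later_rows, already_reported=first_report)
print(incremental)  # ['ccc']
```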

Step S104: The data collector sends the report index table of the field to the data query server.

The data collector reports the report index table shown in Table 3 to the data query server. On later reports, only the content newly added to this table is sent.

The data query server summarizes the report index tables sent by the multiple data collectors and establishes, in the data query server, a centralized index table corresponding to each field. This specifically includes:

Step S105: The data query server receives the report index table of the field sent by each data collector.

The report index table of the data collector includes a query word corresponding to the field in the data collector.

Step S106: Establish a mapping relationship between the query word and the identifier of the collector, and form a centralized index table of the field.

For different fields, the data query server respectively establishes a centralized index table of corresponding fields. For example, for field 1, the centralized index table of the data query server stores the query words and corresponding collector identifiers contained in each data collector, as shown in Table 4 below:

Table 4

For another example, for field 2, the centralized index table established by the data query server is shown in Table 5 below:

Table 5

Each field has its own centralized index table in the data query server. The report index tables reported by the data collectors are summarized into these centralized index tables; that is, the centralized index table of a field on the data query server stores the correspondence between the identifier of each data collector and the query words of the field in the report index table reported by that data collector.
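Steps S105 and S106 can be sketched as a merge of the per-collector report index tables into one mapping per field. The function name `build_centralized_index`, the tuple layout of a report, and the identifiers `collector1` to `collector3` are assumptions made for the example. The entries for aaa and ccc follow the examples given in the text for Tables 4 and 5; the other entries are invented for illustration.

```python
from collections import defaultdict

def build_centralized_index(reports):
    """Merge report index tables into {field: {query word: {collector ids}}}.

    reports is an iterable of (collector id, field, [query words]).
    """
    central = defaultdict(lambda: defaultdict(set))
    for collector_id, field, words in reports:
        for word in words:
            central[field][word].add(collector_id)
    # Return plain dicts so that lookups cannot create empty entries.
    return {field: dict(words) for field, words in central.items()}

reports = [
    ("collector1", "field1", ["aaa", "bbb"]),  # "bbb" is illustrative
    ("collector1", "field2", ["ccc"]),
    ("collector2", "field1", ["bbb"]),         # illustrative
    ("collector3", "field1", ["aaa"]),
]
central_index = build_centralized_index(reports)
print(sorted(central_index["field1"]["aaa"]))  # ['collector1', 'collector3']
print(sorted(central_index["field2"]["ccc"]))  # ['collector1']
```

Because the values are sets, an incremental report that repeats a known query word leaves the table unchanged.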

After the index table is created, when the query request is received, the data to be queried can be found through the index table.

FIG. 3 is a flowchart of a data query method according to this embodiment. As shown in FIG. 3, the data query method of the present invention includes:

Step S201: The data query server receives the input query request.

The user inputs a query request by means of a form search or an expression search. For the form retrieval method, a fixed field is given in the interaction interface with the user, the user can input the query word in the prompt box of multiple fields, and finally submit the query request to the data query server through the submit button. For the expression retrieval method, the user directly inputs the field to be queried and the query word of the field, and submits it to the data query server.

The query request received by the data query server carries the field to be queried and the query term of the field.

When the query request contains multiple fields, the query request received by the data query server also carries the logical relationship between the query words of the respective fields. For example, when the user enters query words in different search fields, the data query server receives the query words in those fields together with the logical relationships between them.

For example, the input query request is: (field 1=aaa) AND (field 2=ccc). The query request includes two fields, field 1 and field 2, whose query words are aaa and ccc respectively, and AND indicates that the logical relationship between these two query words is "and".
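A query request in the expression form can be split into its conditions and logical relationship with a small parser. The sketch below is an assumption about one possible syntax handling, not something the source defines: the function name `parse_query`, the regular expressions, and the restriction to a single flat AND or OR relationship are all choices made for the example.

```python
import re

# One parenthesized condition such as "(field 1=aaa)".
CONDITION = re.compile(r"\(\s*([^=()]+?)\s*=\s*([^()]+?)\s*\)")
# A logical operator standing between two conditions.
OPERATOR = re.compile(r"\)\s*(AND|OR)\s*\(")

def parse_query(expression):
    """Return ([(field, query word), ...], relation or None)."""
    conditions = CONDITION.findall(expression)
    relations = set(OPERATOR.findall(expression))
    if len(relations) > 1:
        raise ValueError("this sketch does not handle mixed AND/OR")
    relation = relations.pop() if relations else None
    return conditions, relation

conditions, relation = parse_query("(field 1=aaa) AND (field 2=ccc)")
print(conditions)  # [('field 1', 'aaa'), ('field 2', 'ccc')]
print(relation)    # AND
```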

Step S202: The data query server queries the centralized index table corresponding to the field to obtain a collector identifier corresponding to the query word.

When the query request includes field 1, the centralized index table of field 1 is searched for an entry matching the query word, and the corresponding collector identifier is obtained, thereby identifying the data collectors that hold the query word.

For example, if the query request is field 1=aaa, the collector identifiers corresponding to aaa obtained from the centralized index table shown in Table 4 are collector 1 and collector 3.

For the case where the query request includes multiple fields, this step specifically includes:

Step S2021: Acquire a query word of each field in the query request, and record a logical relationship between the query words of the fields carried in the query request.

Step S2022: Query, from the centralized index table corresponding to each field, the collector identifier corresponding to the query word of each field.

Step S2023: According to the logical relationship between the query words of the fields, the collector identifier that is obtained by querying in step S2022 is filtered to obtain a collector identifier that satisfies the logical relationship.

For example, if the query request is (field 1=aaa) AND (field 2=ccc), then the centralized index table shown in Table 4 gives collector 1 and collector 3 as the collector identifiers corresponding to aaa, and the centralized index table shown in Table 5 gives collector 1 as the collector identifier corresponding to ccc. Because the logical relationship between aaa and ccc is "and", filtering leaves collector 1 as the only collector identifier that satisfies the logical relationship.
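Steps S2021 to S2023 amount to a set intersection for "and" and a set union for "or". The following sketch assumes a flat query with a single logical relationship. The function name `select_collectors` and the dictionary layout are hypothetical, while the index entries mirror the aaa and ccc examples from the text.

```python
def select_collectors(central_index, conditions, relation="AND"):
    """Return the collector identifiers that satisfy a flat AND/OR query."""
    if relation not in ("AND", "OR"):
        raise ValueError("this sketch supports only AND and OR")
    id_sets = [
        set(central_index.get(field, {}).get(word, ()))
        for field, word in conditions
    ]
    if not id_sets:
        return set()
    if relation == "AND":
        return set.intersection(*id_sets)
    return set.union(*id_sets)

central_index = {
    "field1": {"aaa": {"collector1", "collector3"}},
    "field2": {"ccc": {"collector1"}},
}
query = [("field1", "aaa"), ("field2", "ccc")]
and_targets = select_collectors(central_index, query, "AND")
or_targets = select_collectors(central_index, query, "OR")
print(sorted(and_targets))  # ['collector1']
print(sorted(or_targets))   # ['collector1', 'collector3']
```

For "and", the intersection only narrows the candidates: a selected collector holds both query words, but not necessarily in the same record, so the collector still applies the logical relationship to storage locations locally.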

Step S203: The data query server generates a query command that carries the field and the query word according to the query request, and sends the query command to the data collector corresponding to the collector identifier.

The query command generated by the data query server may be identical to the input query request, or it may include only the query words that the destination data collector contains. For example, if the query request is (field 1=aaa) OR (field 2=ccc), the centralized index table shown in Table 4 gives collector 1 and collector 3 as the collector identifiers corresponding to aaa, and the centralized index table shown in Table 5 gives collector 1 as the collector identifier corresponding to ccc. The logical relationship between aaa and ccc is "or". The query command sent to collector 1 is (field 1=aaa) OR (field 2=ccc), and the query command sent to collector 3 is (field 1=aaa).
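One way to generate per-collector query commands in Step S203 is to keep, for each destination, only the conditions whose query word that collector is known to hold. The sketch below assumes a flat AND/OR query. The function name `build_commands`, the command representation, and the identifiers are assumptions made for the example, with the index entries mirroring the aaa and ccc examples from the text.

```python
def build_commands(central_index, conditions, relation):
    """Return {collector id: (conditions to evaluate, relation)}."""
    if relation not in ("AND", "OR"):
        raise ValueError("this sketch supports only AND and OR")
    # For every condition, the set of collectors that hold its query word.
    holders = {
        (field, word): set(central_index.get(field, {}).get(word, ()))
        for field, word in conditions
    }
    if not holders:
        return {}
    combine = set.intersection if relation == "AND" else set.union
    targets = combine(*holders.values())
    commands = {}
    for collector_id in sorted(targets):
        # Drop conditions that cannot match anything on this collector.
        kept = [cond for cond in conditions if collector_id in holders[cond]]
        commands[collector_id] = (kept, relation)
    return commands

central_index = {
    "field1": {"aaa": {"collector1", "collector3"}},
    "field2": {"ccc": {"collector1"}},
}
query = [("field1", "aaa"), ("field2", "ccc")]
commands = build_commands(central_index, query, "OR")
for collector_id, command in commands.items():
    print(collector_id, command)
```

Pruning is safe for "or" because a dropped condition would match nothing on that collector anyway. For "and", every selected collector holds all query words, so no condition is dropped.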

The data collector obtains data matching the query word through its local index table; the specific process is described in detail with reference to FIG. 4.

Step S204: The data query server receives the data returned by the data collector, and forms a query result of the query request according to the received data, and outputs the result.

The data query server summarizes the received data, which may be, but is not limited to, output in the form of a table.

For example, for the query request (field 1=aaa) AND (field 2=ccc), the query result finally output is shown in Table 6 below:

Table 6

FIG. 4 is a flowchart of still another data query method provided by this embodiment. As shown in FIG. 4, the data query method of the present invention includes:

Step S301: The data collector receives the query command sent by the data query server.

The query command includes the field to be queried and the query word of the field, as carried in the query request received by the data query server. Optionally, it includes the query words of multiple fields and the logical relationship between those query words.

Step S302: The data collector queries the local index table corresponding to the field to obtain a storage location of data that matches the query command.

When the query command includes query words of multiple fields, this step includes:

Step S3021: The data collector obtains the query word of each field in the query command received in step S301 and records the logical relationship between the query words of the fields carried in the query command.

Step S3022: The data collector queries, from the local index table corresponding to each field, a storage location of data that matches the query words of the respective fields.

For example, if the query command is (field 1=aaa) OR (field 2=ccc), the storage location found in the local index table of field 1 is 2, and the storage location found in the local index table of field 2 is also 2.

Step S3023: According to the logical relationship between the query words of the fields, the data collector filters the storage locations obtained by the query to obtain the storage locations of the data that satisfy the logical relationship.

The data collector filters the storage location of the matched data according to the logical relationship between the query words to obtain the storage location of the data satisfying the logical relationship.
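Steps S3021 to S3023 can be illustrated with set operations. The following Python sketch is only one possible reading: the index contents, the name `filter_locations`, and the restriction to a single AND or OR relationship across all fields are assumptions made for the example, since the embodiment does not prescribe a data structure.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical local index tables: field -> query word -> storage locations.
LOCAL_INDEX: Dict[str, Dict[str, Set[int]]] = {
    "field1": {"aaa": {1, 2}, "bbb": {3}},
    "field2": {"ccc": {2}, "ddd": {1, 3}},
}

def filter_locations(terms: List[Tuple[str, str]], relation: str) -> Set[int]:
    """Look up each (field, query word) pair, then combine by AND or OR."""
    per_term = [LOCAL_INDEX.get(f, {}).get(w, set()) for f, w in terms]
    if not per_term:
        return set()
    if relation == "AND":
        # A record must appear in the location set of every query word.
        return set.intersection(*per_term)
    if relation == "OR":
        # A record may appear in the location set of any query word.
        return set.union(*per_term)
    raise ValueError(f"unsupported logical relationship: {relation}")

and_hits = filter_locations([("field1", "aaa"), ("field2", "ccc")], "AND")
or_hits = filter_locations([("field1", "aaa"), ("field2", "ccc")], "OR")
print(sorted(and_hits), sorted(or_hits))
```

With this sample data, the AND relationship keeps only the location shared by both query words, while the OR relationship keeps every location found for either word.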

Step S303: The data collector acquires the data according to the storage location of the data, and sends the data to the data query server.

The data collector obtains the corresponding data according to the storage location of the data obtained in step S302. For example, the data that can be obtained to satisfy the query command is:

The data collector sends the data to the data query server, and the data query server aggregates and outputs the query result of the query request.

In the data query method provided by this embodiment, a local index table and a centralized index table are established in the data collector and the data query server, respectively. When data is queried, the collector identifier corresponding to the query word is found through the centralized index table of the corresponding field, and the corresponding data is then obtained from the data collector corresponding to that collector identifier. This can effectively reduce the system resource occupation of the data query server and the data collector, so that the data collector has more resources available for improving collection performance, and it improves the processing speed of data queries.

Embodiment 2

FIG. 5 is a schematic diagram of a data query system according to the embodiment. As shown in FIG. 5, the data query system of the embodiment of the present invention includes: a data query server 10 and a data collector 20. The data collector 20 is responsible for data collection, including receiving, formatting, merging, storing data, and indexing the stored data. The data query server 10 is used for unified management of the contents stored on the plurality of data collectors 20 and serves as a unified entry for data query.

FIG. 6 is a schematic diagram of the data query server 10 and the data collector 20 provided in this embodiment. As shown in FIG. 6, the data query server 10 includes a first index unit 100, a first receiving unit 101, a first query unit 102, a first processing unit 103, and a first output unit 104.

The data collector 20 includes a second index unit 200, a second receiving unit 201, a second query unit 202, and a second processing unit 203.

Before log data is queried, the data collector 20 and the data query server 10 need to index the data stored in the system in advance. Indexing is usually completed while the data is being stored, so that the system can later query the data according to the established index tables. The data query server 10 uses the first index unit 100 to establish a centralized index table for each field. The data collector 20 uses the second index unit 200 to establish a local index table for each field.

The local index table stores the index of the log data in the current data collector. Its function is as follows: given a query condition, the specific storage locations of all logs in the local data that meet the condition can be found. The centralized index table stores the index of the data to be queried together with the collector identifiers. Its function is as follows: given a query condition, it can be determined on which data collectors the data to be queried may be stored, and the centralized index table gives the identification information of the data collectors that store the data to be queried.

The second index unit 200 includes an acquisition subunit 2001, a second index subunit 2002, a third index subunit 2003, and a transmission subunit 2004.

The obtaining subunit 2001 is configured to acquire data in the current data collector and a storage location of the data, where the data includes content of at least one field.

The data collector formats and merges the original log data and processes it into the form of records in the log table (that is, rows in the log table). Each log table may have multiple fields. As shown in Table 1, the log table includes fields such as field 1 and field 2, and the serial number indicates the storage location of the data.

The second index sub-unit 2002 is configured to use, for each field, the content of the data in the field as a query word of the data, and to establish a mapping relationship between the query word and the storage location, so as to form the local index table of the field in the current data collector.

In the local index of the data collector 20, a local index table is established for each field of the records in the log table. Each entry of a local index table maps a field content in the log table to the storage locations, in the log table, of the data whose field contains that content. The field content is used as the query word of the corresponding data. The mapping relationship between the query word and the storage location may be, but is not limited to being, expressed in the form of a table, as shown in Table 2.
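The per-field local index described here behaves like an inverted index. The Python sketch below shows one way it might be built; the sample records, the name `build_local_index`, and the use of a 1-based list position as the storage location are assumptions for illustration and are not taken from Tables 1 and 2.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical log table; list position + 1 stands in for the serial number
# (storage location) of each record.
LOG_TABLE: List[Dict[str, str]] = [
    {"field1": "aaa", "field2": "ddd"},
    {"field1": "aaa", "field2": "ccc"},
    {"field1": "bbb", "field2": "ddd"},
]

def build_local_index(records: List[Dict[str, str]]) -> Dict[str, Dict[str, List[int]]]:
    """Return field -> query word -> list of storage locations."""
    index = defaultdict(lambda: defaultdict(list))
    for location, record in enumerate(records, start=1):
        for field, content in record.items():
            # The field content itself serves as the query word.
            index[field][content].append(location)
    return {field: dict(words) for field, words in index.items()}

local_index = build_local_index(LOG_TABLE)
print(local_index["field1"])
```

Each field gets its own table, so a query on field 1 never has to scan the entries of field 2.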

The third index subunit 2003 is configured to extract the query word from the local index table of the field, and perform deduplication processing on the query word to form a report index table of the field of the current data collector.

The data collector extracts the report index table to be reported from the local index table of the field. The report index table does not contain the specific original log location corresponding to each index entry; it contains only the content of the corresponding field of each piece of data, that is, the query word. Usually, the query words are deduplicated before reporting so that the reported query words of the field are not repeated. For example, if field 1 in the log table contains aaa twice and bbb once, then after deduplication the report index table contains only one aaa and one bbb, as shown in Table 3.

For query words extracted from newly added entries of the local index table, the data collector compares them with the report index table that has already been reported. Query words that are already present are not added to the report index table again; only the newly added, non-duplicate query words are reported.
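The deduplication and incremental reporting described above can be sketched as follows. The function name `new_report_entries` and the use of an in-memory set to remember what has already been reported are assumptions for the example; the embodiment does not specify how the collector tracks reported query words.

```python
from typing import Iterable, List, Set

def new_report_entries(local_words: Iterable[str], already_reported: Set[str]) -> List[str]:
    """Deduplicate query words and keep only those not reported before."""
    fresh: List[str] = []
    for word in local_words:
        if word not in already_reported:
            # Remember the word so later duplicates are skipped.
            already_reported.add(word)
            fresh.append(word)
    return fresh

reported: Set[str] = set()
first = new_report_entries(["aaa", "aaa", "bbb"], reported)
second = new_report_entries(["bbb", "eee"], reported)
print(first, second)
```

The first call reports each distinct word once; the second call reports only the word that the data query server has not yet seen.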

The sending sub-unit 2004 is configured to send the report index table of the field to the first index unit 100 of the data query server 10, so that the data query server establishes a centralized index table of the field.

The first index unit 100 includes a first receiving subunit 1001 and a first index subunit 1002. The first receiving subunit 1001 is configured to receive the report index table of the field sent by each data collector. The report index table includes the query words corresponding to the field in the data collector that sends the report index table. The first index sub-unit 1002 is configured to establish a mapping relationship between the query word and the collector identifier, so as to form a centralized index table of the field.

The first index subunit 1002 respectively establishes a centralized index table of corresponding fields for different fields. For example, for field 1, the query words contained in each data collector and the corresponding collector identifiers are stored in the centralized index table, as shown in Table 4.

The first index sub-unit 1002 establishes a centralized index table for each field, and the report index table reported by each data collector is summarized into a centralized index table of the data query server.
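A minimal sketch of how report index tables might be summarized into per-field centralized index tables is given below. The collector identifiers, the sample query words, and the name `merge_report` are hypothetical and are not taken from Tables 4 and 5.

```python
from collections import defaultdict
from typing import List

def merge_report(central, collector_id: str, field: str, words: List[str]) -> None:
    """Record that `collector_id` holds data for each reported query word."""
    for word in words:
        central[field][word].add(collector_id)

# Centralized index tables: field -> query word -> set of collector IDs.
central_index = defaultdict(lambda: defaultdict(set))
merge_report(central_index, "collector1", "field1", ["aaa", "bbb"])
merge_report(central_index, "collector2", "field1", ["aaa"])
merge_report(central_index, "collector1", "field2", ["ccc"])
merge_report(central_index, "collector3", "field2", ["ccc"])
print(sorted(central_index["field1"]["aaa"]))
```

Because only query words and collector identifiers are stored, the centralized tables stay small relative to the log data held on the collectors.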

After the index table is established in the data query server 10 and the data collector 20 by the first index unit 100 and the second index unit 200, respectively, when the query request is received, the data to be queried can be found through the index table.

The first receiving unit 101 is configured to receive an input query request.

The query request received by the first receiving unit 101 carries the field to be queried and the query word of the field.

When the query request contains a plurality of fields, the query request received by the first receiving unit 101 further includes the logical relationship between the query words of the respective fields. For example, the user inputs query words in different search fields, and the first receiving unit 101 receives the query words in the respective search fields together with the logical relationship between them.

The first querying unit 102 is configured to obtain, from the centralized index table of the field, the collector identifier corresponding to the query word carried by the query request received by the first receiving unit 101.

When the query request includes field 1, the first query unit 102 finds the entry matching the query word in the centralized index table of field 1 and obtains the corresponding collector identifier, thereby identifying the data collector that holds data containing the query word.

For the case where the query request includes multiple fields, the first query unit 102 includes a first parsing subunit, a first query subunit, and a first filtering subunit (not shown).

The first parsing subunit is configured to obtain the query word of each field in the query request, and to record the logical relationship between the query words of the fields.

The first query sub-unit is configured to query, from the centralized index table corresponding to the field, the collector identifier corresponding to the query word of each field obtained by the first parsing sub-unit.

The first filtering subunit is configured to filter, according to the logical relationship between the query words of the fields obtained by the first parsing subunit, the collector identifiers obtained by the first query subunit, so as to obtain the collector identifiers that satisfy the logical relationship.

The first processing unit 103 is configured to generate, according to the query request, a query command that carries the field and the query word, and to send the query command to the data collector 20 corresponding to the collector identifier obtained by the first query unit 102.

The query command generated by the first processing unit 103 may be the same as the input query request, or may include only the query words that are held by the destination data collector. For example, suppose the query request is (field 1=aaa) OR (field 2=ccc). The first query unit 102 queries the centralized index table shown in Table 4 and obtains the collector identifiers corresponding to aaa, namely collector 1 and a further collector. The first query unit 102 queries the centralized index table shown in Table 5 and obtains the collector identifier corresponding to ccc, namely collector 1. The logical relationship between aaa and ccc is "OR". The query command that the first processing unit 103 generates and sends to collector 1 is (field 1=aaa) OR (field 2=ccc), and the query command that the first processing unit 103 generates and sends to collector 3 is field 2=ccc.
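The lookup, filtering, and per-collector command generation performed by the first query unit 102 and the first processing unit 103 might look like the following Python sketch. The index contents are hypothetical sample data rather than the contents of Tables 4 and 5, and `build_commands` together with its textual command format is an assumption made for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical centralized index tables: field -> query word -> collector IDs.
CENTRAL_INDEX: Dict[str, Dict[str, Set[str]]] = {
    "field1": {"aaa": {"collector1", "collector2"}},
    "field2": {"ccc": {"collector1", "collector3"}},
}

def build_commands(terms: List[Tuple[str, str]], relation: str) -> Dict[str, str]:
    """Map each target collector ID to the query command it should receive."""
    holders = [CENTRAL_INDEX.get(f, {}).get(w, set()) for f, w in terms]
    if not holders:
        return {}
    if relation == "AND":
        targets = set.intersection(*holders)
    elif relation == "OR":
        targets = set.union(*holders)
    else:
        raise ValueError(f"unsupported logical relationship: {relation}")
    commands: Dict[str, str] = {}
    for collector in sorted(targets):
        # Keep only the terms whose query word this collector has reported.
        kept = [f"({f}={w})" for (f, w), ids in zip(terms, holders) if collector in ids]
        commands[collector] = f" {relation} ".join(kept)
    return commands

commands = build_commands([("field1", "aaa"), ("field2", "ccc")], "OR")
for collector, command in commands.items():
    print(collector, "->", command)
```

Under an OR relationship every collector holding at least one query word is contacted, but each receives only the terms it can answer; under an AND relationship only collectors holding all query words are contacted, and they receive the full command.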

The second receiving unit 201 of the data collector 20 is configured to receive the query command sent by the data query server 10.

The query command includes the field to be queried and the query word of the field, both carried in the query request received by the data query server 10, and may further include the logical relationship between the query words of a plurality of fields.

The second query unit 202 is configured to query, from the local index table corresponding to the field, a storage location of data that matches the query word in the query command received by the second receiving unit 201.

Optionally, for the case where the query command includes query words of multiple fields, the second query unit 202 includes a second parsing subunit, a second query subunit, and a second filtering subunit (not shown).

The second parsing sub-unit is configured to: when the query command received by the second receiving unit 201 carries a plurality of fields to be queried, obtain the query words of each field in the query command, and record the logical relationship between the query words of the fields carried in the query command.

The second query subunit is configured to query, from the local index table corresponding to each field, the storage location of the data matching the query words of each field in the query command obtained by the second parsing subunit.

The second filtering subunit is configured to filter, according to the logical relationship between the query words of the fields obtained by the second parsing subunit, the storage locations of the data obtained by the second query subunit, so as to obtain the storage locations of the data that satisfy the logical relationship.

The second processing unit 203 is configured to obtain the data according to the storage location of the data that is queried by the second query unit 202, and send the data to the data query server.
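Retrieving records by storage location can be sketched as below. The sample log table, the 1-based serial numbers, and the name `fetch_records` are assumptions for illustration only; how the data collector actually stores and addresses log records is not specified here.

```python
from typing import Dict, Iterable, List

# Hypothetical log table; serial number n is stored at list index n - 1.
LOG_TABLE: List[Dict[str, str]] = [
    {"field1": "aaa", "field2": "ddd"},
    {"field1": "aaa", "field2": "ccc"},
    {"field1": "bbb", "field2": "ddd"},
]

def fetch_records(log_table: List[Dict[str, str]], locations: Iterable[int]) -> List[Dict[str, str]]:
    """Return the records stored at the given 1-based storage locations."""
    valid = [loc for loc in sorted(set(locations)) if 1 <= loc <= len(log_table)]
    return [log_table[loc - 1] for loc in valid]

# The resulting list is what would be sent back to the data query server.
payload = fetch_records(LOG_TABLE, {2, 1})
print(payload)
```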

The second processing unit 203 sends the data queried by the second query unit 202 to the first output unit 104 of the data query server 10 for outputting the query result of the query request.

The first output unit 104 is configured to receive the data returned by the second processing unit 203 of the data collector 20, form a query result of the query request according to the received data, and output the result.
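One way the first output unit 104 might merge the data returned by several collectors into a single table is sketched below. The tab-separated layout, the extra collector column, and the name `format_result` are assumptions for the example; the embodiment only states that the result may be output in the form of a table.

```python
from typing import Dict, List

def format_result(rows_by_collector: Dict[str, List[Dict[str, str]]], fields: List[str]) -> str:
    """Merge rows returned by each collector into one tab-separated table."""
    lines = ["\t".join(["collector"] + fields)]
    for collector in sorted(rows_by_collector):
        for row in rows_by_collector[collector]:
            lines.append("\t".join([collector] + [row.get(f, "") for f in fields]))
    return "\n".join(lines)

table = format_result(
    {
        "collector3": [{"field1": "bbb", "field2": "ccc"}],
        "collector1": [{"field1": "aaa", "field2": "ccc"}],
    },
    ["field1", "field2"],
)
print(table)
```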

The data query server, the data collector, and the system provided by the embodiment of the present invention use the first index unit to establish a centralized index table in the data query server and the second index unit to establish a local index table in the data collector, thereby improving the processing speed of data queries. When the data to be queried exists on only a few data collectors, the present invention does not need to query every data collector, which reduces the burden on the data collectors. For a log system that is queried frequently, the solution provided by the embodiment of the present invention can effectively reduce the system resource occupation of the data collector, so that the data collector has more resources available for improving collection performance, thereby improving the overall processing capability of the system.

Embodiment 3

FIG. 7 is a schematic diagram of the data query server 10 according to the embodiment. As shown in FIG. 7, the data query server 10 includes: a network interface 71, a processor 72, and a memory 73. The system bus 74 is used to connect the network interface 71, the processor 72, and the memory 73.

Network interface 71 is used to communicate with data collector 20.

The memory 73 may be a persistent storage, such as a hard disk drive or a flash memory, that holds software modules and device drivers. The software modules are capable of executing the various functional modules of the above-described method of the present invention; the device drivers can be network and interface drivers.

At startup, these software components are loaded into memory 73 and then accessed by processor 72 and executed as follows:

Receiving an input query request, where the query request carries a field to be queried and a query word in the field;

Obtaining, from the centralized index table corresponding to the field, the collector identifier corresponding to the query word, where the centralized index table stores a correspondence between the query word in the field and the collector identifier;

Generating, according to the query request, a query command carrying the field and the query word, and sending the query command to the data collector corresponding to the collector identifier, so that the data collector queries the local index table, in the data collector, corresponding to the field carried in the query command to obtain data matching the query word carried in the query command;

Receiving the data returned by the data collector, forming a query result of the query request according to the received data, and outputting the result.

The data query server of the present embodiment finds the collector identifier corresponding to the query word through the centralized index table of the field, so that the corresponding data is obtained from the data collector corresponding to the collector identifier. This can effectively reduce the system resource occupation of the data query server and improve the processing speed of data queries.

Further, after the processor accesses the software component of the memory 73, the instructions of the following process are executed:

Establishing a centralized index table corresponding to the field for the field;

The establishing a centralized index table corresponding to the field includes:

Receiving a report index table of the field sent by each data collector, where the report index table includes the query words corresponding to the field in the data collector that sends the report index table; and establishing a mapping relationship between the query word and the collector identifier to form a centralized index table of the field.

The above instruction process is the process in which the data query server establishes a centralized index table. By establishing a mapping relationship between the query word and the collector identifier, the collector identifier corresponding to the query word can be found when data is queried, and the matching data can be obtained from the corresponding data collector.

When the query request carries at least two fields to be queried, the query words of each field in the query request are obtained, and the logical relationship between the query words of the fields is recorded;

According to the logical relationship between the query words of the fields, the collector identifiers obtained by the query are filtered to obtain the collector identifiers that satisfy the logical relationship.

The above instruction process is the process in which the data query server looks up the corresponding collector identifiers for the query words of multiple fields to be queried, which avoids accessing data that cannot fully satisfy the query request.

Embodiment 4

FIG. 8 is a schematic diagram of the data collector 20 provided by the embodiment. As shown in FIG. 8, the data collector 20 includes: a network interface 81, a processor 82, and a memory 83. System bus 84 is used to connect network interface 81, processor 82, and memory 83.

Network interface 81 is used to communicate with data query server 10.

The memory 83 may be a persistent storage, such as a hard disk drive or a flash memory, that holds software modules and device drivers. The software modules are capable of executing the various functional modules of the above-described method of the present invention; the device drivers can be network and interface drivers.

At startup, these software components are loaded into memory 83 and then accessed by processor 82 and executed as follows:

Receiving a query command sent by the data query server, where the query command includes the field to be queried and the query word in the field, both carried in the query request received by the data query server;

Querying the local index table corresponding to the field to obtain the storage location of the data matching the query command, where the local index table stores a correspondence between the query words in the field and the storage locations of the data;

Acquiring the data according to the storage location of the data, and sending the data to the data query server.

The data collector of this embodiment finds the data corresponding to the query word through the local index table of the field and provides the data to the data query server. This can effectively reduce the system resource occupation of the data collector, so that the data collector has more resources available for improving collection performance, and it improves the processing speed of data queries.

Further, after the processor accesses the software component of the memory 83, the instructions of the following process are executed:

Establishing a local index table corresponding to the field for the field;

The establishing a local index table corresponding to the field includes:

For each field, the content of the data in the field is used as a query word of the data, and a mapping relationship between the query word and the storage location is established to form a local index table of the field in the current data collector.

The above instruction process is the process in which the data collector establishes a local index table. By establishing a mapping relationship between the query word and the storage location of the data, the data can be obtained according to the storage location corresponding to the query word.

The above instruction process is a process in which the data collector establishes a report index table according to the local index table and sends it to the data query server, so that the data query server establishes a centralized index table.

When the query command carries at least two fields to be queried, the query words of each field in the query command are obtained, and the logical relationship between the query words of the fields is recorded;

Querying, from the local index table corresponding to each field, a storage location of data matching the query words of the respective fields;

The above instruction process is a process in which the data collector finds a corresponding storage location for a plurality of query words of the field to be queried, and can avoid obtaining data that cannot completely satisfy the query command.

A person skilled in the art should further appreciate that the units and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the various examples have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of both. Software modules can be placed in random access memory (RAM), memory, read only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The embodiments described above further describe the present invention in detail. It should be understood that they are intended to be illustrative only and are not intended to limit the scope of protection of the present invention. Any modifications, equivalent replacements, improvements, and the like made within the spirit and scope of the invention are intended to be included within the scope of protection of the invention.

Claims

1. A data query method, characterized in that the method includes:

Receive an input query request, which carries the field to be queried and the query word in the field;

Query the centralized index table corresponding to the field to obtain the collector identification corresponding to the query word, where the centralized index table stores the corresponding relationship between the query word in the field and the collector identification;

Generate, according to the query request, a query command carrying the field and the query word, and send the query command to the data collector corresponding to the collector identification, so that the data collector queries the local index table, in the data collector, corresponding to the field carried in the query command to obtain data that matches the query word carried in the query command;

Receive the data returned by the data collector, form the query result of the query request based on the received data and output it.

2. The data query method according to claim 1, characterized in that, before querying the centralized index table corresponding to the field to obtain the collector identification corresponding to the query word, the method further includes: for the field, establishing a centralized index table corresponding to the field;

The establishment of a centralized index table corresponding to the field includes:

Receive the reporting index table of the field sent by each data collector, where the reporting index table includes the query word corresponding to the field in the data collector that sent the reporting index table;

In the centralized index table of the field, store the corresponding relationship between the identification of the data collector and the query word of the field in the reporting index table reported by the data collector.

3. The data query method according to claim 1, characterized in that: querying the centralized index table corresponding to the field to obtain the collector identification corresponding to the query word includes:

If the query request carries at least two fields to be queried, obtain the query words of each field in the query request, and record the logical relationship between the query words of each field carried in the query request;

Query the centralized index table corresponding to each field to obtain the collector identification corresponding to the query word of each field;

According to the logical relationship between the query words in each field, the collector identifiers obtained from the query are screened to obtain the collector identifiers that satisfy the logical relationship.

4. A data query method, characterized in that the method includes:

Receive a query command sent by the data query server, the query command carries the field to be queried and the query word in the field;

Query the local index table corresponding to the field to obtain the storage location of the data that matches the query word in the query command, where the local index table stores the corresponding relationship between the query word in the field and the storage location of the data;

According to the storage location of the data, the data is obtained and sent to the data query server.

5. The data query method according to claim 4, characterized in that, before querying the local index table corresponding to the field to obtain the storage location of the data matching the query word in the query command, the method further includes:

For the field, establish a local index table corresponding to the field;

The establishment of a local index table corresponding to the field includes:

Obtain the data in the current data collector and the storage location of the data, where the data includes the content of at least one field;

For each field, use the content of the data in the field as the query word of the data, and store, in the local index table of the field in the current data collector, the corresponding relationship between the query word of the data and the storage location of the data.

6. The data query method according to claim 5, characterized in that, after the corresponding relationship between the query word of the data and the storage location of the data is stored in the local index table of the field in the current data collector, the method further includes:

Extract the query word from the local index table of the field, perform deduplication processing on the query word, and form a reporting index table of the field of the current data collector;

Send the reporting index table of the field to the data query server, so that the data query server establishes a centralized index table corresponding to the field.

7. The data query method according to claim 4, characterized in that: querying the local index table corresponding to the field to obtain the storage location of the data matching the query word in the query command includes:

If the query command carries at least two fields to be queried, obtain the query words of each field in the query command, and record the logical relationship between the query words of each field carried in the query command;

Query from the local index table corresponding to each field to obtain the storage location of the data that matches the query words of each field in the query command;

According to the logical relationship between the query words in each field, the storage locations of the data obtained from the query are filtered to obtain the storage locations of the data that satisfy the logical relationship.

8. A data query server, characterized in that the data query server includes:

A first receiving unit, configured to receive an input query request, where the query request carries the field to be queried and the query word in the field;

The first query unit is configured to query, from the centralized index table corresponding to the field, the collector identification corresponding to the query word carried in the query request received by the first receiving unit, where the centralized index table stores the corresponding relationship between the query word in the field and the collector identification;

The first processing unit is configured to generate, according to the query request, a query command carrying the field and the query word, and to send the query command to the data collector corresponding to the collector identification obtained by the first query unit, so that the data collector queries the local index table, in the data collector, corresponding to the field carried in the query command to obtain data that matches the query word carried in the query command;

The first output unit is configured to receive the data returned by the data collector, form the query result of the query request based on the received data, and output it.

9. The data query server according to claim 8, characterized in that the data query server further includes:

The first index unit is used to establish, for the field, a centralized index table corresponding to the field; the first index unit includes:

The first receiving subunit is used to receive the reporting index table of the field sent by each data collector, where the reporting index table includes the query word corresponding to the field in the data collector that sent the reporting index table;

The first index subunit is used to store, in the centralized index table of the field, the corresponding relationship between the identification of the data collector and the query word of the field in the reporting index table reported by the data collector.

10. The data query server according to claim 8, characterized in that the first query unit includes:

The first parsing subunit is used to, if the query request received by the first receiving unit carries at least two fields to be queried, obtain the query words of each field in the query request and record the logical relationship between the query words of each field carried in the query request;

The first query subunit is configured to query the centralized index table corresponding to each field to obtain the collector identifier corresponding to the query word of each field obtained by the first parsing subunit;

The first filtering subunit is configured to filter, according to the logical relationship between the query words of each field obtained by the first parsing subunit, the collector identifications obtained by the first query subunit, to obtain the collector identifications that satisfy the logical relationship.

11. A data collector, characterized by including:

The second receiving unit is configured to receive a query command sent by the data query server, where the query command carries the field to be queried and the query word in the field;

The second query unit is configured to query the local index table corresponding to the field to obtain the storage location of the data that matches the query word in the query command received by the second receiving unit, where the local index table stores the corresponding relationship between the query word in the field and the storage location of the data;

The second processing unit is configured to obtain the data according to the storage location of the data queried by the second query unit and send it to the data query server.

12. The data collector according to claim 11, characterized in that, the data collector further includes:

The second index unit is used to establish, for the field, a local index table corresponding to the field; the second index unit includes:

The acquisition subunit is used to acquire the data in the current data collector and the storage location of the data, where the data includes the content of at least one field;

The second index subunit is used to, for each field obtained by the acquisition subunit, use the content of the data in the field as the query word of the data, and store, in the local index table of the field in the data collector, the correspondence between the query word of the data and the storage location of the data.
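A minimal sketch of how the second index unit might build per-field local index tables is given below, assuming each piece of data is available as a pair of a storage location and a dictionary of field contents. The function name `build_local_indexes`, the use of integer offsets as storage locations, and the sample log records are assumptions made for illustration.

```python
from collections import defaultdict


def build_local_indexes(records):
    """Build one local index table per field.

    records: iterable of (storage_location, {field: content}) pairs,
    a hypothetical in-memory form of the data held by one collector.
    Returns {field: {query_word: [storage_location, ...]}}.
    """
    local = defaultdict(lambda: defaultdict(list))
    for location, fields in records:
        for field, content in fields.items():
            # The content of the field serves as the query word.
            local[field][content].append(location)
    return {field: dict(table) for field, table in local.items()}


# Hypothetical log records; storage locations are shown as byte offsets.
records = [
    (0, {"user": "alice", "src_ip": "10.0.0.1"}),
    (128, {"user": "bob", "src_ip": "10.0.0.2"}),
    (256, {"user": "alice", "src_ip": "10.0.0.2"}),
]
local_indexes = build_local_indexes(records)
alice_locations = local_indexes["user"].get("alice", [])
```

Answering a query command for one field then reduces to a single lookup in that field's table, after which the data is read from the returned storage locations.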

13. The data collector according to claim 12, characterized in that the second index unit further includes:

The third index subunit is used to extract the query words from the local index table of the field obtained by the second index subunit, and perform deduplication processing on the query words to form the reporting index table of the field for the data collector;

The sending subunit is configured to send the reporting index table of the field formed by the third indexing subunit to the data query server, so that the data query server can establish a centralized index table of the field.
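The extraction and deduplication performed by the third index subunit can be sketched as follows, assuming the local index table of a field is kept as rows of (query word, storage location). The function name `build_reporting_index` and the row layout are hypothetical; the actual table format is not specified here.

```python
def build_reporting_index(local_index_rows):
    """Form the reporting index table of one field.

    local_index_rows: iterable of (query_word, storage_location) rows,
    an assumed layout of the local index table.
    Returns the distinct query words in first-seen order.
    """
    words = (word for word, _location in local_index_rows)
    # dict.fromkeys keeps one entry per word and preserves insertion order.
    return list(dict.fromkeys(words))


rows = [("alice", 0), ("bob", 128), ("alice", 256)]
reporting_index = build_reporting_index(rows)
```

Because only the distinct query words are reported, not the storage locations or the data itself, the table sent to the data query server stays small relative to the local index table.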

14. The data collector according to claim 11, characterized in that the second query unit includes:

The second parsing subunit is used to obtain the query words of each field in the query command if the query command received by the second receiving unit carries at least two fields to be queried, and to record the logical relationship between the query words of the fields carried in the query command;

The second query subunit is used to query the local index table corresponding to each field to obtain the storage location of the data that matches the query words of each field in the query command obtained by the second parsing subunit;

The second filtering subunit is used to filter the storage locations of the data obtained by the second query subunit according to the logical relationship between the query words of each field obtained by the second parsing subunit, to obtain the storage locations of the data that satisfy the logical relationship.

15. A data query system, characterized in that the system includes:

The data query server as described in any one of claims 8 to 10 and the data collector as described in any one of claims 11 to 14.