CN107633094B

CN107633094B - Method and device for data retrieval in cluster environment

Info

Publication number: CN107633094B
Application number: CN201710939998.5A
Authority: CN
Inventors: 林皓; 陶永波; 严启阳; 张峥嵘
Original assignee: Beixinyuan System Integration Co Ltd
Current assignee: Beixinyuan System Integration Co Ltd
Priority date: 2017-10-11
Filing date: 2017-10-11
Publication date: 2020-12-29
Anticipated expiration: 2037-10-11
Also published as: CN107633094A

Abstract

The invention provides a method for data retrieval in a cluster environment. The method comprises: configuring information based on the cluster environment; defining a set of keywords and predefining usage rules for the keywords; parsing a query statement by using a parser; integrating the parsed parts according to the parsing result to perform the query; sending the integrated result to the cluster as a request; and obtaining a response and obtaining the final retrieval result from the response. The invention makes querying convenient and quick and keeps its cost low.

Description

Method and device for data retrieval in cluster environment

Technical Field

The present invention relates to the field of information control technology, and more particularly, to a method and apparatus for data retrieval in a cluster environment.

Background

At present, cluster data is retrieved mainly through the open API (application program interface) of the cluster. This approach has significant limitations: the query method in a distributed cluster differs greatly from that of mature relational databases, so the T-SQL approach commonly used in the industry cannot be used for quick queries; even slightly complicated queries must be implemented by writing code, which is costly; and operation and maintenance personnel find it difficult to use the cluster quickly and efficiently.

Therefore, how to design a data retrieval method that allows the mature, industry-standard T-SQL to be used to query and retrieve data quickly and efficiently in a distributed cluster environment has become a technical problem to be solved urgently.

Disclosure of Invention

To this end, it is an object of the invention to propose a method for data retrieval in a clustered environment.

It is another object of the present invention to provide an apparatus for data retrieval in a cluster environment.

In order to achieve the above object, according to an aspect of the present invention, a method for data retrieval in a cluster environment is provided, where the method includes: configuring information based on the cluster environment; defining a set of keywords and predefining usage rules for the keywords; parsing the query statement by using a parser; integrating the parsed parts according to the parsing result to perform the query; sending the integrated result to the cluster as a request; and obtaining the response and obtaining the final retrieval result from the response.

According to one embodiment of the invention, the information comprises the cluster server IP address, the cluster name, and the port number.
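By way of illustration only, the configured information might be held in a small structure such as the following Python sketch. The class name `ClusterConfig`, its field names, the `address` helper, and the example values are assumptions made for this sketch and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Connection details for the target cluster (hypothetical field names)."""

    server_ip: str      # cluster server IP address
    cluster_name: str   # name of the cluster
    port: int           # port number

    def address(self) -> str:
        # Join the IP address and port into a single "host:port" string.
        return f"{self.server_ip}:{self.port}"


# Example values are placeholders (192.0.2.0/24 is a documentation range).
config = ClusterConfig(server_ip="192.0.2.10", cluster_name="demo-cluster", port=8080)
```

Freezing the dataclass keeps the configuration read-only once the retrieval steps begin, which is a design choice of this sketch rather than a requirement of the method.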

According to one embodiment of the invention, the set contains keywords used in a T-SQL query.
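As a non-limiting sketch, the keyword set and one kind of usage rule (keywords that must be used in combination) could be represented as follows in Python. The particular keywords listed, the names `SUPPORTED_KEYWORDS` and `COMBINED_KEYWORDS`, and the function `violates_usage_rules` are hypothetical; the disclosure does not enumerate the set or the rules.

```python
# Keywords the parser recognises (an illustrative subset chosen for this sketch).
SUPPORTED_KEYWORDS = frozenset(
    {"SELECT", "FROM", "WHERE", "ORDER", "GROUP", "BY", "LIMIT"}
)

# Usage rule (assumed form): each key must be followed directly by its value.
COMBINED_KEYWORDS = {"ORDER": "BY", "GROUP": "BY"}


def violates_usage_rules(tokens: list[str]) -> bool:
    """Return True if a combined keyword appears without its required partner."""
    upper = [token.upper() for token in tokens]
    for index, token in enumerate(upper):
        required = COMBINED_KEYWORDS.get(token)
        if required is None:
            continue
        # The partner keyword must be the very next token.
        if index + 1 >= len(upper) or upper[index + 1] != required:
            return True
    return False
```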

According to an embodiment of the present invention, parsing the query statement by using the parser specifically includes:

acquiring a character string from the query statement, and performing a matching check of the character string against the SELECT keyword, wherein the rule of the matching check is as follows:

if the character string is equal to the SELECT keyword, the character string is of the retrieval class,

if the character string is not equal to the SELECT keyword, the character string is of the non-retrieval class.

According to an embodiment of the invention, for the retrieval class, the query statement is divided according to the SELECT keyword, the FROM keyword and the WHERE keyword: the part between the SELECT keyword and the FROM keyword is the M segment, the part between the FROM keyword and the WHERE keyword is the N segment, and the part after the WHERE keyword is the Q segment.

According to another aspect of the present invention, there is provided an apparatus for data retrieval in a cluster environment, the apparatus comprising: a module that configures information based on the cluster environment; a module that defines a set of keywords and predefines usage rules for the keywords; a module that parses the query statement by using a parser; a module that integrates the parsed parts according to the parsing result to perform the query; a module that sends the integrated result to the cluster as a request; and a module that obtains the response and obtains a final retrieval result from the response.

According to an embodiment of the present invention, the module that parses the query statement by using the parser further comprises:

a sub-module for acquiring a character string from the query statement and performing a matching check of the character string against the SELECT keyword, wherein the rule of the matching check is as follows:

if the character string is equal to the SELECT keyword, the character string is of the retrieval class,

and if the character string is not equal to the SELECT keyword, the character string is of the non-retrieval class.

According to an aspect of the present invention, there is also provided a computer-readable storage medium, on which a computer program (instructions) for implementing data retrieval in a cluster environment is stored, the program (instructions) implementing the method according to any one of the above aspects when executed by a processor.

Additional aspects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

FIG. 1 illustrates a flow diagram of a method of data retrieval in a cluster environment, according to one embodiment of the invention;

FIG. 2 illustrates a workflow diagram of a parser in accordance with another embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

FIG. 1 illustrates a method for data retrieval in a cluster environment in accordance with one embodiment of the present invention. The method begins at step S01. At step S01, relevant information is configured based on the cluster environment, including key information such as the cluster server IP address, the cluster name, and the port number, and the method proceeds to step S02. At step S02, a set of keywords supported by the parser is defined, where the set includes keywords used in T-SQL queries, and the method proceeds to step S03. At step S03, usage rules for the keywords are predefined, for example which keywords are used in combination, and the method proceeds to step S04. At step S04, the parser parses the query statement; that is, it parses the parser-supported keywords that appear in the query statement written by the user, as follows:

For each operation statement, the part before the first space is extracted and invalid characters such as spaces are filtered out of it. The resulting character string is then checked against the SELECT keyword (the SQL query keyword): if the string is equal to the SELECT keyword (that is, completely identical to it), it is a retrieval-type character string; if the string is not equal to the SELECT keyword (that is, not completely identical to it), it is a non-retrieval-type character string.
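The matching check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `is_retrieval_statement` is hypothetical, "invalid characters" is taken to mean whitespace and non-printable characters (the disclosure does not enumerate them), leading whitespace is stripped before the first space is located, and the comparison is case-sensitive because the text requires the strings to be completely identical.

```python
def is_retrieval_statement(statement: str) -> bool:
    """Return True when the first token of the statement is exactly SELECT."""
    # Take the part before the first space. Leading whitespace is removed
    # first (an assumption) so that an indented statement still yields a token.
    head = statement.lstrip().split(" ", 1)[0]
    # Filter "invalid characters": assumed here to be whitespace and
    # non-printable characters such as tabs or control codes.
    cleaned = "".join(ch for ch in head if ch.isprintable() and not ch.isspace())
    # Case-sensitive equality: the string must be completely identical to SELECT.
    return cleaned == "SELECT"
```

A parser that should also accept lowercase `select` would compare `cleaned.upper()` instead; the disclosure does not state which behavior is intended.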

For a retrieval-type character string, the T-SQL query statement is divided into three parts according to SELECT, FROM and WHERE: the M segment between SELECT and FROM, the N segment between FROM and WHERE, and the Q segment after WHERE.
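One possible way to perform this three-way division is sketched below in Python with a regular expression. The names `Segments` and `split_query` are hypothetical, the keywords are matched in uppercase only, and treating the WHERE clause as optional is an assumption; the disclosure does not say how a statement without WHERE is handled.

```python
import re
from typing import NamedTuple, Optional


class Segments(NamedTuple):
    m: str            # text between SELECT and FROM
    n: str            # text between FROM and WHERE
    q: Optional[str]  # text after WHERE, or None when there is no WHERE


# Lazy groups stop at the first FROM and the first WHERE respectively.
_QUERY = re.compile(
    r"^\s*SELECT\s+(?P<m>.+?)\s+FROM\s+(?P<n>.+?)(?:\s+WHERE\s+(?P<q>.+?))?\s*$",
    re.DOTALL,
)


def split_query(statement: str) -> Segments:
    """Split a retrieval-type statement into its M, N and Q segments."""
    match = _QUERY.match(statement)
    if match is None:
        raise ValueError("statement is not of the form SELECT ... FROM ...")
    return Segments(match["m"], match["n"], match["q"])
```

Note that, in this sketch, a statement with ORDER BY or LIMIT but no WHERE leaves those clauses inside the N segment; a fuller parser would need an explicit rule for that case.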

For the M segment, the text is split on spaces to obtain an array, and leading and trailing spaces are removed from each item in the array. The array is then checked for other T-SQL keywords: if any are present, they are parsed and the query type is determined from them; if not, each item in the array is parsed as a field to be returned by the query.

For the N segment, as with the M segment, the text is split on spaces to obtain an array, and leading and trailing spaces are removed from each item in the array. The resulting array is then parsed as the data sources to be queried.
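The handling of the M segment might look like the following Python sketch. The set `AGGREGATE_KEYWORDS`, the labels `"statistic"` and `"data"`, and the function name `parse_m_segment` are assumptions: the disclosure says only that other T-SQL keywords determine the query type, without naming them.

```python
# Hypothetical keywords that mark a statistic-type query in this sketch.
AGGREGATE_KEYWORDS = frozenset({"COUNT", "SUM", "AVG", "MIN", "MAX"})


def parse_m_segment(m_segment: str) -> dict:
    """Split the M segment on spaces and classify the query."""
    # Split on spaces, trim each item, and drop empty items left behind by
    # repeated spaces.
    items = [item.strip() for item in m_segment.split(" ")]
    items = [item for item in items if item]
    # Look at the text before any "(" so that COUNT(*) is recognised as COUNT.
    keywords = [
        item.upper().split("(", 1)[0]
        for item in items
        if item.upper().split("(", 1)[0] in AGGREGATE_KEYWORDS
    ]
    if keywords:
        return {"query_type": "statistic", "keywords": keywords, "fields": []}
    # No keyword found: every item is a field to be returned by the query.
    return {"query_type": "data", "keywords": [], "fields": items}
```

The same split-and-trim step applies to the N segment, whose items are read as data sources rather than fields.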

For the Q segment, it is determined whether the segment contains any of the keywords ORDER BY, GROUP BY, LIMIT, and the like. If so, the part between WHERE and ORDER BY / GROUP BY / LIMIT is parsed as the query condition; if not, everything after WHERE is parsed as the query condition. When the Q segment includes LIMIT, the numbers following LIMIT are read and interpreted as a range: from the data satisfying the query, only the items from the starting item number to the ending item number defined by LIMIT are selected. The method then proceeds to step S05.
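A hedged Python sketch of the Q-segment handling follows. The function name `parse_q_segment` and the returned dictionary layout are hypothetical. The sketch assumes the numbers after LIMIT are written as one integer or as two integers separated by a comma, and it returns them uninterpreted, since the disclosure does not fix the exact LIMIT syntax. It also does not account for keywords that appear inside quoted string literals.

```python
import re

# The first occurrence of any of these clauses ends the query condition.
_TRAILING_CLAUSE = re.compile(r"\b(?:ORDER\s+BY|GROUP\s+BY|LIMIT)\b")
# One integer, optionally followed by a comma and a second integer (assumed syntax).
_LIMIT = re.compile(r"\bLIMIT\s+(\d+)(?:\s*,\s*(\d+))?")


def parse_q_segment(q_segment: str) -> dict:
    """Extract the query condition and any LIMIT numbers from the Q segment."""
    clause = _TRAILING_CLAUSE.search(q_segment)
    # Without a trailing clause, everything after WHERE is the condition.
    condition = q_segment if clause is None else q_segment[: clause.start()]
    limit_numbers = None
    limit = _LIMIT.search(q_segment)
    if limit is not None:
        # Keep only the numbers that were actually present.
        limit_numbers = tuple(int(n) for n in limit.groups() if n is not None)
    return {"condition": condition.strip(), "limit": limit_numbers}
```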

At step S05, according to the parsing result of step S04, the parts obtained from parsing are integrated into a final call form by using the corresponding APIs, and the method proceeds to step S06. At step S06, the integrated API call form is sent to the cluster as a request, and the method proceeds to step S07. At step S07, the response from the cluster is obtained and the response result is filtered according to the parsed retrieval fields to obtain the final retrieval result, and the method ends.
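Steps S05 to S07 can be illustrated with the following Python sketch. Everything named here is an assumption: the disclosure does not specify the cluster's API, so the call form is modeled as a plain dictionary, the transport is passed in as a callable named `send`, and `fake_send` is a stand-in that returns canned rows instead of contacting a real cluster.

```python
from typing import Callable


def build_request(fields: list[str], sources: list[str], condition: str) -> dict:
    """S05: integrate the parsed parts into one call form (hypothetical layout)."""
    return {"fields": fields, "sources": sources, "condition": condition}


def filter_response(rows: list[dict], fields: list[str]) -> list[dict]:
    """S07: keep only the parsed retrieval fields from each returned row."""
    return [{name: row[name] for name in fields if name in row} for row in rows]


def retrieve(
    fields: list[str],
    sources: list[str],
    condition: str,
    send: Callable[[dict], list[dict]],
) -> list[dict]:
    request = build_request(fields, sources, condition)
    rows = send(request)  # S06: send the call form to the cluster
    return filter_response(rows, fields)


def fake_send(request: dict) -> list[dict]:
    # Stand-in for the cluster: ignores the request and returns canned data.
    return [{"name": "Ann", "age": 41, "city": "Oslo"}]


result = retrieve(["name", "age"], ["users"], "age > 30", fake_send)
```

Passing the transport as a parameter keeps the sketch independent of any particular cluster product; a real implementation would replace `fake_send` with a call to the cluster's own API.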

FIG. 2 illustrates a workflow diagram of a parser in accordance with another embodiment of the invention. First, the user writes a query statement using the keywords supported by the parser, and the query statement is parsed to determine the query mode. If the query mode is a data-type query, the fields to be queried are parsed from the query statement, and the data source to be queried is then obtained from the parsing result; if the query mode is a statistic-type query, the query statement is parsed directly to obtain the data source to be queried. Next, the query conditions are parsed from the query statement, and grouping conditions, sorting conditions and the like are parsed from the query conditions. The data source is then analyzed according to the query conditions to determine the data fragments that satisfy the conditions and need to be returned, and the parsed parts are combined into a query API call.

According to another embodiment of the present invention, an apparatus for data retrieval in a cluster environment comprises: a module that configures information based on the cluster environment, the information including the cluster server IP address, the cluster name, the port number, and the like; a module that defines a set of keywords and predefines usage rules for the keywords (for example, which keywords are used in combination), wherein the set includes keywords used in T-SQL queries; a module that parses the query statement by using the parser, specifically by parsing the parser-supported keywords that appear in the query statement written by the user; a module that, according to the parsing result, integrates the parts obtained from parsing into a final call form by using the corresponding APIs so as to perform the query; a module that sends the integrated API call form to the cluster as a request; and a module that obtains the response from the cluster and filters the response result according to the parsed retrieval fields to obtain the final result.

With respect to the processes, systems, methods, etc., described herein, it should be understood that although the steps of such processes, etc., are described as occurring in a certain order, such processes may perform operations with the described steps performed in an order other than the order described herein. It is further understood that certain steps may be performed simultaneously, that other steps may be added, or that certain steps described herein may be omitted. In other words, the description of the processes herein is provided for the purpose of illustrating certain embodiments and should not be construed in any way as limiting the claimed invention.

Accordingly, it is to be understood that the above description is intended to be illustrative, and not restrictive. Many embodiments and applications other than the examples provided will be apparent upon reading the above description. The scope of the invention should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled, and not by reference to the above description. It is expected that further developments will occur in the arts discussed herein, and that the disclosed systems and methods will be incorporated into such future embodiments. In sum, it is to be understood that the invention is capable of modification and variation.

It should also be understood that any described process or steps in a described process may be combined with other disclosed processes or steps to form structures within the scope of the present disclosure. The exemplary structures and processes disclosed herein are for purposes of illustration and are not to be construed as limiting.

Claims

1. A method for data retrieval in a clustered environment, the method comprising the steps of:

configuring information based on the cluster environment;

defining a set of keywords and predefining usage rules for the keywords, the set containing keywords used in a T-SQL query;

parsing the query statement by using a parser;

integrating the parsed parts according to the parsing result to perform the query, wherein the integrated result is in an API call form;

sending the integrated result to the cluster as a request; and

obtaining a response and obtaining a final retrieval result from the response;

wherein parsing the query statement by using the parser specifically comprises:

acquiring a character string from the query statement, and performing a matching check of the character string against the SELECT keyword, wherein the rule of the matching check is as follows:

2. The method of claim 1, wherein the information comprises the cluster server IP address, the cluster name, and the port number.

3. The method of claim 1, wherein for the retrieval class, the query statement is divided according to the SELECT keyword, the FROM keyword, and the WHERE keyword, wherein the part between the SELECT keyword and the FROM keyword is the M segment, the part between the FROM keyword and the WHERE keyword is the N segment, and the part after the WHERE keyword is the Q segment.

4. An apparatus for data retrieval in a clustered environment, the apparatus comprising:

a module that configures information based on the cluster environment;

a module that defines a set of keywords and predefines usage rules for the keywords, the set containing keywords used in a T-SQL query;

a module that parses the query statement by using a parser;

a module that integrates the parsed parts according to the parsing result to perform the query, wherein the integrated result is in an API call form;

a module that sends the integrated result to the cluster as a request; and

a module that obtains a response and obtains a final retrieval result from the response;

wherein the module that parses the query statement by using the parser further comprises:

a sub-module for acquiring a character string from the query statement and performing a matching check of the character string against the SELECT keyword, wherein the rule of the matching check is as follows:

5. The apparatus of claim 4, wherein the information comprises the cluster server IP address, the cluster name, and the port number.

6. A computer-readable storage medium on which a computer program for data retrieval in a cluster environment is stored, characterized in that the program, when executed by a processor, carries out the method of any one of claims 1-3.