CN107633094B - Method and device for data retrieval in cluster environment - Google Patents

Method and device for data retrieval in cluster environment

Info

Publication number
CN107633094B
Authority
CN
China
Prior art keywords
query
character string
cluster
retrieval
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710939998.5A
Other languages
Chinese (zh)
Other versions
CN107633094A (en)
Inventor
林皓
陶永波
严启阳
张峥嵘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beixinyuan System Integration Co Ltd
Original Assignee
Beixinyuan System Integration Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beixinyuan System Integration Co Ltd
Priority to CN201710939998.5A
Publication of CN107633094A
Application granted
Publication of CN107633094B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for data retrieval in a cluster environment. The method comprises configuring information based on the cluster environment; defining a set of keywords and predefining usage rules for the keywords; parsing a query statement with a parser; integrating the parsed parts, according to the parsing result, into a query; sending the integrated result to the cluster as a request; and obtaining the response and extracting the final retrieval result from it. The invention makes queries convenient and fast and keeps cost low.

Description

Method and device for data retrieval in cluster environment
Technical Field
The present invention relates to the field of information control technology, and more particularly, to a method and apparatus for data retrieval in a cluster environment.
Background
At present, cluster data is retrieved mainly through the open API (application programming interface) exposed by the cluster. This approach has significant limitations. Querying a distributed cluster differs greatly from querying a mature relational database, so the T-SQL commonly used in the industry cannot be used for quick queries; even slightly complicated queries must be implemented by writing code, which is costly; and operation and maintenance personnel find it difficult to use the cluster quickly and efficiently.
Therefore, how to design a data retrieval method that lets users apply the mature, industry-standard T-SQL to query and retrieve data quickly and efficiently in a distributed cluster environment has become a technical problem in urgent need of a solution.
Disclosure of Invention
To this end, it is an object of the invention to propose a method for data retrieval in a clustered environment.
It is another object of the present invention to provide an apparatus for data retrieval in a cluster environment.
In order to achieve the above objects, according to one aspect of the present invention, a method for data retrieval in a cluster environment is provided. The method includes: configuring information based on the cluster environment; defining a set of keywords and predefining usage rules for the keywords; parsing the query statement with a parser; integrating the parsed parts, according to the parsing result, into a query; sending the integrated result to the cluster as a request; and obtaining the response and obtaining the final retrieval result from the response.
According to one embodiment of the invention, the information comprises the cluster server IP address, the cluster name, and the port number.
According to one embodiment of the invention, the set contains keywords used in T-SQL queries.
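As a rough illustration of the configured information and the keyword set with its usage rules, the following Python sketch may help. The names ClusterConfig, SUPPORTED_KEYWORDS, REQUIRES, and check_usage, the placeholder address and port, and the particular dependency rule are all assumptions made for illustration; the patent does not list the keywords or rules it uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical container for the configured cluster information."""
    server_ip: str
    cluster_name: str
    port: int


# Assumed keyword set: only the keywords named in the description.
SUPPORTED_KEYWORDS = frozenset(
    {"SELECT", "FROM", "WHERE", "ORDER BY", "GROUP BY", "LIMIT"}
)

# Assumed usage rule: a keyword is valid only if the keywords it depends on
# are also present in the statement.
REQUIRES = {
    "FROM": {"SELECT"},
    "WHERE": {"FROM"},
    "ORDER BY": {"FROM"},
    "GROUP BY": {"FROM"},
    "LIMIT": {"FROM"},
}


def check_usage(keywords_used):
    """Return a list of rule violations for the given set of keywords."""
    problems = []
    for kw in sorted(keywords_used):
        if kw not in SUPPORTED_KEYWORDS:
            problems.append(f"unsupported keyword: {kw}")
            continue
        for dep in sorted(REQUIRES.get(kw, set()) - set(keywords_used)):
            problems.append(f"{kw} requires {dep}")
    return problems


# Placeholder values; a real deployment would supply its own.
config = ClusterConfig(server_ip="192.0.2.10", cluster_name="demo", port=8080)
```

Expressing the rules as data rather than code keeps the keyword set easy to extend, which is one plausible reading of "predefining" the usage rules.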
According to an embodiment of the present invention, parsing the query statement with the parser specifically includes:
obtaining a character string from the query statement and checking it for a match against the SELECT keyword, wherein the matching rule is as follows:
if the character string is equal to the SELECT keyword, the character string belongs to the retrieval class;
if the character string is not equal to the SELECT keyword, the character string belongs to the non-retrieval class.
According to an embodiment of the invention, for the retrieval class, the query statement is divided at the SELECT, FROM, and WHERE keywords: the part between the SELECT keyword and the FROM keyword is segment M, the part between the FROM keyword and the WHERE keyword is segment N, and the part after the WHERE keyword is segment Q.
According to another aspect of the present invention, an apparatus for data retrieval in a cluster environment is provided. The apparatus comprises: a module that configures information based on the cluster environment; a module that defines a set of keywords and predefines usage rules for the keywords; a module that parses the query statement with a parser; a module that integrates the parsed parts, according to the parsing result, into a query; a module that sends the integrated result to the cluster as a request; and a module that obtains the response and obtains the final retrieval result from the response.
According to one embodiment of the invention, the information comprises the cluster server IP address, the cluster name, and the port number.
According to one embodiment of the invention, the set contains keywords used in T-SQL queries.
According to an embodiment of the present invention, the module that parses the query statement with the parser further comprises:
a sub-module that obtains a character string from the query statement and checks it for a match against the SELECT keyword, wherein the matching rule is as follows:
if the character string is equal to the SELECT keyword, the character string belongs to the retrieval class;
if the character string is not equal to the SELECT keyword, the character string belongs to the non-retrieval class.
According to an aspect of the present invention, there is also provided a computer-readable storage medium, on which a computer program (instructions) for implementing data retrieval in a cluster environment is stored, the program (instructions) implementing the method according to any one of the above aspects when executed by a processor.
Additional aspects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 illustrates a flow diagram of a method of data retrieval in a cluster environment, according to one embodiment of the invention;
FIG. 2 illustrates a workflow diagram of a parser in accordance with another embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
FIG. 1 illustrates a method for data retrieval in a cluster environment according to one embodiment of the present invention. The method begins at step S01, in which relevant information is configured based on the cluster environment, including key information such as the cluster server IP address, the cluster name, and the port number. In step S02, the set of keywords supported by the parser is defined; the set includes keywords used in T-SQL queries. In step S03, usage rules for the keywords are predefined, for example rules for using keywords in combination. In step S04, the parser parses the query statement, that is, it parses the parser-supported keywords in the query statement submitted by the user, as follows:
For each operation statement, the part before the first space is extracted and invalid characters such as spaces are filtered out of it. The resulting character string is then checked against the SELECT keyword of an SQL query statement: if the character string is equal to the SELECT keyword (that is, exactly identical to it), it is a retrieval-class character string; if it is not equal to the SELECT keyword (that is, not exactly identical to it), it is a non-retrieval-class character string.
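The matching check can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name classify_statement is hypothetical, and stripping leading whitespace before taking the first token is an added assumption so that indented statements still classify sensibly. The comparison is case-sensitive, following the "exactly identical" wording.

```python
def classify_statement(statement: str) -> str:
    """Classify a statement as 'retrieval' or 'non-retrieval'.

    The part before the first space is extracted, whitespace characters are
    filtered out of it, and the result is compared exactly with "SELECT".
    """
    # Assumption: ignore leading whitespace before locating the first space.
    head = statement.lstrip().split(" ", 1)[0]
    # Filter invalid characters (here: any remaining whitespace).
    head = "".join(ch for ch in head if not ch.isspace())
    return "retrieval" if head == "SELECT" else "non-retrieval"
```

For example, `classify_statement("SELECT name FROM users")` yields `"retrieval"`, while a statement beginning with `DELETE`, or with lowercase `select`, yields `"non-retrieval"`.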
For a retrieval-class character string, the T-SQL query statement is divided into three parts at SELECT, FROM, and WHERE: segment M between SELECT and FROM, segment N between FROM and WHERE, and segment Q after WHERE.
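One way to picture the three-way division is a single regular expression with named groups. The sketch below is an assumption-laden illustration: the name split_segments is hypothetical, keywords are matched case-sensitively, and a statement without WHERE is given an empty segment Q, a case the description does not spell out.

```python
import re

# SELECT <M> FROM <N> [WHERE <Q>]
_SEGMENTS = re.compile(
    r"^\s*SELECT\s+(?P<M>.+?)\s+FROM\s+(?P<N>.+?)(?:\s+WHERE\s+(?P<Q>.*))?$",
    re.DOTALL,
)


def split_segments(statement: str):
    """Return the (M, N, Q) segments of a SELECT statement as strings."""
    match = _SEGMENTS.match(statement)
    if match is None:
        raise ValueError("not a SELECT ... FROM ... statement")
    # Q is None when the statement has no WHERE clause (assumed to mean "").
    return match.group("M"), match.group("N"), match.group("Q") or ""
```

For the statement `SELECT name, age FROM users WHERE age > 30 LIMIT 10`, this yields M = `name, age`, N = `users`, and Q = `age > 30 LIMIT 10`.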
Segment M is split on spaces to obtain an array, and leading and trailing spaces are removed from each item in the array. The array is then checked for other T-SQL keywords: if any are present, they are parsed and the query type is determined from them; if none are present, each item in the array is parsed as a field to be retrieved by the query.
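The handling of segment M might look like the sketch below. Several details are assumptions: the function name parse_m_segment, the result layout, the small M_KEYWORDS set standing in for the unspecified "other T-SQL keywords", and the removal of trailing commas, which the description does not mention but which a comma-separated field list needs.

```python
# Assumed stand-ins for the "other T-SQL keywords" that may appear in segment M.
M_KEYWORDS = frozenset({"DISTINCT", "TOP"})


def parse_m_segment(segment: str) -> dict:
    """Split segment M on spaces and classify its items."""
    # Split on spaces, trim each item, and drop trailing commas (assumption).
    items = [item.strip().rstrip(",") for item in segment.split(" ")]
    items = [item for item in items if item]  # discard empty fragments

    keywords = [item for item in items if item in M_KEYWORDS]
    fields = [item for item in items if item not in M_KEYWORDS]
    # With keywords present the query type depends on them; otherwise every
    # item is simply a field to retrieve.
    query_type = "keyword" if keywords else "fields"
    return {"query_type": query_type, "keywords": keywords, "fields": fields}
```

A plain field list such as `name, age` produces the fields `name` and `age` with no keywords, whereas `DISTINCT city` is flagged as a keyword query on the field `city`.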
Segment N is handled like segment M: it is split on spaces to obtain an array, and leading and trailing spaces are removed from each item. The resulting array is then parsed as the data source to be queried.
For segment Q, the parser determines whether it contains any of the keywords ORDER BY, GROUP BY, LIMIT, and so on. If it does, the part between WHERE and ORDER BY, GROUP BY, or LIMIT is parsed as the query condition; if it does not, everything after WHERE is parsed as the query condition. When segment Q contains LIMIT, the number following LIMIT is read and interpreted as the range of items, from one item number to another as defined by LIMIT, to be selected from the data that satisfies the query. The method then proceeds to step S05.
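The treatment of segment Q can be sketched as follows. The function name parse_q_segment and the result layout are hypothetical. Because the description does not say exactly how the numbers after LIMIT map to item positions, the sketch only extracts them and leaves their interpretation to the caller. It also does not guard against a keyword appearing inside a quoted string literal.

```python
import re

# Keywords that end the query condition inside segment Q.
_TRAILING = re.compile(r"\b(ORDER BY|GROUP BY|LIMIT)\b")
# LIMIT followed by one number, or by two numbers separated by a comma.
_LIMIT = re.compile(r"\bLIMIT\s+(\d+)(?:\s*,\s*(\d+))?")


def parse_q_segment(segment: str) -> dict:
    """Extract the query condition and any LIMIT numbers from segment Q."""
    trailing = _TRAILING.search(segment)
    # Everything before the first trailing keyword is the condition; without
    # such a keyword the whole segment is the condition.
    condition = segment[: trailing.start()] if trailing else segment

    limit = _LIMIT.search(segment)
    numbers = [int(n) for n in limit.groups() if n is not None] if limit else None
    return {"condition": condition.strip(), "limit": numbers}
```

For `age > 30 ORDER BY age LIMIT 10`, the condition is `age > 30` and the extracted LIMIT numbers are `[10]`.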
In step S05, according to the parsing result of step S04, the parts obtained from parsing are integrated into a final call form using the corresponding APIs, and the method proceeds to step S06. In step S06, the integrated API call is sent to the cluster as a request, and the method proceeds to step S07. In step S07, the response from the cluster is obtained and filtered according to the parsed retrieval fields to produce the final retrieval result, and the method ends.
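Steps S05 and S07 can be illustrated without committing to any particular cluster API, which the patent leaves unspecified. In the sketch below, the functions build_request and filter_response, the `/query` endpoint path, and the request layout are all hypothetical; sending the request in step S06 is omitted because it depends entirely on the target cluster.

```python
def build_request(server_ip, port, fields, sources, condition, limit=None):
    """Integrate the parsed parts into a (hypothetical) API call description."""
    return {
        # Hypothetical endpoint built from the configured cluster information.
        "url": f"http://{server_ip}:{port}/query",
        "body": {"sources": sources, "condition": condition, "limit": limit},
        # Kept alongside the request so the response can be filtered later.
        "fields": fields,
    }


def filter_response(records, fields):
    """Keep only the parsed retrieval fields in each returned record."""
    return [{f: rec[f] for f in fields if f in rec} for rec in records]


request = build_request(
    "192.0.2.10", 8080, ["name", "age"], ["users"], "age > 30", limit=[10]
)
# Stand-in for a cluster response; a real response format would differ.
sample_response = [{"name": "Li", "age": 41, "email": "li@example.com"}]
result = filter_response(sample_response, request["fields"])
```

Keeping the field list separate from the request body mirrors the description, where the response is filtered by the parsed retrieval fields only after it comes back from the cluster.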
FIG. 2 illustrates a workflow diagram of a parser in accordance with another embodiment of the invention. First, the user composes a query statement using the keywords supported by the parser, and the query statement is parsed to determine the query mode. If the query is a data-type query, the fields to be queried are parsed from the query statement, and the data source to be queried is then obtained from the parsing result; if the query is a statistic-type query, the query statement is parsed directly to obtain the data source to be queried. Next, the query conditions are parsed from the query statement, and the grouping conditions, sorting conditions, and the like are parsed from them. The data source is then analyzed against the query conditions to obtain the data fragments that satisfy the conditions and need to be returned, and the returned data fragments are combined into a query API (application programming interface) call.
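The branching in this workflow can be sketched as below. How the parser distinguishes a data-type query from a statistic-type query is not stated, so treating a select list that calls an aggregate function as "statistic" is purely an assumption, as are the names AGGREGATES, query_mode, and parse_steps.

```python
import re

# Assumed markers of a statistic-type query.
AGGREGATES = ("COUNT", "SUM", "AVG", "MIN", "MAX")


def query_mode(statement: str) -> str:
    """Return 'statistic' if the select list calls an aggregate, else 'data'."""
    match = re.match(r"\s*SELECT\s+(.+?)\s+FROM\s", statement, re.DOTALL)
    if match is None:
        raise ValueError("not a SELECT ... FROM ... statement")
    select_list = match.group(1)
    for name in AGGREGATES:
        if re.search(rf"\b{name}\s*\(", select_list):
            return "statistic"
    return "data"


def parse_steps(statement: str) -> list:
    """List the parsing steps in the order the workflow applies them."""
    steps = []
    if query_mode(statement) == "data":
        # Only data-type queries parse the requested fields first.
        steps.append("parse fields")
    steps += [
        "parse data source",
        "parse query conditions",
        "parse grouping and sorting",
        "assemble API call",
    ]
    return steps
```

Under these assumptions, `SELECT COUNT(*) FROM logs WHERE level = 'error'` skips the field-parsing step, while `SELECT name FROM users` begins with it.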
According to another embodiment of the present invention, an apparatus for data retrieval in a cluster environment comprises: a module that configures information based on the cluster environment, the information including the cluster server IP address, the cluster name, the port number, and the like; a module that defines a set of keywords and predefines usage rules for the keywords (e.g., rules for using keywords in combination), wherein the set of keywords includes keywords used in T-SQL queries; a module that parses the query statement with the parser, specifically by parsing the parser-supported keywords in the query statement submitted by the user; a module that, according to the parsing result, integrates the parts obtained from parsing into a final call form using the corresponding APIs in order to perform the query; a module that sends the integrated API call to the cluster as a request; and a module that obtains the response from the cluster and filters it according to the parsed retrieval fields to obtain the final result.
With respect to the processes, systems, methods, etc., described herein, it should be understood that although the steps of such processes, etc., are described as occurring in a certain order, such processes may perform operations with the described steps performed in an order other than the order described herein. It is further understood that certain steps may be performed simultaneously, that other steps may be added, or that certain steps described herein may be omitted. In other words, the description of the processes herein is provided for the purpose of illustrating certain embodiments and should not be construed in any way as limiting the claimed invention.
Accordingly, it is to be understood that the above description is intended to be illustrative, and not restrictive. Many embodiments and applications other than the examples provided will be apparent upon reading the above description. The scope of the invention should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled, and not by reference to the above description. It is expected that further developments will occur in the arts discussed herein, and that the disclosed systems and methods will be incorporated into such future embodiments. In sum, it is to be understood that the invention is capable of modification and variation.
It should also be understood that any described process or steps in a described process may be combined with other disclosed processes or steps to form structures within the scope of the present disclosure. The exemplary structures and processes disclosed herein are for purposes of illustration and are not to be construed as limiting.

Claims (6)

1. A method for data retrieval in a clustered environment, the method comprising the steps of:
configuring information based on the cluster environment;
defining a set of keywords and predefining usage rules for the keywords, the set containing keywords used in a T-SQL query;
parsing the query statement with a parser;
integrating the parsed parts, according to the parsing result, to perform a query, wherein the integrated result is in the form of an API call;
sending the integrated result to the cluster for requesting; and
obtaining a response and obtaining a final retrieval result from the response;
wherein parsing the query statement with the parser specifically comprises:
obtaining a character string from the query statement and checking it for a match against the SELECT keyword, wherein the matching rule is as follows:
if the character string is equal to the SELECT keyword, the character string belongs to the retrieval class,
and if the character string is not equal to the SELECT keyword, the character string belongs to the non-retrieval class.
2. The method of claim 1, wherein the information comprises cluster server IP, cluster name, and port number.
3. The method of claim 1, wherein for the retrieval class, the query statement is divided at a SELECT keyword, a FROM keyword, and a WHERE keyword, wherein the portion between the SELECT keyword and the FROM keyword is segment M, the portion between the FROM keyword and the WHERE keyword is segment N, and the portion after the WHERE keyword is segment Q.
4. An apparatus for data retrieval in a clustered environment, the apparatus comprising:
a module to configure information based on the cluster environment;
a module defining a set of keywords and predefining usage rules for the keywords, the set containing keywords used in a T-SQL query;
a module that parses the query statement with a parser;
a module that integrates the parsed parts, according to the parsing result, to perform a query, wherein the integrated result is in the form of an API call;
a module that sends the integrated result to the cluster for request; and
a module for obtaining a response and obtaining a final retrieval result from the response;
wherein the module that parses the query statement with the parser further comprises:
a sub-module that obtains a character string from the query statement and checks it for a match against the SELECT keyword, wherein the matching rule is as follows:
if the character string is equal to the SELECT keyword, the character string belongs to the retrieval class,
and if the character string is not equal to the SELECT keyword, the character string belongs to the non-retrieval class.
5. The apparatus for data retrieval in a clustered environment according to claim 4, wherein the information comprises cluster server IP, cluster name, and port number.
6. A computer-readable storage medium, on which a computer program is stored for enabling data retrieval in a clustered environment, characterized in that the program, when being executed by a processor, is adapted to carry out the method of any one of the claims 1-3.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710939998.5A CN107633094B (en) 2017-10-11 2017-10-11 Method and device for data retrieval in cluster environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710939998.5A CN107633094B (en) 2017-10-11 2017-10-11 Method and device for data retrieval in cluster environment

Publications (2)

Publication Number Publication Date
CN107633094A CN107633094A (en) 2018-01-26
CN107633094B true CN107633094B (en) 2020-12-29

Family

ID=61104284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710939998.5A Active CN107633094B (en) 2017-10-11 2017-10-11 Method and device for data retrieval in cluster environment

Country Status (1)

Country Link
CN (1) CN107633094B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299101B (en) * 2018-10-15 2020-12-01 上海达梦数据库有限公司 Data retrieval method, device, server and storage medium
CN111782766B (en) * 2020-06-30 2023-02-24 福建健康之路信息技术有限公司 Method and system for retrieving all resources in Kubernetes cluster through keywords

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9515869B2 (en) * 2012-01-18 2016-12-06 Dh2I Company Systems and methods for server cluster application virtualization
CN103412897B (en) * 2013-07-25 2017-03-01 中国科学院软件研究所 A kind of parallel data processing method based on distributed frame
CN104657439B (en) * 2015-01-30 2019-12-13 欧阳江 Structured query statement generation system and method for precise retrieval of natural language
CN106844380A (en) * 2015-12-04 2017-06-13 阿里巴巴集团控股有限公司 A kind of database operation method, information processing method and related device
CN106649455B (en) * 2016-09-24 2021-01-12 孙燕群 Standardized system classification and command set system for big data development
CN106991183B (en) * 2017-03-27 2019-09-06 福建数林信息科技有限公司 A kind of packaging method and system of business intelligence ETL
CN107180113B (en) * 2017-06-16 2020-12-29 成都亿橙科技有限公司 Big data retrieval platform

Also Published As

Publication number Publication date
CN107633094A (en) 2018-01-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 2298, Yingying building, 99 Tuanjie Road, yanchuangyuan, Jiangbei new district, Nanjing, Jiangsu Province, 211800

Applicant after: Beixinyuan system integration Co., Ltd

Address before: No.3 Ruiyun Road, Jiangpu street, Pukou District, Nanjing, Jiangsu Province, 211899

Applicant before: JIANGSU SHENZHOU XINYUAN SYSTEM ENGINEERING Co.,Ltd.

GR01 Patent grant