CN112860727B - Data query method, device, equipment and medium based on big data query engine - Google Patents

Data query method, device, equipment and medium based on big data query engine

Info

Publication number
CN112860727B
Authority
CN
China
Prior art keywords
query
information
statement
target
sql
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110193443.7A
Other languages
Chinese (zh)
Other versions
CN112860727A (en
Inventor
包云飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202110193443.7A
Publication of CN112860727A
Application granted
Publication of CN112860727B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The method comprises: parsing an obtained SQL query statement to obtain the corresponding query table and the filtering information in the SQL query statement; obtaining the associated file information corresponding to the query table and generating a target query table by combining it with the filtering information; performing matching processing on the SQL query statement; replacing the corresponding query table in the matched SQL query statement with the target query table to obtain a target SQL query statement; performing a data query on the target SQL query statement through the big data query engine; and sending the obtained query result to a user side. The application also relates to blockchain technology, in which the query results are stored. By adding the associated file information and the filtering conditions to the query statement, the method and the device narrow the data query range and improve the data query efficiency for big data.

Description

Data query method, device, equipment and medium based on big data query engine
Technical Field
The present disclosure relates to the field of data query technologies, and in particular, to a data query method, device, equipment, and medium based on a big data query engine.
Background
Using big data technology to query and analyze large amounts of historical data is well established. Big data queries have the following characteristics: the data volume is large, ranging from several GB to the TB level, and the data queried by some tasks can approach or reach the PB level; a large share of tasks are query tasks that contain SQL-like statements or are SQL queries themselves, and they essentially filter the source data in an SQL-like manner; and a large share of tasks filter data by date.
Existing big data query applications either clean and process the data with data warehouse technology or compute results in advance by submitting scheduled tasks as jobs, and then perform the data query by data scanning and data mapping. However, the large data volume means that a large amount of data must be scanned, which leads to low data query efficiency and long query times. A method that can improve the efficiency of data queries on big data is therefore needed.
Disclosure of Invention
The embodiment of the application aims to provide a data query method, device, equipment and medium based on a big data query engine so as to improve the data query efficiency of big data.
In order to solve the above technical problems, an embodiment of the present application provides a data query method based on a big data query engine, including:
acquiring an SQL query statement from a user side;
analyzing the SQL query statement to obtain a corresponding query table and filtering information in the SQL query statement;
acquiring associated file information corresponding to the query table, and generating a target query table according to the associated file information and the filtering information;
acquiring a preset matching rule, and carrying out matching processing on the SQL query statement through the preset matching rule to obtain a matched SQL query statement;
replacing the corresponding query table in the matched SQL query statement with the target query table to obtain a target SQL query statement;
and carrying out data query on the target SQL query statement through a big data query engine to obtain a query result, and sending the query result to the user side.
In order to solve the above technical problems, an embodiment of the present application provides a data query device based on a big data query engine, including:
the query statement acquisition module is used for acquiring SQL query statements from the user side;
the query statement analysis module is used for analyzing the SQL query statement to acquire a corresponding query table and filtering information in the SQL query statement;
the target query table generation module is used for acquiring the associated file information corresponding to the query table and generating a target query table according to the associated file information and the filtering information;
the query statement matching module is used for acquiring a preset matching rule, and carrying out matching processing on the SQL query statement through the preset matching rule to obtain a matched SQL query statement;
the query table replacing module is used for replacing the corresponding query table in the matched SQL query statement with the target query table to obtain a target SQL query statement;
and the query result generation module is used for carrying out data query on the target SQL query statement through a big data query engine to obtain a query result and sending the query result to the user side.
In order to solve the technical problems, the invention adopts the following technical scheme: a computer device is provided, comprising one or more processors and a memory for storing one or more programs which, when executed, cause the one or more processors to implement the data query method based on the big data query engine of any of the above.
In order to solve the technical problems, the invention adopts the following technical scheme: a computer readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, implements the data query method based on the big data query engine of any of the above.
The embodiment of the invention provides a data query method, a device, equipment and a medium based on a big data query engine. According to the embodiment of the invention, the obtained SQL query statement is analyzed to obtain the corresponding query table and the filtering information in the SQL query statement, then the associated file information corresponding to the query table is obtained, and the target query table is generated according to the associated file information and the filtering information, so that the associated file information and the filtering condition are combined, and the data query range is narrowed; and finally, carrying out data query on the target SQL query statement through a big data query engine to obtain a query result, and sending the query result to a user side, so that the associated file information and the filtering condition are added into the query statement, the data query range is greatly reduced, and the data query efficiency of big data is improved.
Drawings
For a clearer description of the solution in the present application, a brief description will be given below of the drawings that are needed in the description of the embodiments of the present application, it being obvious that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a schematic diagram of the application environment of a data query method based on a big data query engine according to an embodiment of the present application;
FIG. 2 is a flowchart of an implementation of a data query method based on a big data query engine according to an embodiment of the present application;
FIG. 3 is a flowchart of an implementation of a sub-process in a data query method based on a big data query engine according to an embodiment of the present application;
FIG. 4 is a flowchart of still another implementation of a sub-process in a data query method based on a big data query engine provided in an embodiment of the present application;
FIG. 5 is a flowchart of still another implementation of a sub-process in a data query method based on a big data query engine provided in an embodiment of the present application;
FIG. 6 is a flowchart of still another implementation of a sub-process in a data query method based on a big data query engine provided in an embodiment of the present application;
FIG. 7 is a flowchart of still another implementation of a sub-process in a data query method based on a big data query engine provided in an embodiment of the present application;
FIG. 8 is a flowchart of still another implementation of a sub-process in a data query method based on a big data query engine provided in an embodiment of the present application;
FIG. 9 is a schematic diagram of a data query device based on a big data query engine according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a computer device provided in an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the description of the figures above are intended to cover non-exclusive inclusions. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to better understand the technical solutions of the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings.
The present invention will be described in detail with reference to the drawings and embodiments.
Referring to fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a search class application, an instant messaging tool, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the data query method based on the big data query engine provided in the embodiments of the present application is generally executed by a server, and accordingly, the data query device based on the big data query engine is generally configured in the server.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to FIG. 2, FIG. 2 illustrates one embodiment of a data query method based on a big data query engine.
It should be noted that, provided substantially the same result is obtained, the method of the present invention is not limited to the flow sequence shown in FIG. 2. The method includes the following steps:
S1: acquiring the SQL query statement from the user side.
Specifically, when a data query needs to be performed on big data, the user side constructs an SQL query statement according to its actual requirements and sends the statement to the server; after receiving the SQL query statement, the server begins the subsequent data query steps.
Structured Query Language, abbreviated SQL, is a special-purpose programming language used as a database query and programming language for accessing data and for querying, updating, and managing relational database systems. In the embodiments of the present application, the SQL query statement is a query statement written in Structured Query Language and is used to perform a data query on big data.
S2: analyzing the SQL query statement to obtain a corresponding query table and filtering information in the SQL query statement.
Specifically, by analyzing the SQL query statement, an execution plan corresponding to the SQL query statement is obtained, and the execution plan includes metadata information such as various query tables, filtering conditions, file field information and the like which are required to be accessed by the data query task, so that after the metadata information of the execution plan is extracted, the corresponding query tables and filtering information in the execution plan are obtained.
The filtering information refers to the filtering conditions in the SQL query statement that restrict the files to be queried. It may comprise date screening information and file field information, and it supplies the filtering conditions for the subsequent data query so as to reduce the file scanning range and thereby improve the data query efficiency for big data. In a specific embodiment, the date screening information is the clause "where data > '2020-10-02'" in the SQL query statement, which restricts the scan to files dated after October 2, 2020; the file field information is "id=huodong" in the SQL query statement, which restricts the scan to files whose name contains "huodong". Combining the date screening information and the file field information restricts the scan to files that are dated after October 2, 2020 and whose name contains "huodong". Thus, when the files are scanned later, the scanning range can be greatly reduced.
S3: acquiring associated file information corresponding to the query table, and generating a target query table according to the associated file information and the filtering information.
Specifically, the query table contains information about the files to be queried. The file information in the query table is extracted, and the externally stored associated file information related to the query table is obtained according to that file information. The associated file information and the filtering information are then integrated to generate a file list containing both, and the target query table is finally constructed from the file list. Acquiring the associated file information makes it possible to analyze the storage form of the files, which helps reduce the range of files scanned during the subsequent query; combining the associated file information with the filtering information to generate the target query table reduces that range further and improves the data query efficiency for big data.
The associated file information refers to the externally stored files, referenced in the query table, that have an association relationship with the files to be queried. The storage form of the files can be obtained by analyzing the associated file information, which makes it easier to reduce the range of files to scan.
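A minimal sketch of step S3 follows, assuming the associated file information is available as a list of records with a path and an ISO-format date, and the filtering information as a date bound plus a name fragment. The record layout and the names `build_file_list` and `build_target_query_table` are hypothetical and chosen only for illustration; the application does not prescribe a data structure.

```python
def build_file_list(associated_files, filter_info):
    """Integrate associated file info with the filter info into one file list.

    Dates are ISO strings (YYYY-MM-DD), so plain string comparison orders them.
    """
    return [
        f for f in associated_files
        if f["date"] > filter_info["date_after"]
        and filter_info["name_contains"] in f["path"]
    ]


def build_target_query_table(table_name, associated_files, filter_info):
    """Construct a target query table that carries the narrowed file list."""
    files = build_file_list(associated_files, filter_info)
    return {
        "source_table": table_name,
        "filter": dict(filter_info),
        "files": [f["path"] for f in files],
    }


# Hypothetical associated file information for a table named "events".
associated = [
    {"path": "/warehouse/events/huodong_1001.orc", "date": "2020-10-01"},
    {"path": "/warehouse/events/huodong_1005.orc", "date": "2020-10-05"},
    {"path": "/warehouse/events/login_1005.orc", "date": "2020-10-05"},
]
target = build_target_query_table(
    "events",
    associated,
    {"date_after": "2020-10-02", "name_contains": "huodong"},
)
```

In this example only one of the three files survives both restrictions, so a later scan driven by `target["files"]` touches a third of the original range.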
S4: acquiring a preset matching rule, and carrying out matching processing on the SQL query statement through the preset matching rule to obtain a matched SQL query statement.
Specifically, the preset matching rule is a matching rule for SQL that is preset by a developer. Its purpose is to optimize the SQL query statement so that the big data query engine can perform the data query on it more easily. The preset matching rule can be executed by an SQL optimizer to perform the matching processing on the SQL query statement.
Specifically, after obtaining the SQL query statement, the server obtains the preset matching rule corresponding to it and uses a preset SQL optimizer to perform matching processing on the SQL query statement according to that rule; that is, the SQL query statement is converted according to the preset matching rule to obtain the matched SQL query statement. Further, the server may execute step S2 and step S4 one after the other, but to improve the data query efficiency for big data it may execute step S2 and step S4 at the same time, and execute the following step S5 after step S3 and step S4 have been executed.
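The simultaneous execution of step S2 and step S4 can be sketched with a thread pool. This is only an illustration under stated assumptions: `parse_statement` and `apply_matching_rules` are hypothetical stand-ins for the two steps, not the parsing or matching logic of the application.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_statement(sql):
    """Stand-in for step S2: return a (query table, filter text) pair."""
    head, _, where = sql.partition(" WHERE ")
    table = head.split(" FROM ")[1].strip()
    return table, where.strip()


def apply_matching_rules(sql):
    """Stand-in for step S4: here it only normalizes whitespace."""
    return " ".join(sql.split())


def analyze_concurrently(sql):
    """Submit both stand-ins to a pool and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        parsed = pool.submit(parse_statement, sql)
        matched = pool.submit(apply_matching_rules, sql)
        return parsed.result(), matched.result()
```

Both tasks read the same statement and neither depends on the other's output, which is what allows them to be submitted together; step S5 would start only after both futures have completed.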
The preset SQL optimizers include, but are not limited to, the rule-based optimizer (RBO) and the cost-based optimizer (CBO). The RBO transforms relational expressions according to established optimization rules and finally generates an execution plan. The CBO transforms the relational expressions according to the optimization rules to generate a plurality of execution plans, calculates the cost of each execution plan according to statistical information and a cost model, and selects the execution plan with the lowest cost. In the present embodiment, the CBO is preferred because it relies on statistical information and a cost model and is therefore more accurate.
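The cost-based choice can be illustrated with a toy model. The sketch below assumes each candidate plan is just a list of tables to scan and that the statistics are per-table row counts; the function name `choose_plan` and the cost model are assumptions made for illustration and do not reflect the internals of any particular optimizer.

```python
def choose_plan(candidate_plans, row_counts, cost_per_row=1.0):
    """Pick the candidate plan with the lowest estimated cost.

    Each plan is a list of table names to scan; the toy cost model charges
    `cost_per_row` for every row of every scanned table.
    """
    def estimated_cost(plan):
        return sum(row_counts[table] * cost_per_row for table in plan)

    return min(candidate_plans, key=estimated_cost)
```

For example, given row counts of 1,000,000 for a full table and 5,000 for a narrowed one, the plan that scans only the narrowed table is selected.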
In one embodiment, the preset matching rules are: rearranging the fields in the GROUP BY clause according to a preset order; placing the large table on the left when a JOIN clause is used in the SQL query statement; and uniformly replacing UNION ALL in the SQL query statement.
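As an illustration of the first rule only, the following sketch reorders the GROUP BY field list with a regular expression. It is a simplification under stated assumptions: the function name `reorder_group_by` is hypothetical, the statement contains a single GROUP BY clause, and the fields are plain column names without nested commas. A production optimizer would rewrite the parsed plan rather than the text.

```python
import re

# Capture "GROUP BY " and the field list up to the next clause or the end.
GROUP_BY = re.compile(
    r"(GROUP\s+BY\s+)(.+?)(?=\s+(?:HAVING|ORDER\s+BY|LIMIT)\b|\s*;|\s*$)",
    re.IGNORECASE | re.DOTALL,
)


def reorder_group_by(sql, preset_order):
    """Rewrite the GROUP BY field list so it follows `preset_order`."""
    rank = {name: i for i, name in enumerate(preset_order)}

    def rewrite(match):
        fields = [f.strip() for f in match.group(2).split(",")]
        # Fields missing from the preset order keep their relative order at the end.
        fields.sort(key=lambda f: rank.get(f, len(rank)))
        return match.group(1) + ", ".join(fields)

    return GROUP_BY.sub(rewrite, sql, count=1)
```

Because Python's sort is stable, two statements that group by the same fields in different orders are rewritten to the same text, which is one plausible motivation for such a rule.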
S5: replacing the corresponding query table in the matched SQL query statement with the target query table to obtain the target SQL query statement.
Specifically, the corresponding query table in the matched SQL query statement is identified and replaced with the target query table, forming a new, more efficient SQL query statement, namely the target SQL query statement. The target SQL query statement contains the target query table and therefore carries the associated file information and the filtering information; that is, it defines a limited file range, which reduces the scanning range of the query and improves the data query efficiency for big data.
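The replacement in step S5 can be sketched as a text substitution, assuming the target query table can be referenced by a name. The function `replace_query_table` and the table names in the usage note are hypothetical; the application does not state how the target query table is referenced in the statement.

```python
import re


def replace_query_table(sql, query_table, target_table):
    """Replace `query_table` after FROM or JOIN with `target_table`."""
    pattern = re.compile(
        r"\b(FROM|JOIN)(\s+)" + re.escape(query_table) + r"\b",
        re.IGNORECASE,
    )
    # A function replacement avoids escape handling in the target name.
    return pattern.sub(lambda m: m.group(1) + m.group(2) + target_table, sql)
```

The trailing word boundary keeps a table such as `events_archive` untouched when only `events` is being replaced.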
S6: carrying out data query on the target SQL query statement through a big data query engine to obtain a query result, and sending the query result to the user side.
Specifically, the big data query engine is called to perform the data query on the target SQL query statement. The engine parses the target SQL query statement to obtain the relevant information in its target query table and thereby determine the file scanning range, scans the file information within that range to obtain the query result, and returns the query result to the user side.
The big data query engine includes, but is not limited to, Presto and Spark SQL. In the embodiments of the present application, Presto is preferred. Presto is an open-source distributed SQL query engine suited to interactive analytical queries on data volumes ranging from GB to PB. It is preferred because it can analyze massive data at the PB level, achieves a high processing speed by operating in memory, and can connect to multiple data sources.
In this embodiment, the obtained SQL query statement is parsed to obtain a corresponding query table and filtering information in the SQL query statement, then associated file information corresponding to the query table is obtained, and a target query table is generated according to the associated file information and the filtering information, so that the associated file information and the filtering condition are combined, and the data query scope is narrowed; and finally, carrying out data query on the target SQL query statement through a big data query engine to obtain a query result, and sending the query result to a user side, so that the associated file information and the filtering condition are added into the query statement, the data query range is greatly reduced, and the data query efficiency of big data is improved.
Referring to FIG. 3, FIG. 3 shows a specific implementation of step S2, in which the SQL query statement is parsed to obtain the corresponding query table and the filtering information in the SQL query statement. The process is described in detail as follows:
S21: analyzing the SQL query statement to obtain an execution plan in the SQL query statement.
Specifically, the execution plan includes metadata information such as the query tables, filtering conditions, and file field information that the data query task needs to access, so the SQL query statement must first be parsed to obtain its execution plan.
S22: metadata information in the execution plan is obtained by extracting metadata from the execution plan.
Specifically, metadata, also called intermediate data or relay data, is data that describes data (data about data). It mainly describes the attributes of data and supports functions such as indicating storage locations, recording history, searching for resources, and recording files. Metadata acts as an electronic catalog: to serve the purpose of cataloging, it describes and collects the content or characteristics of data, and thereby assists data retrieval. In the embodiments of the present application, the metadata information is obtained by extracting metadata from the execution plan, which facilitates the subsequent acquisition of the query table and the filtering information.
S23: carrying out data screening processing on the metadata information to obtain the query table and the filtering information.
Specifically, the metadata information contains table header information, filtering conditions, and file field information. The metadata information is screened so that the query table is generated from the table header information and the filtering information is generated from the filtering conditions and the file field information.
In this embodiment, the SQL query statement is parsed to obtain its execution plan, the metadata information is obtained by extracting metadata from the execution plan, and the metadata information is then screened to obtain the query table and the filtering information, which makes it easier to narrow the file range.
Referring to FIG. 4, FIG. 4 shows a specific implementation of step S21, in which the SQL query statement is parsed to obtain the execution plan in the SQL query statement. The process is described in detail as follows:
S211: identifying keywords and identifiers in the SQL query statement by a lexical analyzer.
The lexical analyzer is also known as a scanner, lexer, or tokenizer. Because an SQL statement is composed of keywords and a strictly defined grammatical structure, the task of the lexical analyzer is to analyze the otherwise meaningless character stream and translate it into discrete groups of characters (i.e., tokens), including keywords, identifiers, and so on. These keywords and identifiers are provided to the parser in a subsequent step, which ultimately forms a syntax tree.
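A lexical analyzer of this kind can be sketched in a few lines. The keyword set, the token categories, and the name `tokenize` below are assumptions made for illustration and cover only a small subset of SQL.

```python
import re

KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "GROUP", "BY", "JOIN", "ON"}

# One alternative per token category; leading whitespace is skipped.
TOKEN_SPEC = re.compile(
    r"\s*(?:(?P<STRING>'[^']*')"
    r"|(?P<NUMBER>\d+(?:\.\d+)?)"
    r"|(?P<NAME>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<OP><=|>=|<>|!=|[=<>*,().]))"
)


def tokenize(sql):
    """Turn a character stream into a list of (kind, text) tokens."""
    tokens, pos = [], 0
    sql = sql.rstrip()
    while pos < len(sql):
        match = TOKEN_SPEC.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character near position {pos}")
        kind, text = match.lastgroup, match.group(match.lastgroup)
        if kind == "NAME":
            # A name is a keyword if it is reserved, otherwise an identifier.
            kind = "KEYWORD" if text.upper() in KEYWORDS else "IDENTIFIER"
        tokens.append((kind, text))
        pos = match.end()
    return tokens
```

The tokenizer assigns each character group a category but, as the description notes, attaches no grammatical meaning to the sequence; that is left to the parser.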
S212: performing grammar construction on the keywords and the identifiers by a parser to generate a syntax tree.
Specifically, when analyzing the character stream, the lexical analyzer does not concern itself with the grammatical meaning of the individual character groups it generates or with their relation to the context; that is the work of the parser. The parser organizes the character groups it receives and converts them into sequences allowed by the grammar definition of the target language. In the embodiments of the present application, the parser organizes the character groups generated by the lexical analyzer, such as keywords and identifiers, and converts them into a syntax tree.
Grammar construction means that the parser assembles the character groups generated by the lexical analyzer, such as keywords and identifiers, according to their grammatical meaning and the relations between their contexts, and finally forms a syntax tree.
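The grammar construction can be sketched as a small recursive-descent parser over (kind, text) token tuples. The supported grammar, the tree layout, and the name `parse_select` are assumptions chosen for illustration; a real SQL grammar is far larger.

```python
def parse_select(tokens):
    """Build a small syntax tree from a list of (kind, text) tuples.

    Supported shape: SELECT col[, col...] FROM table [WHERE cond [AND cond...]]
    where cond is `identifier operator literal`.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def expect(kind, text=None):
        nonlocal pos
        tok_kind, tok_text = peek()
        if tok_kind != kind or (text is not None and tok_text.upper() != text):
            raise SyntaxError(f"expected {text or kind}, got {tok_text!r}")
        pos += 1
        return tok_text

    def at_keyword(word):
        kind, text = peek()
        return kind == "KEYWORD" and text.upper() == word

    expect("KEYWORD", "SELECT")
    columns = [expect("IDENTIFIER")]
    while peek() == ("OP", ","):
        expect("OP")
        columns.append(expect("IDENTIFIER"))
    expect("KEYWORD", "FROM")
    table = expect("IDENTIFIER")

    conditions = []
    if at_keyword("WHERE"):
        expect("KEYWORD", "WHERE")
        while True:
            left = expect("IDENTIFIER")
            op = expect("OP")
            kind, value = peek()
            if kind not in ("STRING", "NUMBER"):
                raise SyntaxError(f"expected a literal, got {value!r}")
            expect(kind)
            conditions.append({"left": left, "op": op, "right": value})
            if not at_keyword("AND"):
                break
            expect("KEYWORD", "AND")
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return {"type": "select", "columns": columns, "from": table, "where": conditions}
```

The resulting tree already exposes the two pieces of information the method needs later: the table under `from` and the conditions under `where`.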
S213: compiling and analyzing the syntax tree by a compiler to obtain the execution plan.
Specifically, the preceding steps have constructed the syntax tree, and the corresponding execution plan can be obtained only after the syntax tree has been compiled and analyzed. In the embodiments of the present application, the compiler uses AstBuilder, an open-source code parser. The syntax tree is compiled and analyzed by AstBuilder to finally obtain the execution plan.
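The following toy sketch shows what turning a syntax tree into an execution plan can look like. It does not reproduce AstBuilder; the tree layout, the plan steps, and the name `compile_plan` are assumptions made purely for illustration.

```python
def compile_plan(tree):
    """Turn a syntax tree into an ordered list of plan steps (toy model)."""
    # Scan the source table first.
    plan = [{"step": "scan", "table": tree["from"]}]
    # Then apply one filter step per condition in the WHERE part.
    for cond in tree.get("where", []):
        plan.append({"step": "filter", **cond})
    # Finally keep only the requested columns.
    plan.append({"step": "project", "columns": list(tree["columns"])})
    return plan
```

In such a plan the scanned table and the filter steps are directly visible, which is the kind of metadata the subsequent extraction step relies on.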
In this embodiment, the lexical analyzer identifies the keywords and identifiers in the SQL query statement, the parser performs grammar construction on them to generate a syntax tree, and the compiler compiles and analyzes the syntax tree to obtain the execution plan. This realizes the grammar construction of the SQL query statement and the acquisition of its execution plan, which facilitates the subsequent acquisition of the query table and the filtering information and thus helps improve the data query efficiency for big data.
Referring to FIG. 5, FIG. 5 shows a specific implementation of step S23, in which data screening processing is performed on the metadata information to obtain the query table and the filtering information. The process is described in detail as follows:
S231: identifying the table header information, the date screening information, and the file field information in the metadata information.
Specifically, the table header information refers to the information in the first cell of the table, from which the corresponding query table can be obtained; the date screening information refers to the set of files to be queried that fall within a certain date range, and it defines the date range of the files to scan; the file field information refers to the field information, contained in the scanned files, that corresponds to part of the files to be queried.
S232: and obtaining a lookup table in the metadata information according to the header information.
S233: and combining the date screening information and the file field information to generate filtering information.
Specifically, the table header information is acquired to acquire the lookup table in the metadata information. And the date screening information and the file field information are combined to form a range of the scanned file, namely filtering information, which can reduce the range of the follow-up scanned file.
In this embodiment, by identifying header information, date screening information and file field information in metadata information, a lookup table in metadata information is obtained according to the header information, and date screening information and file field information are combined to generate filtering information, so that the lookup table and filtering information are obtained, a file range is limited, a subsequent scanned file range is narrowed, and data query efficiency for big data is improved.
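Steps S231 to S233 can be sketched as follows. The shape of the metadata dictionary (the keys "header", "date_range", and "fields") and the FilterInfo type are hypothetical, since the source does not specify how metadata information is represented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FilterInfo:
    """Filtering information: a date range plus the file fields the query needs."""
    date_from: str   # inclusive lower bound, ISO date (assumed format)
    date_to: str     # inclusive upper bound, ISO date (assumed format)
    fields: tuple    # file field names

def screen_metadata(metadata):
    """Return (lookup_table, filter_info) from a hypothetical metadata dict."""
    # S231: identify the three pieces of information.
    header = metadata["header"]
    date_from, date_to = metadata["date_range"]
    fields = metadata["fields"]
    # S232: the header information identifies the lookup table.
    lookup_table = header
    # S233: date screening + file field information -> filtering information.
    filter_info = FilterInfo(date_from, date_to, tuple(fields))
    return lookup_table, filter_info

table, info = screen_metadata({
    "header": "orders",
    "date_range": ("2021-01-01", "2021-01-31"),
    "fields": ["user_id", "amount"],
})
```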
Referring to fig. 6, fig. 6 shows a specific implementation of step S3. The process in step S3 of obtaining the associated file information corresponding to the lookup table and generating the target lookup table according to the associated file information and the filtering information is described in detail as follows:
S31: Extract the file information in the lookup table, and acquire the associated file information according to the file information.
Specifically, the file information exists in the lookup table and points to externally stored files that have an association relationship with the files to be queried, so the associated file information is obtained by using the file information.
S32: Integrate the associated file information and the filtering information to generate a file list.
S33: Construct and generate the target lookup table according to the file list.
Specifically, the associated file information and the filtering information are integrated to generate a file list, and the target lookup table is then constructed from the file list. This further narrows the range of files scanned when the files are queried later and improves the data query efficiency for big data.
In this embodiment, the file information in the lookup table is extracted, the associated file information is acquired according to the file information, the associated file information and the filtering information are integrated to generate the file list, and the target lookup table is constructed according to the file list. An efficient target lookup table is thereby obtained, the file range is reduced, and the data query efficiency for big data is improved.
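A minimal sketch of steps S31 to S33, assuming that each associated file is described by a path and an ISO date and that the filtering information is a date range. The record layout, the "_pruned" naming, and the dictionary used as the target lookup table are illustrative assumptions, not details given in the source.

```python
def build_target_table(table_name, associated_files, date_from, date_to):
    """Integrate associated files with a date filter and build a target table.

    associated_files: list of {"path": str, "date": "YYYY-MM-DD"} (hypothetical).
    ISO dates compare correctly as plain strings, so no date parsing is needed.
    """
    # S32: keep only the files inside the filter range -> file list.
    file_list = sorted(
        f["path"] for f in associated_files
        if date_from <= f["date"] <= date_to
    )
    # S33: the target lookup table is backed only by the listed files.
    return {"name": f"{table_name}_pruned", "files": file_list}

files = [
    {"path": "orders/2020-12-31.parquet", "date": "2020-12-31"},
    {"path": "orders/2021-01-15.parquet", "date": "2021-01-15"},
    {"path": "orders/2021-02-01.parquet", "date": "2021-02-01"},
]
target = build_target_table("orders", files, "2021-01-01", "2021-01-31")
```

Only one of the three files falls inside the date range here, so a later scan would touch a third of the data.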
Referring to fig. 6, fig. 6 shows a specific implementation of step S5. The process in step S5 of replacing the corresponding query table in the matched SQL query statement with the target query table to obtain the target SQL query statement is described in detail as follows:
S51: Identify the corresponding query table in the matched SQL query statement and use it as a basic query table.
Specifically, the header information in the matched SQL query statement is identified, the corresponding query table is obtained according to the header information, and this query table is used as the basic query table, which is the query table that needs to be replaced.
S52: Replace the basic query table with the target query table to obtain the target SQL query statement.
Specifically, since the basic query table has been identified in the above step, replacing it with the target query table yields a target SQL query statement that queries files efficiently.
In this embodiment, the corresponding query table in the matched SQL query statement is identified and used as the basic query table, and the basic query table is then replaced with the target query table to obtain the target SQL query statement. The target SQL query statement is thus an optimized, efficient SQL query statement with a reduced file scope, which improves the data query efficiency for big data.
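The table replacement in steps S51 and S52 can be sketched with a regular expression, as a simplification. A production system would rewrite the parsed grammar tree instead; this sketch assumes the table name appears directly after FROM or JOIN and does not handle qualified column references or string literals. The function name and the table names are hypothetical.

```python
import re

def replace_table(sql, base_table, target_table):
    """Replace base_table with target_table where it follows FROM or JOIN."""
    pattern = re.compile(
        r"\b(FROM|JOIN)(\s+)" + re.escape(base_table) + r"\b",
        re.IGNORECASE,
    )
    # Keep the original keyword and whitespace; swap only the table name.
    return pattern.sub(lambda m: m.group(1) + m.group(2) + target_table, sql)

matched_sql = "SELECT a FROM orders o JOIN orders_archive x ON o.id = x.id"
target_sql = replace_table(matched_sql, "orders", "orders_pruned")
```

The trailing word boundary keeps a longer name such as orders_archive from being rewritten by mistake.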
Referring to fig. 7, fig. 7 shows a specific implementation of step S6. The process in step S6 of performing a data query on the target SQL query statement through the big data query engine to obtain a query result and sending the query result to the user side is described in detail as follows:
S61: Analyze the target SQL query statement to obtain file header information.
Specifically, the target SQL query statement is analyzed, and the files to be scanned are determined according to the associated file information and the filtering information, so that the file header information of those files is obtained. The file header information is a piece of data located at the beginning of a file that serves a specific purpose, and the corresponding file can be determined from it.
S62: Generate a list of files to be scanned according to the file header information.
S63: Scan the files in the list of files to be scanned through the big data query engine to obtain a query result, and send the query result to the user side.
Specifically, the corresponding files are acquired through the file header information, so that the list of files to be scanned is generated. The big data query engine then scans the files in the list to obtain the final query result, and the query result is returned to the user side, which completes the data query task for big data.
In this embodiment, the file header information is obtained by analyzing the target SQL query statement, the list of files to be scanned is generated according to the file header information, and the big data query engine scans the files in the list to obtain the query result and sends it to the user side. The data query task for big data is thereby completed, and because only the listed files are scanned, the data query efficiency for big data is improved.
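Steps S62 and S63 can be sketched as follows. The in-memory dictionary standing in for file storage, the header records with a "path" key, and the row predicate are assumptions for illustration; a real big data query engine would read the listed files from distributed storage.

```python
def files_to_scan(file_headers):
    """S62: one entry per file header, order preserved, duplicates dropped."""
    return list(dict.fromkeys(h["path"] for h in file_headers))

def scan(file_list, storage, predicate):
    """S63: read only the listed files and keep the rows that match."""
    return [row for path in file_list for row in storage[path] if predicate(row)]

# Hypothetical storage: file path -> rows.
storage = {
    "a.csv": [{"amount": 5}, {"amount": 50}],
    "b.csv": [{"amount": 70}],
    "c.csv": [{"amount": 99}],   # never read: not in the scan list
}
headers = [{"path": "a.csv"}, {"path": "b.csv"}, {"path": "a.csv"}]
to_scan = files_to_scan(headers)
result = scan(to_scan, storage, lambda row: row["amount"] > 10)
```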
It should be emphasized that, to further ensure the privacy and security of the query results, the query results may also be stored in a node of a blockchain.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored in a computer-readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a random access Memory (Random Access Memory, RAM).
Referring to fig. 9, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a data query device based on a big data query engine, where the embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the device may be specifically applied to various electronic devices.
As shown in fig. 9, the data query device based on the big data query engine of the present embodiment includes: a query statement acquisition module 71, a query statement parsing module 72, a target lookup table generation module 73, a query statement matching module 74, a lookup table replacement module 75, and a query result generation module 76, wherein:
the query statement acquisition module 71 is configured to acquire an SQL query statement from a user side;
the query statement parsing module 72 is configured to parse the SQL query statement to obtain the corresponding lookup table and filtering information in the SQL query statement;
the target lookup table generation module 73 is configured to obtain associated file information corresponding to the lookup table, and generate a target lookup table according to the associated file information and the filtering information;
the query statement matching module 74 is configured to obtain a preset matching rule, and perform matching processing on the SQL query statement through the preset matching rule to obtain a matched SQL query statement;
the lookup table replacement module 75 is configured to replace the corresponding lookup table in the matched SQL query statement with the target lookup table to obtain a target SQL query statement;
the query result generation module 76 is configured to perform a data query on the target SQL query statement through the big data query engine, obtain a query result, and send the query result to the user side.
Further, the query statement parsing module 72 includes:
the execution plan acquisition unit is used for acquiring an execution plan in the SQL query statement by analyzing the SQL query statement;
the metadata extraction unit is used for extracting metadata from the execution plan to obtain metadata information in the execution plan;
and the data screening processing unit is used for carrying out data screening processing on the metadata information to obtain a lookup table and filtering information.
Further, the execution plan acquisition unit includes:
a query statement identification subunit, configured to identify keywords and identifiers in the SQL query statement through the lexical analyzer;
the grammar tree generation subunit is used for carrying out grammar construction on the keywords and the identifiers according to the grammar analyzer to generate a grammar tree;
and the grammar tree compiling and analyzing subunit is used for compiling and analyzing the grammar tree by using a compiler to obtain an execution plan.
Further, the data screening processing unit includes:
a metadata information identifying subunit, configured to identify header information, date screening information, and file field information in the metadata information;
a lookup table obtaining subunit, configured to obtain a lookup table in the metadata information according to the header information;
and the filtering information generating subunit is used for combining the date screening information and the file field information to generate filtering information.
Further, the target lookup table generation module 73 includes:
the associated file information acquisition unit is used for extracting file information in the lookup table and acquiring associated file information according to the file information;
the file list generation unit is used for integrating the associated file information and the filtering information to generate a file list;
and the target lookup table construction unit is used for constructing and generating a target lookup table according to the file list.
Further, the lookup table replacement module 75 includes:
the basic query table acquisition unit is used for identifying the corresponding query table in the matched SQL query statement and taking the corresponding query table as a basic query table;
and the target SQL query statement generating unit is used for replacing the basic query table with the target query table to obtain a target SQL query statement.
Further, the query result generation module 76 includes:
the file header information acquisition unit is used for analyzing the target SQL query statement to obtain file header information;
the file list to be scanned generating unit is used for generating a file list to be scanned according to the file header information;
the query result sending unit is used for carrying out file scanning on the file list to be scanned through the big data query engine to obtain a query result and sending the query result to the user side.
It should be emphasized that, to further ensure the privacy and security of the query results, the query results may also be stored in a node of a blockchain.
In order to solve the above technical problems, the embodiment of the application also provides a computer device. Referring specifically to fig. 10, fig. 10 is a basic structural block diagram of the computer device according to the present embodiment.
The computer device 8 comprises a memory 81, a processor 82, and a network interface 83 communicatively connected to each other via a system bus. It should be noted that only a computer device 8 having the three components memory 81, processor 82, and network interface 83 is shown in the figure, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculations and/or information processing in accordance with predetermined or stored instructions, the hardware of which includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, etc.
The computer device may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The computer device can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 81 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 81 may be an internal storage unit of the computer device 8, such as a hard disk or memory of the computer device 8. In other embodiments, the memory 81 may also be an external storage device of the computer device 8, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the computer device 8. Of course, the memory 81 may also include both an internal storage unit of the computer device 8 and an external storage device. In the present embodiment, the memory 81 is typically used to store the operating system and the various types of application software installed on the computer device 8, such as the program code of the data query method based on a big data query engine. Further, the memory 81 may be used to temporarily store various types of data that have been output or are to be output.
The processor 82 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 82 is typically used to control the overall operation of the computer device 8. In this embodiment, the processor 82 is configured to execute the program code stored in the memory 81 or process data, for example, execute the program code of the data query method based on the big data query engine described above, so as to implement various embodiments of the data query method based on the big data query engine.
The network interface 83 may comprise a wireless network interface or a wired network interface, which network interface 83 is typically used to establish a communication connection between the computer device 8 and other electronic devices.
The present application also provides another embodiment, namely, a computer readable storage medium, where a computer program is stored, where the computer program is executable by at least one processor, so that the at least one processor performs the steps of a data query method based on a big data query engine as described above.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method of the embodiments of the present application.
Blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, each data block containing a batch of network transaction information that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
It is apparent that the embodiments described above are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the present application but do not limit its patent scope. This application may be embodied in many different forms; these embodiments are provided so that the disclosure will be understood more thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their technical features. All equivalent structures made using the specification and the drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the application.
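The idea that each data block is cryptographically associated with the previous one can be illustrated with a toy hash chain. This is a sketch under stated assumptions: the block layout, the JSON serialization, and the use of SHA-256 are illustrative choices, and no consensus mechanism or network layer is modeled.

```python
import hashlib
import json

def make_block(prev_hash, payload):
    """Create a block whose hash covers both its payload and the previous hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return {
        "prev": prev_hash,
        "payload": payload,
        "hash": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }

def verify(chain):
    """Check every block's own hash and its link to the preceding block."""
    for i, block in enumerate(chain):
        if block["hash"] != make_block(block["prev"], block["payload"])["hash"]:
            return False          # payload or link was altered
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False          # block does not follow its predecessor
    return True

genesis = make_block("0" * 64, {"query_result": "first"})
second = make_block(genesis["hash"], {"query_result": "second"})
chain = [genesis, second]
```

Changing a stored query result without recomputing every later hash makes verify return False, which is the tamper-evidence property the paragraph above refers to.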

Claims (8)

1. A data query method based on a big data query engine, comprising:
acquiring an SQL query statement from a user side;
identifying keywords and identifiers in the SQL query statement through a lexical analyzer;
constructing grammar of the keywords and the identifiers according to a grammar analyzer, and generating a grammar tree;
compiling and analyzing the grammar tree by using a compiler to obtain an execution plan;
metadata extraction is carried out on the execution plan, so that metadata information in the execution plan is obtained;
performing data screening processing on the metadata information to obtain a lookup table and filtering information;
acquiring associated file information corresponding to the lookup table, and generating a target lookup table according to the associated file information and the filtering information;
acquiring a preset matching rule, and carrying out matching processing on the SQL query statement through the preset matching rule to obtain a matched SQL query statement, wherein the preset matching rule is to rearrange the field sequence in the Group by statement according to a preset sequence, place a large table in the SQL query statement on the left when using a Join statement, and uniformly replace UNION ALL with UNION in the SQL query statement;
Replacing the corresponding query table in the matched SQL query statement with the target query table to obtain a target SQL query statement;
and carrying out data query on the target SQL query statement through a big data query engine to obtain a query result, and sending the query result to the user side.
2. The data query method based on a big data query engine according to claim 1, wherein the performing data filtering processing on the metadata information to obtain the query table and the filtering information includes:
identifying header information, date screening information and file field information in the metadata information;
acquiring the lookup table in the metadata information according to the header information;
and combining the date screening information and the file field information to generate the filtering information.
3. The data query method based on the big data query engine according to claim 1, wherein the obtaining the associated file information corresponding to the query table, and generating the target query table according to the associated file information and the filtering information, comprises:
extracting file information in the lookup table, and acquiring the associated file information according to the file information;
Integrating the associated file information and the filtering information to generate a file list;
and constructing and generating the target lookup table according to the file list.
4. The data query method based on a big data query engine according to claim 1, wherein the replacing the corresponding query table in the matched SQL query statement with the target query table to obtain the target SQL query statement comprises:
identifying a corresponding query table in the matched SQL query statement to serve as a basic query table;
and replacing the basic lookup table with the target lookup table to obtain the target SQL query statement.
5. The data query method based on a big data query engine according to any one of claims 1 to 4, wherein the performing, by the big data query engine, the data query on the target SQL query statement to obtain a query result, and sending the query result to the user side includes:
analyzing the target SQL query statement to obtain file header information;
generating a file list to be scanned according to the file header information;
and carrying out file scanning on the file list to be scanned through the big data query engine to obtain the query result, and sending the query result to the user side.
6. A data query device based on a big data query engine, comprising:
the query statement acquisition module is used for acquiring SQL query statements from the user side;
a query statement identification subunit, configured to identify, by using a lexical analyzer, keywords and identifiers in the SQL query statement;
a grammar tree generation subunit, configured to perform grammar construction on the keyword and the identifier according to a grammar analyzer, and generate a grammar tree;
the grammar tree compiling and analyzing subunit is used for compiling and analyzing the grammar tree by using a compiler to obtain an execution plan;
the metadata extraction unit is used for extracting metadata from the execution plan to obtain metadata information in the execution plan;
the data screening processing unit is used for carrying out data screening processing on the metadata information to obtain a lookup table and filtering information;
the target lookup table generation module is used for acquiring the associated file information corresponding to the lookup table and generating a target lookup table according to the associated file information and the filtering information;
the query statement matching module is used for acquiring a preset matching rule, and carrying out matching processing on the SQL query statement through the preset matching rule to obtain a matched SQL query statement, wherein the preset matching rule is that the field sequence in the Group by statement is rearranged according to a preset sequence, a large table in the SQL query statement is placed on the left side when a Join statement is used, and UNION ALL is uniformly replaced with UNION in the SQL query statement;
The query table replacing module is used for replacing the corresponding query table in the matched SQL query statement with the target query table to obtain a target SQL query statement;
and the query result generation module is used for carrying out data query on the target SQL query statement through a big data query engine to obtain a query result and sending the query result to the user side.
7. A computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the big data query engine based data query method of any of claims 1 to 5 when the computer program is executed.
8. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements a big data query engine based data query method according to any of claims 1 to 5.
CN202110193443.7A 2021-02-20 2021-02-20 Data query method, device, equipment and medium based on big data query engine Active CN112860727B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110193443.7A CN112860727B (en) 2021-02-20 2021-02-20 Data query method, device, equipment and medium based on big data query engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110193443.7A CN112860727B (en) 2021-02-20 2021-02-20 Data query method, device, equipment and medium based on big data query engine

Publications (2)

Publication Number Publication Date
CN112860727A CN112860727A (en) 2021-05-28
CN112860727B true CN112860727B (en) 2024-01-12

Family

ID=75988378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110193443.7A Active CN112860727B (en) 2021-02-20 2021-02-20 Data query method, device, equipment and medium based on big data query engine

Country Status (1)

Country Link
CN (1) CN112860727B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672781A (en) * 2021-08-20 2021-11-19 平安国际智慧城市科技股份有限公司 Data query method and device, electronic equipment and storage medium
CN114443691B (en) * 2022-01-18 2024-01-23 苏州浪潮智能科技有限公司 Database query optimization method, system and computer equipment
CN114238286B (en) * 2022-02-28 2022-08-05 连连(杭州)信息技术有限公司 Data warehouse data processing method and device, electronic equipment and storage medium
CN114760369B (en) * 2022-04-14 2023-12-19 曙光网络科技有限公司 Protocol metadata extraction method, device, equipment and storage medium
CN115563167B (en) * 2022-12-02 2023-03-31 浙江大华技术股份有限公司 Data query method, electronic device and computer-readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109656947A (en) * 2018-11-09 2019-04-19 金蝶软件(中国)有限公司 Data query method, apparatus, computer equipment and storage medium
CN111522816A (en) * 2020-04-16 2020-08-11 云和恩墨(北京)信息技术有限公司 Data processing method, device, terminal and medium based on database engine
CN111639078A (en) * 2020-05-25 2020-09-08 北京百度网讯科技有限公司 Data query method and device, electronic equipment and readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10649986B2 (en) * 2017-01-31 2020-05-12 Salesforce.Com, Inc. Systems, methods, and apparatuses for implementing a BY ORGID command term within a multi-tenant aware structured query language
CN109144994B (en) * 2017-06-19 2022-04-29 华为技术有限公司 Index updating method, system and related device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
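Research on optimization rules for SQL query statements based on the Oracle database; Wu Jieming; Zhou Jin; Journal of Shaanxi University of Technology (Natural Science Edition) (No. 04); full text *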

Also Published As

Publication number Publication date
CN112860727A (en) 2021-05-28

Similar Documents

Publication Publication Date Title
CN112860727B (en) Data query method, device, equipment and medium based on big data query engine
CN110795455B (en) Dependency analysis method, electronic device, computer apparatus, and readable storage medium
US6877000B2 (en) Tool for converting SQL queries into portable ODBC
CN109522341B (en) Method, device and equipment for realizing SQL-based streaming data processing engine
CN111309760A (en) Data retrieval method, system, device and storage medium
CN111782763A (en) Information retrieval method based on voice semantics and related equipment thereof
CN115576984A (en) Method for generating SQL (structured query language) statement and cross-database query by Chinese natural language
CN101055566B (en) Function collection method and device of electronic data table
CN111553556A (en) Business data analysis method and device, computer equipment and storage medium
CN113407785A (en) Data processing method and system based on distributed storage system
CN1601524A (en) Fuzzy inquiry system and method
CN110309214B (en) Instruction execution method and equipment, storage medium and server thereof
CN113297251A (en) Multi-source data retrieval method, device, equipment and storage medium
CN113934786A (en) Implementation method for constructing unified ETL
CN116383412B (en) Functional point amplification method and system based on knowledge graph
CN111126073B (en) Semantic retrieval method and device
CN116010662A (en) Construction method, device and medium of energy consumption-carbon emission query system
CN112527880B (en) Method, device, equipment and medium for collecting metadata information of big data cluster
CN113505143A (en) Statement type conversion method and device, storage medium and electronic device
CN115185973A (en) Data resource sharing method, platform, device and storage medium
CN113934430A (en) Data retrieval analysis method and device, electronic equipment and storage medium
CN115242638B (en) Feasible touch screening method and device, electronic equipment and storage medium
CN115168408A (en) Query optimization method, device, equipment and storage medium based on reinforcement learning
CN117251472B (en) Cross-source data processing method, device, equipment and storage medium
CN111079391B (en) Report generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant