CN112860727A - Data query method, device, equipment and medium based on big data query engine - Google Patents

Data query method, device, equipment and medium based on big data query engine

Publication number
CN112860727A
Authority
CN
China
Prior art keywords
query
information
sql
statement
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110193443.7A
Other languages
Chinese (zh)
Other versions
CN112860727B (en)
Inventor
包云飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110193443.7A priority Critical patent/CN112860727B/en
Publication of CN112860727A publication Critical patent/CN112860727A/en
Application granted granted Critical
Publication of CN112860727B publication Critical patent/CN112860727B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The method comprises: parsing an obtained SQL query statement to obtain the corresponding query table and the filtering information in the SQL query statement; obtaining associated file information corresponding to the query table and generating a target query table by combining it with the filtering information; performing matching processing on the SQL query statement; replacing the corresponding query table in the matched SQL query statement with the target query table to obtain a target SQL query statement; performing a data query on the target SQL query statement through the big data query engine; and sending the obtained query result to a user side. The application also relates to blockchain techniques, where query results are stored in blockchains. By adding the associated file information and the filtering conditions to the query statement, the data query range is reduced and the data query efficiency of big data is improved.

Description

Data query method, device, equipment and medium based on big data query engine
Technical Field
The present application relates to the field of data query technologies, and in particular, to a data query method, apparatus, device, and medium based on a big data query engine.
Background
Using big data technology to query and analyze large volumes of historical data is a mature practice. Such queries have the following characteristics: the data volume is large, ranging from several GB to TB, and the data queried by some tasks may approach or reach the PB level; a large number of tasks are SQL queries or contain SQL-like queries, and the source data is generally filtered in an SQL-like manner; and a large number of tasks filter data by date.
Existing big data queries either clean and process the data with data warehouse technology, or submit timed tasks in job mode to compute results in advance, and then query the data by data scanning and data mapping. However, because the data volume is large, a large amount of data must be scanned, which makes data queries inefficient and lengthens query time. A method that improves the efficiency of data queries on big data is therefore needed.
Disclosure of Invention
The embodiment of the application aims to provide a data query method, a data query device, data query equipment and a data query medium based on a big data query engine so as to improve the data query efficiency of big data.
In order to solve the above technical problem, an embodiment of the present application provides a data query method based on a big data query engine, including:
acquiring an SQL query statement from a user side;
analyzing the SQL query statement to acquire a corresponding query table and filtering information in the SQL query statement;
acquiring associated file information corresponding to the query table, and generating a target query table according to the associated file information and the filtering information;
acquiring a preset matching rule, and performing matching processing on the SQL query statement through the preset matching rule to obtain a matched SQL query statement;
replacing the corresponding query table in the matched SQL query statement by the target query table to obtain a target SQL query statement;
and performing data query on the target SQL query statement through a big data query engine to obtain a query result, and sending the query result to the user side.
In order to solve the above technical problem, an embodiment of the present application provides a data query apparatus based on a big data query engine, including:
the query statement acquisition module is used for acquiring SQL query statements from a user side;
the query statement analysis module is used for analyzing the SQL query statement to acquire a corresponding query table and filtering information in the SQL query statement;
the target query table generating module is used for acquiring the associated file information corresponding to the query table and generating a target query table according to the associated file information and the filtering information;
the query statement matching module is used for acquiring a preset matching rule and matching the SQL query statement according to the preset matching rule to obtain a matched SQL query statement;
the query table replacing module is used for replacing the corresponding query table in the matched SQL query statement by the target query table to obtain the target SQL query statement;
and the query result generation module is used for performing data query on the target SQL query statement through a big data query engine to obtain a query result and sending the query result to the user side.
In order to solve the technical problems, the invention adopts a technical scheme that: a computer device is provided that includes one or more processors, and a memory for storing one or more programs that cause the one or more processors to implement the big data query engine-based data query method as described in any one of the above.
In order to solve the technical problems, the invention adopts a technical scheme that: a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when being executed by a processor, the computer program implements the big data query engine-based data query method as described in any one of the above.
The embodiment of the invention provides a data query method, a device, equipment and a medium based on a big data query engine. The embodiment of the invention analyzes the obtained SQL query statement to obtain a corresponding query table and filtering information in the SQL query statement, then obtains associated file information corresponding to the query table, and generates a target query table according to the associated file information and the filtering information, thereby realizing the combination of the associated file information and the filtering condition and reducing the data query range; and finally, performing data query on the target SQL query statement through a big data query engine to obtain a query result, and sending the query result to a user side, so that associated file information and filtering conditions are added into the query statement, the data query range is greatly reduced, and the data query efficiency of big data is improved.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of an application environment of a data query method based on a big data query engine according to an embodiment of the present application;
FIG. 2 is a flowchart of an implementation of a big data query engine-based data query method according to an embodiment of the present application;
FIG. 3 is a flowchart of an implementation of a sub-process in a data query method based on a big data query engine according to an embodiment of the present application;
FIG. 4 is a flowchart of another implementation of a sub-process in a data query method based on a big data query engine according to an embodiment of the present application;
FIG. 5 is a flowchart of another implementation of a sub-process in a data query method based on a big data query engine according to an embodiment of the present application;
FIG. 6 is a flowchart of another implementation of a sub-process in a data query method based on a big data query engine according to an embodiment of the present application;
FIG. 7 is a flowchart of another implementation of a sub-process in a data query method based on a big data query engine according to an embodiment of the present application;
FIG. 8 is a flowchart of another implementation of a sub-process in a big data query engine-based data query method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a data query device based on a big data query engine according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a computer device provided in an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
The present invention will be described in detail below with reference to the accompanying drawings and embodiments.
Referring to fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a web browser application, a search-type application, an instant messaging tool, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the data query method based on the big data query engine provided in the embodiments of the present application is generally executed by a server, and accordingly, the data query apparatus based on the big data query engine is generally configured in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to fig. 2, fig. 2 shows a specific embodiment of a data query method based on a big data query engine.
It should be noted that, provided substantially the same result is obtained, the method of the present invention is not limited to the flow sequence shown in fig. 2. The method includes the following steps:
S1: Acquire the SQL query statement from the user side.
Specifically, when data query is required to be performed on big data, the user side constructs an SQL query statement according to actual requirements, and sends the SQL query statement to the server, and the server starts to perform subsequent data query steps after receiving the SQL query statement.
Structured Query Language (SQL) is a special-purpose programming language: a database query and programming language used for accessing data and for querying, updating, and managing relational database systems. In the embodiment of the present application, the SQL query statement is a query statement written in Structured Query Language and is used for performing data queries on big data.
S2: Parse the SQL query statement to obtain the corresponding query table and the filtering information in the SQL query statement.
Specifically, the execution plan corresponding to the SQL query statement is obtained by analyzing the SQL query statement, and the execution plan includes various query tables, filtering conditions, file field information and other metadata information that the data query task needs to access, so after the metadata information of the execution plan is extracted, the query tables and the filtering information corresponding to the execution plan are obtained.
The filtering information refers to the filtering conditions and screening information in the SQL query statement that limit the files to be queried. It may include date screening information and file field information, which supply the screening conditions for the subsequent data query so as to narrow the file scanning range and thereby improve the data query efficiency of big data. In a specific embodiment, the date screening information in the SQL query statement is "where data > '2020-10-02'", which restricts the scan to files dated after October 2, 2020; the file field information in the SQL query statement is "id = huodong", which restricts the scan to files named "huodong". Combining the date screening information and the file field information therefore restricts the scan to files that are dated after October 2, 2020 and named "huodong", so the scanning range is greatly reduced when files are subsequently scanned.
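The narrowing effect of the example above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `FileEntry`, `narrow_scan_range`, and the sample files are hypothetical, and the date condition is read as "later than 2020-10-02" to match the `>` operator in the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical description of one stored file."""
    name: str        # value matched against the file field information
    file_date: date  # date matched against the date screening information


def narrow_scan_range(files, after, name):
    """Keep only files dated after `after` whose name equals `name`."""
    return [f for f in files if f.file_date > after and f.name == name]


# Hypothetical sample files; only the second one satisfies both conditions.
files = [
    FileEntry("huodong", date(2020, 10, 1)),
    FileEntry("huodong", date(2020, 10, 3)),
    FileEntry("other", date(2020, 10, 3)),
]
selected = narrow_scan_range(files, date(2020, 10, 2), "huodong")
```

Only the files in `selected` would need to be scanned; the other two are excluded before any data is read.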
S3: Acquire the associated file information corresponding to the query table, and generate a target query table according to the associated file information and the filtering information.
Specifically, since the query table holds information about the files to be queried, the file information in the query table is extracted, and the external associated file information related to the query table is obtained according to that file information. The associated file information and the filtering information are then integrated to generate a file list containing both, and the target query table is constructed from the file list. Acquiring the associated file information makes it possible to analyze how the files are stored, which helps narrow the range of files scanned in subsequent queries; combining the associated file information with the filtering information to generate the target query table narrows that range further and improves the data query efficiency of big data.
The associated file information refers to the externally stored files that have an association relation with the files to be queried in the query table. The storage form of the files can be obtained by analyzing the associated file information, which makes it easier to narrow the file scanning range.
S4: Acquire a preset matching rule, and perform matching processing on the SQL query statement according to the preset matching rule to obtain the matched SQL query statement.
Specifically, the preset matching rule is an SQL matching rule preset by a developer. Its purpose is to optimize the SQL query statement so that the big data query engine can subsequently query data with it more easily. The SQL optimizer can use the preset matching rule to perform the matching processing on the SQL query statement.
Specifically, after the server obtains the SQL query statement, it obtains the preset matching rule corresponding to the SQL query statement and performs matching processing on the SQL query statement through a preset SQL optimizer according to that rule; that is, the preset matching rule transforms the SQL query statement, yielding the matched SQL query statement. Further, the server may perform step S2 and step S4 in either order; to improve data query efficiency, it may also perform them simultaneously, and perform step S5 described below once step S3 and step S4 are complete.
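Running steps S2 and S4 at the same time can be sketched with a thread pool, since both steps read only the original statement. This is a minimal sketch: `parse_statement` and `apply_matching_rules` are hypothetical placeholders standing in for the real parsing and matching logic, and the values they return are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_statement(sql):
    """Placeholder for step S2: would return the query table and filter information."""
    return {"query_table": "logs", "filters": ["data > '2020-10-02'"]}


def apply_matching_rules(sql):
    """Placeholder for step S4: here it only normalizes whitespace."""
    return " ".join(sql.split())


def prepare(sql):
    # S2 and S4 do not depend on each other, so both are submitted together.
    with ThreadPoolExecutor(max_workers=2) as pool:
        parsed_future = pool.submit(parse_statement, sql)
        matched_future = pool.submit(apply_matching_rules, sql)
        # Later steps need both results, so wait for both before returning.
        return parsed_future.result(), matched_future.result()


parsed, matched = prepare("SELECT id  FROM logs   WHERE data > '2020-10-02'")
```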
The preset SQL optimizer includes but is not limited to a rule-based optimizer (RBO) and a cost-based optimizer (CBO). The RBO converts the relational expression according to a set of established optimization rules and finally generates an optimal execution plan. The CBO converts the relational expression according to optimization rules to generate multiple execution plans, and then calculates the cost of each execution plan according to statistical information and a cost model. In the embodiments of the present application, the CBO is preferred because its reliance on statistical information and a cost model makes it more accurate.
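The cost-based selection can be illustrated with a toy cost model. This is a sketch under assumptions, not the cost model of any particular engine: the `Plan` fields, the weights, and the sample statistics are all hypothetical, and choosing the plan with the lowest computed cost is the usual way such costs are used.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical candidate execution plan with estimated statistics."""
    name: str
    rows_scanned: int
    rows_shuffled: int


def cost(plan, scan_weight=1.0, shuffle_weight=3.0):
    """Toy cost model: a weighted sum of the estimated work."""
    return scan_weight * plan.rows_scanned + shuffle_weight * plan.rows_shuffled


def choose_plan(plans):
    """Return the candidate with the lowest estimated cost."""
    return min(plans, key=cost)


plans = [
    Plan("A", rows_scanned=1000, rows_shuffled=500),  # cost 1000 + 1500 = 2500
    Plan("B", rows_scanned=2000, rows_shuffled=100),  # cost 2000 + 300 = 2300
]
best = choose_plan(plans)
```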
In one embodiment, the preset matching rules are: rearranging the fields in the GROUP BY clause according to a preset order, placing the large table on the left when a JOIN is used, and uniformly replacing UNION ALL in the SQL query statement.
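The first of these rules, reordering GROUP BY fields, can be sketched with a regular expression. This is an illustrative simplification: `PRESET_ORDER` and `reorder_group_by` are hypothetical names, the rule handles only plain column names, and a real optimizer would rewrite the parsed plan rather than the SQL text.

```python
import re

PRESET_ORDER = ["dt", "region", "id"]  # hypothetical preset field order

# GROUP BY, then the field list, then whatever ends the clause.
GROUP_BY_RE = re.compile(
    r"(GROUP\s+BY\s+)([\w\s,]+?)(\s*(?:\b(?:HAVING|ORDER\s+BY|LIMIT)\b|;|$))",
    re.IGNORECASE,
)


def reorder_group_by(sql, preset_order=PRESET_ORDER):
    """Sort GROUP BY fields by their position in `preset_order`."""

    def rank(field):
        # Fields missing from the preset order keep their relative order at the end.
        return preset_order.index(field) if field in preset_order else len(preset_order)

    def rewrite(match):
        fields = [f.strip() for f in match.group(2).split(",")]
        fields.sort(key=rank)  # list.sort is stable
        return match.group(1) + ", ".join(fields) + match.group(3)

    return GROUP_BY_RE.sub(rewrite, sql, count=1)


rewritten = reorder_group_by("SELECT id, dt, COUNT(*) FROM t GROUP BY id, dt ORDER BY dt")
```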
S5: Replace the corresponding query table in the matched SQL query statement with the target query table to obtain the target SQL query statement.
Specifically, the corresponding query table in the matched SQL query statement is identified and then replaced with the target query table, forming a new and efficient SQL query statement, namely the target SQL query statement. The target SQL query statement is the SQL query statement that contains the target query table; it carries the associated file information and the filtering information, that is, a limited file range, which narrows the scanning range of the subsequent file query and helps improve the data query efficiency of big data.
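The table replacement can be sketched as a text substitution restricted to table positions. This is a minimal sketch under assumptions: `replace_query_table` and the table names are hypothetical, and only tables that directly follow FROM or JOIN are handled.

```python
import re


def replace_query_table(sql, query_table, target_table):
    """Replace `query_table` with `target_table` where it follows FROM or JOIN."""
    pattern = re.compile(
        r"\b(FROM|JOIN)(\s+)" + re.escape(query_table) + r"\b",
        re.IGNORECASE,
    )
    # A function is used as the replacement so the target name is inserted literally.
    return pattern.sub(lambda m: f"{m.group(1)}{m.group(2)}{target_table}", sql)


matched_sql = "SELECT id FROM logs WHERE id = 'huodong'"
target_sql = replace_query_table(matched_sql, "logs", "logs_target")
```

Anchoring on FROM and JOIN keeps the substitution from touching column names or string literals that happen to contain the table name.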
S6: Perform a data query on the target SQL query statement through a big data query engine to obtain a query result, and send the query result to the user side.
Specifically, a big data query engine is called to perform the data query on the target SQL query statement. The big data query engine parses the target SQL query statement, obtains the relevant information in its target query table to determine the file scanning range, scans the files within that range to obtain the query result, and returns the query result to the user side.
The big data query engine includes but is not limited to Presto and Spark SQL. In the embodiment of the present application, Presto is preferred. Presto is an open-source distributed SQL query engine that is suitable for interactive analytic queries and supports data volumes from GB to PB. Presto is preferred because it can handle PB-level mass data analysis, it operates in memory, which gives it a higher processing speed, and it can connect to multiple data sources.
In the embodiment, the obtained SQL query statement is analyzed to obtain the corresponding query table and the filtering information in the SQL query statement, then the associated file information corresponding to the query table is obtained, and the target query table is generated according to the associated file information and the filtering information, so that the associated file information and the filtering condition are combined, and the data query range is reduced; and finally, performing data query on the target SQL query statement through a big data query engine to obtain a query result, and sending the query result to a user side, so that associated file information and filtering conditions are added into the query statement, the data query range is greatly reduced, and the data query efficiency of big data is improved.
Referring to fig. 3, fig. 3 shows a specific implementation manner of step S2, where the specific implementation process of parsing the SQL query statement to obtain the corresponding query table and filtering information in the SQL query statement in step S2 is described as follows:
S21: Parse the SQL query statement to obtain the execution plan of the SQL query statement.
Specifically, the execution plan includes metadata information such as the query tables, filtering conditions, and file field information that the data query task needs to access, so the SQL query statement must first be parsed to obtain its execution plan.
S22: Extract metadata from the execution plan to obtain the metadata information in the execution plan.
Specifically, metadata (also called intermediary data or relay data) is data that describes data (data about data). It mainly describes data attributes and supports functions such as indicating storage locations, recording history, searching for resources, and recording files. Metadata acts as an electronic catalog: it describes and collects the content or characteristics of data so as to assist data retrieval. In the embodiment of the application, the metadata information is obtained by extracting the metadata of the execution plan, so that the query table and the filtering information can be conveniently obtained subsequently.
S23: Perform data screening processing on the metadata information to obtain the query table and the filtering information.
Specifically, the header information, filtering conditions, and file field information are screened out of the metadata information; the query table is then generated according to the header information, and the filtering information is generated according to the filtering conditions and the file field information.
In this embodiment, the SQL query statement is parsed to obtain its execution plan, the metadata in the execution plan is extracted to obtain the metadata information, and the metadata information is then screened to obtain the query table and the filtering information, which makes it easier to narrow the file range.
Referring to fig. 4, fig. 4 shows an embodiment of step S21, where the specific implementation process of step S21, by parsing the SQL query statement, to obtain the execution plan in the SQL query statement, is described as follows:
S211: Identify the keywords and identifiers in the SQL query statement with the lexical analyzer.
The lexical analyzer is also called a scanner or tokenizer. Since an SQL statement is composed of keywords and a strictly defined syntactic structure, the job of the lexical analyzer is to analyze the otherwise meaningless character stream and split it into discrete character groups (i.e., tokens such as keywords and identifiers). The recognized keywords and identifiers are passed to the syntax analyzer in the subsequent step, which finally forms a syntax tree.
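The lexical analysis step can be sketched with a small regular-expression tokenizer. This is a minimal sketch, not the lexer of any specific engine: the `KEYWORDS` set, the token kinds, and `tokenize` are hypothetical and cover only a small subset of SQL.

```python
import re

KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "GROUP", "BY", "JOIN", "ON"}

# One alternative per token kind; leading whitespace is skipped.
TOKEN_RE = re.compile(
    r"""
    \s*(?:
        (?P<string>'[^']*')
      | (?P<number>\d+(?:\.\d+)?)
      | (?P<word>[A-Za-z_][A-Za-z_0-9]*)
      | (?P<symbol><=|>=|<>|!=|[=<>(),.*;])
    )""",
    re.VERBOSE,
)


def tokenize(sql):
    """Turn a character stream into a list of (kind, text) tokens."""
    sql = sql.strip()
    tokens, pos = [], 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}: {sql[pos]!r}")
        pos = match.end()
        kind = match.lastgroup
        text = match.group(kind)
        if kind == "word":
            # Words are keywords if reserved, otherwise identifiers.
            kind = "keyword" if text.upper() in KEYWORDS else "identifier"
        tokens.append((kind, text))
    return tokens


tokens = tokenize("SELECT id FROM logs WHERE data > '2020-10-02'")
```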
S212: Perform grammar construction on the keywords and identifiers with the syntax analyzer to generate a syntax tree.
Specifically, when parsing the character stream, the lexical analyzer is not concerned with the grammatical meaning of each character group it produces or with that group's relation to its context; that is the work of the syntax analyzer. The syntax analyzer organizes the character groups it receives and converts them into sequences permitted by the grammar definition of the target language. In the embodiment of the application, the syntax analyzer organizes the character groups generated by the lexical analyzer, such as keywords and identifiers, and converts them into a syntax tree.
Grammar construction means that the syntax analyzer organizes the character groups generated by the lexical analyzer, such as keywords and identifiers, according to their grammatical meaning and their relation to their context, and finally forms a syntax tree.
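Grammar construction can be illustrated with a small recursive-descent style parser that turns tokens into a tree. This is a sketch under assumptions: `parse_select`, the `(kind, text)` token format, and the dictionary tree layout are hypothetical, and only `SELECT ... FROM ... WHERE ... AND ...` with string literals is accepted.

```python
def parse_select(tokens):
    """Build a small syntax tree from (kind, text) tokens of a simple SELECT."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take(kind, text=None):
        # Consume the next token, checking its kind and, optionally, its text.
        nonlocal pos
        found_kind, found_text = peek()
        if found_kind != kind or (text is not None and found_text.upper() != text):
            raise SyntaxError(f"expected {text or kind}, found {found_text!r}")
        pos += 1
        return found_text

    def is_keyword(word):
        kind, text = peek()
        return kind == "keyword" and text.upper() == word

    take("keyword", "SELECT")
    columns = [take("identifier")]
    while peek() == ("symbol", ","):
        take("symbol", ",")
        columns.append(take("identifier"))

    take("keyword", "FROM")
    tree = {"type": "select", "columns": columns, "from": take("identifier"), "where": []}

    if is_keyword("WHERE"):
        take("keyword", "WHERE")
        while True:
            left = take("identifier")
            op = take("symbol")
            right = take("string")
            tree["where"].append({"left": left, "op": op, "right": right})
            if not is_keyword("AND"):
                break
            take("keyword", "AND")
    return tree


tokens = [
    ("keyword", "SELECT"), ("identifier", "id"),
    ("keyword", "FROM"), ("identifier", "logs"),
    ("keyword", "WHERE"), ("identifier", "data"), ("symbol", ">"), ("string", "'2020-10-02'"),
    ("keyword", "AND"), ("identifier", "id"), ("symbol", "="), ("string", "'huodong'"),
]
tree = parse_select(tokens)
```

The resulting tree exposes the table under `from` and each condition under `where`, which is the information the later steps extract as the query table and the filtering information.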
S213: Compile and analyze the syntax tree with a compiler to obtain the execution plan.
Specifically, the syntax tree is constructed in the above steps, and the corresponding execution plan can be obtained only by compiling and analyzing the syntax tree. In the embodiment of the present application, the compiler uses the AstBuilder, which is an open source code parser. Compiling and analyzing the syntax tree through the AstBuilder to finally obtain an execution plan.
In the embodiment, the keywords and the identifiers in the SQL query statement are identified through the lexical analyzer, the syntax construction is carried out on the keywords and the identifiers according to the syntax analyzer to generate the syntax tree, the compiler is used for compiling and analyzing the syntax tree to obtain the execution plan, the syntax construction and the execution plan obtaining of the SQL query statement are achieved, the query table and the filtering information can be conveniently obtained subsequently, and the data query efficiency of big data can be improved.
Referring to fig. 5, fig. 5 shows an embodiment of step S23, where in step S23, a specific implementation process of performing data filtering processing on metadata information to obtain a lookup table and filter information is described as follows:
S231: Identify the header information, date screening information, and file field information in the metadata information.
Specifically, the header information refers to the first cell information in the table, and the corresponding query table can be obtained through the header information; the date screening information refers to the set of files to be queried within a certain date range, and it defines the date range of the files to be scanned; the file field information refers to the field information of the subset of scanned files that is to be queried.
S232: Acquire the query table in the metadata information according to the header information.
S233: Combine the date screening information and the file field information to generate the filtering information.
Specifically, the query table in the metadata information is obtained from the header information. The date screening information and the file field information are combined to define the range of files to be scanned, namely the filtering information, which narrows the range of files scanned subsequently.
In the embodiment, the header information, the date screening information and the file field information in the metadata information are identified, the query table in the metadata information is obtained according to the header information, the date screening information and the file field information are combined to generate the filtering information, the query table and the filtering information are obtained, the file range is limited, the range of subsequent scanning files is narrowed, and the efficiency of querying the data of the big data is improved.
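Steps S231 to S233 can be sketched as a function that splits plan metadata into a query table and combined filtering information. This is a minimal sketch: the metadata dictionary layout, its keys, and `screen_metadata` are hypothetical stand-ins for whatever structure the execution plan actually exposes.

```python
def screen_metadata(metadata):
    """Split plan metadata into a query table and combined filter information."""
    # S232: the header information identifies the query table.
    query_table = metadata["header"]
    # S233: date screening and file field information together form the filter.
    filter_info = {
        "date": metadata.get("date_filter"),
        "fields": dict(metadata.get("field_filter", {})),
    }
    return query_table, filter_info


# Hypothetical metadata extracted from an execution plan.
metadata = {
    "header": "logs",
    "date_filter": (">", "2020-10-02"),
    "field_filter": {"id": "huodong"},
    "columns": ["id", "data"],
}
query_table, filter_info = screen_metadata(metadata)
```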
Referring to fig. 6, fig. 6 shows a specific implementation of step S3. In step S3, the associated file information corresponding to the query table is obtained, and the target query table is generated according to the associated file information and the filtering information. The specific implementation process is described in detail as follows:
S31: Extract the file information in the query table, and acquire the associated file information according to the file information.
Specifically, the file information in the query table points to externally stored files that have an association relationship with the files to be queried, so the associated file information can be obtained from the file information.
S32: Integrate the associated file information and the filtering information to generate a file list.
S33: Construct the target query table according to the file list.
Specifically, the associated file information and the filtering information are integrated to generate a file list, and the target query table is then constructed from the file list. This further narrows the range of files scanned in the subsequent file query and improves the efficiency of querying big data.
In this embodiment, the file information in the query table is extracted, the associated file information is obtained according to the file information, the associated file information and the filtering information are integrated to generate the file list, and the target query table is constructed according to the file list. An efficient target query table is thereby obtained and the file range is reduced, which improves the efficiency of querying big data.
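As a hedged sketch of steps S32 and S33, the following assumes that each piece of associated file information is a record with a file path and a partition date, and that the filtering information supplies a date range. The record layout, the `__pruned` naming, and the function names are assumptions for illustration and are not specified by the application.

```python
from datetime import date


def build_file_list(associated_files, start, end):
    """S32: keep only associated files whose partition date lies in [start, end]."""
    return sorted(f["path"] for f in associated_files if start <= f["dt"] <= end)


def build_target_table(query_table, file_list):
    """S33: describe a narrowed table that covers only the listed files."""
    return {"name": f"{query_table}__pruned", "source": query_table, "files": file_list}


associated = [
    {"path": "orders/2020-12-31.parquet", "dt": date(2020, 12, 31)},
    {"path": "orders/2021-01-01.parquet", "dt": date(2021, 1, 1)},
    {"path": "orders/2021-01-15.parquet", "dt": date(2021, 1, 15)},
]
files = build_file_list(associated, date(2021, 1, 1), date(2021, 1, 31))
target = build_target_table("orders", files)
```

Here the file outside the date range is dropped before any data is read, which is the source of the reduced scan range.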
Referring to fig. 6, fig. 6 shows a specific implementation of step S5. In step S5, the corresponding query table in the matched SQL query statement is replaced with the target query table to obtain the target SQL query statement. The specific implementation process is described in detail as follows:
S51: Identify the corresponding query table in the matched SQL query statement as the basic query table.
Specifically, the header information in the matched SQL query statement is identified, the corresponding query table is obtained according to the header information, and that query table is used as the basic query table to be replaced.
S52: Replace the basic query table with the target query table to obtain the target SQL query statement.
Specifically, since the basic query table has been identified in the preceding step, replacing it with the target query table yields a target SQL query statement that queries files efficiently.
In this embodiment, the corresponding query table in the matched SQL query statement is identified and used as the basic query table, and the basic query table is replaced with the target query table to obtain the target SQL query statement. The resulting target SQL query statement is an optimized, efficient SQL query statement with a reduced file range, which helps improve the efficiency of querying big data.
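Steps S51 and S52 may be illustrated by the following minimal sketch, which assumes that the basic query table appears directly after a `FROM` or `JOIN` keyword. The function `replace_table` is a hypothetical helper; a text substitution of this kind does not account for quoted identifiers, string literals, or comments, for which rewriting the syntax tree would be the more robust approach.

```python
import re


def replace_table(sql, base_table, target_table):
    """Replace base_table with target_table where it follows FROM or JOIN."""
    pattern = re.compile(
        rf"\b(FROM|JOIN)(\s+){re.escape(base_table)}\b", re.IGNORECASE)
    # A function is used as the replacement so that target_table is inserted literally.
    return pattern.sub(lambda m: f"{m.group(1)}{m.group(2)}{target_table}", sql)


target_sql = replace_table(
    "SELECT name FROM orders WHERE orders_id > 1", "orders", "orders__pruned")
```

The word boundary after the table name keeps columns such as `orders_id` and tables such as `orders_archive` untouched.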
Referring to fig. 7, fig. 7 shows a specific implementation of step S6. In step S6, the big data query engine performs a data query on the target SQL query statement to obtain a query result and sends the query result to the user side. The specific implementation process is described in detail as follows:
S61: Analyze the target SQL query statement to obtain file header information.
Specifically, by analyzing the target SQL query statement, the files to be scanned are determined according to the associated file information and the filtering information in the target SQL query statement, and the file header information of those files is obtained. The file header information is data located at the beginning of a file that serves a specific purpose, and the corresponding file can be identified through it.
S62: Generate a list of files to be scanned according to the file header information.
S63: Scan the files in the list of files to be scanned through the big data query engine to obtain a query result, and send the query result to the user side.
Specifically, the corresponding files are obtained through the file header information, and the list of files to be scanned is generated from them. The big data query engine scans the files in the list to obtain the final query result and returns the query result to the user side, thereby completing the data query task on big data.
In this embodiment, the target SQL query statement is analyzed to obtain the file header information, the list of files to be scanned is generated according to the file header information, and the big data query engine scans the files in the list to obtain the query result and sends it to the user side. The data query task on big data is thereby completed and the query result is provided to the user side, which helps improve the efficiency of querying big data.
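The scan in steps S62 and S63 may be sketched as follows. The sketch assumes that each piece of file header information carries a file path, and it uses an in-memory dictionary named `STORAGE` as a stand-in for externally stored files; both, together with the function names, are assumptions for illustration and do not reflect the interface of any particular big data query engine.

```python
def list_files_to_scan(file_headers):
    """S62: build the list of files to be scanned from file header information."""
    return [header["path"] for header in file_headers]


def scan_files(file_list, read_rows, predicate, columns):
    """S63: scan only the listed files, filter rows, and project columns."""
    result = []
    for path in file_list:
        for row in read_rows(path):
            if predicate(row):
                result.append({c: row[c] for c in columns})
    return result


# Hypothetical in-memory stand-in for externally stored files.
STORAGE = {
    "orders/2021-01-01.csv": [{"name": "a", "amount": 5}, {"name": "b", "amount": 20}],
    "orders/2021-02-01.csv": [{"name": "c", "amount": 30}],
}

headers = [{"path": "orders/2021-01-01.csv", "format": "csv"}]
rows = scan_files(list_files_to_scan(headers), STORAGE.__getitem__,
                  lambda r: r["amount"] > 10, ["name"])
```

Only the file named in the header list is read; the second file in `STORAGE` is never opened.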
It is emphasized that, in order to further ensure the privacy and security of the query result, the query result may also be stored in a node of a blockchain.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).
Referring to fig. 9, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a data query apparatus based on a big data query engine, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 9, the data query apparatus based on a big data query engine of the present embodiment includes: a query statement acquisition module 71, a query statement parsing module 72, a target query table generating module 73, a query statement matching module 74, a query table replacing module 75, and a query result generating module 76, where:
a query statement acquisition module 71, configured to acquire an SQL query statement from a user side;
a query statement parsing module 72, configured to parse the SQL query statement to obtain a corresponding query table and filter information in the SQL query statement;
a target query table generating module 73, configured to obtain associated file information corresponding to the query table, and generate a target query table according to the associated file information and the filtering information;
the query statement matching module 74 is configured to obtain a preset matching rule, and perform matching processing on the SQL query statement according to the preset matching rule to obtain a matched SQL query statement;
a query table replacing module 75, configured to replace the corresponding query table in the matched SQL query statement with the target query table to obtain a target SQL query statement;
and the query result generating module 76 is configured to perform data query on the target SQL query statement by using the big data query engine to obtain a query result, and send the query result to the user side.
Further, the query statement parsing module 72 includes:
the execution plan acquisition unit is used for analyzing the SQL query statement to acquire an execution plan in the SQL query statement;
the metadata extraction unit is used for extracting metadata from the execution plan to obtain metadata information in the execution plan;
and the data screening processing unit is used for carrying out data screening processing on the metadata information to obtain a query table and filtering information.
Further, the execution plan acquisition unit includes:
the query statement identification subunit is used for identifying the keywords and the identifiers in the SQL query statement through the lexical analyzer;
the grammar tree generation subunit is used for carrying out grammar building on the keywords and the identifiers according to the grammar analyzer to generate a grammar tree;
and the syntax tree compiling and analyzing subunit is used for compiling and analyzing the syntax tree by using a compiler to obtain an execution plan.
Further, the data screening processing unit comprises:
the metadata information identification subunit is used for identifying header information, date screening information and file field information in the metadata information;
the query table acquiring subunit is used for acquiring a query table in the metadata information according to the header information;
and the filtering information generating subunit is used for combining the date screening information and the file field information to generate filtering information.
Further, the target query table generating module 73 includes:
the associated file information acquisition unit is used for extracting the file information in the query table and acquiring the associated file information according to the file information;
the file list generating unit is used for integrating the associated file information and the filtering information to generate a file list;
and the target query table constructing unit is used for constructing and generating a target query table according to the file list.
Further, the query table replacing module 75 includes:
the basic query table acquisition unit is used for identifying a corresponding query table in the matched SQL query statement as a basic query table;
and the target SQL query statement generating unit is used for replacing the basic query table with the target query table to obtain the target SQL query statement.
Further, the query result generation module 76 includes:
the file header information acquisition unit is used for analyzing the target SQL query statement to obtain file header information;
the file list generating unit is used for generating a file list to be scanned according to the file header information;
and the query result sending unit is used for scanning the files of the file list to be scanned through the big data query engine to obtain a query result and sending the query result to the user side.
It is emphasized that, in order to further ensure the privacy and security of the query result, the query result may also be stored in a node of a blockchain.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 10, fig. 10 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 8 includes a memory 81, a processor 82, and a network interface 83 communicatively connected to each other via a system bus. It is noted that only a computer device 8 having the three components memory 81, processor 82, and network interface 83 is shown, but it is understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 81 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 81 may be an internal storage unit of the computer device 8, such as a hard disk or a memory of the computer device 8. In other embodiments, the memory 81 may be an external storage device of the computer device 8, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like provided on the computer device 8. Of course, the memory 81 may also include both internal and external storage devices of the computer device 8. In this embodiment, the memory 81 is generally used for storing an operating system installed in the computer device 8 and various types of application software, such as the program code of the data query method based on a big data query engine. Further, the memory 81 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 82 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 82 is typically used to control the overall operation of the computer device 8. In this embodiment, the processor 82 is configured to execute the program code stored in the memory 81 or to process data, for example, to execute the program code of the data query method based on the big data query engine, so as to implement various embodiments of that method.
The network interface 83 may include a wireless network interface or a wired network interface, and the network interface 83 is generally used to establish communication connections between the computer device 8 and other electronic devices.
The present application further provides another embodiment, which is to provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program can be executed by at least one processor, so as to cause the at least one processor to execute the steps of the data query method based on the big data query engine.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method of the embodiments of the present application.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
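The linking of data blocks by a cryptographic method can be illustrated with the following minimal sketch, in which each block stores the hash of the previous block. The block layout, the all-zero genesis hash, and the function names are assumptions for illustration; consensus, networking, and encryption of the stored query results are outside the scope of the sketch.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # hypothetical marker for "no previous block"


def make_block(prev_hash, payload):
    """Create a block whose hash covers both its payload and the previous hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"prev": prev_hash, "payload": payload, "hash": digest}


def verify_chain(chain):
    """Check that every block links to its predecessor and has an intact hash."""
    prev = GENESIS_HASH
    for block in chain:
        expected = make_block(block["prev"], block["payload"])["hash"]
        if block["prev"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True


first = make_block(GENESIS_HASH, {"query_id": 1, "result_digest": "abc"})
second = make_block(first["hash"], {"query_id": 2, "result_digest": "def"})
chain = [first, second]
chain_is_valid = verify_chain(chain)
```

Because each hash depends on the previous one, altering a stored query result in an earlier block causes verification of the chain to fail.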
It is to be understood that the above-described embodiments are only some, not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments without limiting the scope of the application. This application may be embodied in many different forms; the embodiments are provided so that the disclosure of the application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. A data query method based on a big data query engine is characterized by comprising the following steps:
acquiring an SQL query statement from a user side;
analyzing the SQL query statement to acquire a corresponding query table and filtering information in the SQL query statement;
acquiring associated file information corresponding to the query table, and generating a target query table according to the associated file information and the filtering information;
acquiring a preset matching rule, and performing matching processing on the SQL query statement through the preset matching rule to obtain a matched SQL query statement;
replacing the corresponding query table in the matched SQL query statement by the target query table to obtain a target SQL query statement;
and performing data query on the target SQL query statement through a big data query engine to obtain a query result, and sending the query result to the user side.
2. The big data query engine-based data query method according to claim 1, wherein the parsing the SQL query statement to obtain the corresponding query table and filtering information in the SQL query statement comprises:
analyzing the SQL query statement to acquire an execution plan in the SQL query statement;
extracting metadata from the execution plan to obtain metadata information in the execution plan;
and performing data screening processing on the metadata information to obtain the query table and the filtering information.
3. The big data query engine-based data query method according to claim 1, wherein the parsing the SQL query statement to obtain the execution plan in the SQL query statement comprises:
identifying keywords and identifiers in the SQL query statement through a lexical analyzer;
according to a grammar analyzer, grammar building is carried out on the key words and the identifiers, and a grammar tree is generated;
and compiling and analyzing the syntax tree by using a compiler to obtain the execution plan.
4. The big data query engine-based data query method according to claim 2, wherein the performing data filtering processing on the metadata information to obtain the query table and the filtering information comprises:
identifying header information, date screening information and file field information in the metadata information;
acquiring the query table in the metadata information according to the header information;
and combining the date screening information and the file field information to generate the filtering information.
5. The big data query engine-based data query method according to claim 1, wherein the obtaining of associated file information corresponding to the query table and the generating of the target query table according to the associated file information and the filtering information comprises:
extracting file information in the query table, and acquiring the associated file information according to the file information;
integrating the associated file information and the filtering information to generate a file list;
and constructing and generating the target query table according to the file list.
6. The big data query engine-based data query method according to claim 1, wherein the replacing the corresponding query table in the matched SQL query statement by the target query table to obtain the target SQL query statement comprises:
identifying a corresponding query table in the matched SQL query statement as a basic query table;
and replacing the basic query table with the target query table to obtain the target SQL query statement.
7. The big data query engine-based data query method according to any one of claims 1 to 6, wherein the data query of the target SQL query statement by the big data query engine to obtain a query result, and the sending of the query result to the user side includes:
analyzing the target SQL query statement to obtain file header information;
generating a file list to be scanned according to the file header information;
and scanning the files of the file list to be scanned by the big data query engine to obtain the query result, and sending the query result to the user side.
8. A data query device based on a big data query engine is characterized by comprising:
the query statement acquisition module is used for acquiring SQL query statements from a user side;
the query statement analysis module is used for analyzing the SQL query statement to acquire a corresponding query table and filtering information in the SQL query statement;
the target query table generating module is used for acquiring the associated file information corresponding to the query table and generating a target query table according to the associated file information and the filtering information;
the query statement matching module is used for acquiring a preset matching rule and matching the SQL query statement according to the preset matching rule to obtain a matched SQL query statement;
the query table replacing module is used for replacing the corresponding query table in the matched SQL query statement by the target query table to obtain the target SQL query statement;
and the query result generation module is used for performing data query on the target SQL query statement through a big data query engine to obtain a query result and sending the query result to the user side.
9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the big data query engine-based data query method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the big data query engine-based data query method according to any one of claims 1 to 7.
CN202110193443.7A 2021-02-20 2021-02-20 Data query method, device, equipment and medium based on big data query engine Active CN112860727B (en)

Publications (2)

Publication Number Publication Date
CN112860727A true CN112860727A (en) 2021-05-28
CN112860727B CN112860727B (en) 2024-01-12

Family

ID=75988378

Country Status (1)

Country Link
CN (1) CN112860727B (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant