CN113836175A - Data access method, device, equipment and storage medium - Google Patents

Data access method, device, equipment and storage medium

Info

Publication number
CN113836175A
Authority
CN
China
Prior art keywords
instruction
data
data access
rule
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010592659.6A
Other languages
Chinese (zh)
Inventor
徐陇浙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Uniview Technologies Co Ltd
Original Assignee
Zhejiang Uniview Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Uniview Technologies Co Ltd
Priority to CN202010592659.6A
Publication of CN113836175A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24549 Run-time optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a data access method, a data access device, data access equipment and a storage medium. The method comprises the following steps: determining whether a data access instruction hits a candidate instruction rule, and determining the candidate instruction rule hit by the data access instruction as a target instruction rule; determining a target data processing engine according to the target instruction rule; and processing the data access instruction through the target data processing engine. With this scheme, a data processing engine suited to a data access instruction is determined automatically from the instruction itself and is then used to process it. Technicians do not need to select a data processing engine manually, which improves the efficiency of processing data access instructions and reduces the difficulty of data service development.

Description

Data access method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of data processing, in particular to a data access method, a data access device, data access equipment and a storage medium.
Background
In the current big data ecosystem, different data processing engines provide different functional services. For example, a relational data processing engine can achieve high concurrency but cannot support access to massive amounts of data; a NoSQL database can support access to massive amounts of data but cannot support complex query statements; a search engine can support access to massive amounts of data and a richer query syntax, but its performance is only moderate; a Hadoop-based offline computing engine can process large amounts of data, but its processing efficiency is low.
When business personnel use data processing engines, they need to select a suitable engine according to their business scenario in order to formulate a reasonable data storage and query strategy. In addition, data processing engines often provide different interfaces or DSLs, so working efficiently requires a deep understanding of each engine when a service is developed, which increases the difficulty of developing big data services.
Disclosure of Invention
The embodiment of the application provides a data access method, a data access device, data access equipment and a storage medium, so that an applicable data processing engine is automatically determined according to a data access instruction, and data processing is directly performed through the engine.
In one embodiment, an embodiment of the present application provides a data access method, including:
determining whether a data access instruction hits a candidate instruction rule, and determining the candidate instruction rule hit by the data access instruction as a target instruction rule;
determining a target data processing engine according to the target instruction rule;
and processing the data access instruction through the target data processing engine.
In another embodiment, an embodiment of the present application further provides a data access apparatus, including:
the target instruction rule determining module is used for determining whether the data access instruction hits the candidate instruction rule or not and determining the candidate instruction rule hit by the data access instruction as the target instruction rule;
the target data processing engine determining module is used for determining a target data processing engine according to the target instruction rule;
and the processing module is used for processing the data access instruction through the target data processing engine.
In another embodiment, an embodiment of the present application further provides an apparatus, including: one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the data access method of any one of the embodiments of the present application.
In yet another embodiment, the present application further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the data access method according to any one of the embodiments of the present application.
In the embodiment of the application, whether a data access instruction hits a candidate instruction rule is determined, and the candidate instruction rule hit by the data access instruction is determined as a target instruction rule, so that the rule condition corresponding to the data access instruction is determined. A target data processing engine suitable for processing the data access instruction is then determined according to the rule condition corresponding to each data processing engine, and the data access instruction is processed through the target data processing engine. A technician does not need to select a data processing engine manually, which improves the efficiency of processing data access instructions and reduces the difficulty of data service development.
Drawings
FIG. 1 is a flow chart of a data access method provided by an embodiment of the invention;
FIG. 2 is a flow chart of a data access method according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data access device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data access device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Fig. 1 is a flowchart of a data access method according to an embodiment of the present invention. The data access method provided by this embodiment is applicable to cases where data access instructions are processed by different data processing engines. Typically, it is applicable to automatically selecting a suitable target data processing engine according to a data access instruction, so that the instruction is processed through that engine. The method may be performed by a data access apparatus, which may be implemented in software and/or hardware and may be integrated in a data access device. Referring to fig. 1, the method of the embodiment of the present application specifically includes:
S110, determining whether the data access instruction hits the candidate instruction rule, and determining the candidate instruction rule hit by the data access instruction as a target instruction rule.
The data access instruction may be an instruction for operating on data, such as an SQL statement. The candidate instruction rule may be an instruction rule written in advance according to the functional characteristics of a candidate data processing engine, and includes an access type and an access condition of the instruction. For example, one functional characteristic of HBase is that it can only respond to and process data access instructions that have no function in the select clause, so the corresponding candidate instruction rule can be set as "Project rule: match data selection operations, and no sub-functions are included". If the data access instruction meets the condition of a candidate instruction rule, that candidate instruction rule is determined as the target instruction rule.
In an embodiment of the present application, determining whether a data access instruction hits a candidate instruction rule includes: matching the access type in the data access instruction with the access type in the candidate instruction rule; and, if the matching is successful, determining whether the access condition in the data access instruction hits the access condition in the candidate instruction rule.
Illustratively, the data access instruction "select c1 from t" is compared with the candidate instruction rule. The "select" access type in the data access instruction hits the "data selection operation" in the Project rule, and the data access instruction satisfies the Project rule's condition that no sub-function is included. The data access instruction therefore hits the candidate instruction rule "Project rule: match data selection operations, and no sub-functions are included", and this candidate instruction rule is determined as the target instruction rule.
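The two-stage check described above (access type first, then access condition) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the `InstructionRule` class, the helper functions and the regex-based detection of functions in the select list are hypothetical simplifications introduced here, and a real system would inspect a parsed syntax tree rather than raw text.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class InstructionRule:
    """Hypothetical candidate instruction rule: an access type plus an access condition."""
    name: str
    access_type: str                  # e.g. "select"
    condition: Callable[[str], bool]  # predicate over the SQL text


def access_type_of(sql: str) -> str:
    """Return the leading keyword of the statement, lower-cased ("" if empty)."""
    parts = sql.strip().split(None, 1)
    return parts[0].lower() if parts else ""


def select_has_function(sql: str) -> bool:
    """Crude text check: does the select list contain something shaped like name(...)?"""
    match = re.search(r"select\s+(.*?)\s+from\s", sql, flags=re.IGNORECASE | re.DOTALL)
    select_list = match.group(1) if match else ""
    return re.search(r"\w+\s*\(", select_list) is not None


def hits(sql: str, rule: InstructionRule) -> bool:
    """Stage 1: match the access type. Stage 2: check the access condition."""
    if access_type_of(sql) != rule.access_type:
        return False
    return rule.condition(sql)


# "Project rule: match data selection operations, and no sub-functions are included"
PROJECT_RULE = InstructionRule(
    name="Project",
    access_type="select",
    condition=lambda sql: not select_has_function(sql),
)
```

With these definitions, `hits("select c1 from t", PROJECT_RULE)` evaluates to `True`, while a statement with a function in the select list, or a statement of another access type, does not hit the rule.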
Determining whether the data access instruction hits a candidate instruction rule, and taking the hit rule as the target instruction rule, amounts to analyzing which conditions the data access instruction satisfies. This makes it possible to select a data processing engine that can meet those conditions and use it to process the instruction, so that engine selection is automated and data processing efficiency is improved.
S120, determining a target data processing engine according to the target instruction rule.
Different data processing engines have their own functional characteristics, which may differ from one another, so for a given data access instruction an engine suited to processing it must be selected according to those characteristics. For example, the HBase data processing engine is selected when the following conditions are met: there is no function in the select clause; the where clause contains only the primary key as a condition, or no condition, the condition operators are =, <, > and !=, the condition operand is a literal, and there is no function, expression or field reference; a Like operation can only be a left Like operation, not a right Like operation or a left-right Like operation; the order by clause contains only the primary key or the timestamp field; there is no group by clause; the Join statement has no multi-table Join; and there is no union all operation, although there may be union operations.
When the conditions of the HBase data processing engine are not met, the HBase + ElasticSearch data processing engine is selected if: the select clause has no function, or has a function that is at least one of count, sum, aver, max and min; the only Join statement is a semi join whose joined data volume does not exceed a preset number threshold, and no other type of Join statement is present; and there are no union or union all statements.
When the conditions of HBase + ElasticSearch are not met, the GreenPlum data processing engine is used if the SQL statement has no multi-table Join operation.
When the conditions of the GreenPlum data processing engine are not met, the Hadoop Parquet data processing engine is used.
In the embodiment of the present application, since the candidate instruction rule is written in advance according to the functional characteristics of the candidate data processing engine, the corresponding target data processing engine may be determined according to the candidate instruction rule hit by the data access instruction.
In this embodiment of the present application, determining a target data processing engine according to the target instruction rule includes: determining the target data processing engine from the candidate data processing engines according to the target instruction rule and the association relationship between the candidate instruction rules and the candidate data processing engines. Before the target data processing engine is determined in this way, the method further comprises: writing a candidate instruction rule corresponding to each candidate data processing engine according to the functional characteristics of that engine; and establishing the association relationship between the candidate data processing engine and the corresponding candidate instruction rule.
The candidate data processing engines can be an HBase data processing engine, an HBase + ElasticSearch data processing engine, a GreenPlum data processing engine, a Hadoop Parquet data processing engine and the like. For example, since the association relationship between the candidate instruction rules and the candidate data processing engines is established in advance, the target data processing engine corresponding to the target instruction rule hit by the data access instruction can be determined from that relationship. Determining the target data processing engine from the target instruction rule and the association relationship allows a suitable engine to be selected automatically according to the conditions of the target instruction rule, which improves data processing efficiency.
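The association relationship between candidate instruction rules and candidate data processing engines can be pictured as a lookup table that is filled in advance and consulted once the target instruction rule is known. The sketch below is only an illustration: the `RuleRegistry` class and the rule names are hypothetical, and the engine labels are plain strings rather than real client objects.

```python
from typing import Dict, Optional


class RuleRegistry:
    """Hypothetical store for the rule-to-engine association relationship."""

    def __init__(self) -> None:
        self._engine_by_rule: Dict[str, str] = {}

    def register(self, rule_name: str, engine_name: str) -> None:
        """Record that a candidate instruction rule was written for a given engine."""
        self._engine_by_rule[rule_name] = engine_name

    def engine_for(self, target_rule: str) -> Optional[str]:
        """Return the engine associated with the hit rule, or None if unknown."""
        return self._engine_by_rule.get(target_rule)


# Associations are established ahead of time, one per candidate engine.
registry = RuleRegistry()
registry.register("hbase_rule", "HBase")
registry.register("hbase_es_rule", "HBase + ElasticSearch")
registry.register("greenplum_rule", "GreenPlum")
registry.register("parquet_rule", "Hadoop Parquet")
```

Once a data access instruction has hit, say, `"hbase_es_rule"`, the target engine is simply `registry.engine_for("hbase_es_rule")`.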
S130, processing the data access instruction through the target data processing engine.
Because the data access instruction hits the target instruction rule, and the target instruction rule corresponds to the target data processing engine, the target data processing engine can meet the processing requirement on the data access instruction, has the functional characteristic of processing the data access instruction, and can process the data access instruction through the target data processing engine, so that the data access instruction can be timely and efficiently processed, a technician does not need to manually select an appropriate data processing engine according to the data access instruction for processing, the problems of strict technical requirements and limitations on the technician are solved, and the data processing efficiency is improved.
In the embodiment of the application, a data warehousing operation is performed before the data query. Specifically, data enters an SQL parsing module in the form of SQL insert statements. The SQL parsing module parses the SQL into an AST and performs data verification, then parses the AST into structured data, which is serialized and stored in Kafka. A Flink consumer inserts the data from Kafka into the database corresponding to the HBase data processing engine and the database corresponding to the Hadoop Parquet data processing engine, and the HBase data is synchronously inserted into the database corresponding to the ElasticSearch data processing engine through an HBase coprocessor. Spark is used to clean the data in the database corresponding to the Hadoop Parquet data processing engine at intervals, for example by deduplicating data and tidying files, so as to keep the data consistent with the databases of the other data processing engines. Similarly, the timestamp field of the database corresponding to the HBase data processing engine is assigned the warehousing time at which the data entered the database. Different Kafka consumer groups are used for the data consumption of different data processing engines; the offset consumed by each group and the warehousing time of the data it is currently consuming can be obtained through the group, from which the current data warehousing progress of the whole system is calculated.
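The text does not say how the warehousing progress is derived from the per-group offsets and warehousing times. One plausible reading, sketched below purely as an assumption, is that each consumer group reports its consumed offset and the warehousing time of its current record, and the system as a whole is considered to have progressed only as far as its slowest group. The `GroupProgress` type, its field names and the single-partition simplification are all hypothetical, and no Kafka client API is used.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class GroupProgress:
    """Hypothetical snapshot of one consumer group (one group per engine)."""
    group: str             # consumer group name
    committed_offset: int  # offset the group has consumed so far
    end_offset: int        # latest offset available (single partition assumed)
    ingest_time: float     # warehousing time (epoch seconds) of the record being consumed


def lag(progress: GroupProgress) -> int:
    """Number of records the group still has to consume."""
    return max(progress.end_offset - progress.committed_offset, 0)


def system_progress(groups: Iterable[GroupProgress]) -> float:
    """Assumption: overall progress is bounded by the slowest group's warehousing time."""
    return min(p.ingest_time for p in groups)
```

For example, if the HBase group is at warehousing time 1000.0 and the Parquet group is at 940.0, this sketch reports 940.0 as the point up to which every engine has the data.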
In the embodiment of the application, whether a data access instruction hits a candidate instruction rule is determined, and the candidate instruction rule hit by the data access instruction is determined as a target instruction rule, so that the rule condition corresponding to the data access instruction is determined. A target data processing engine suitable for processing the data access instruction is then determined according to the rule condition corresponding to each data processing engine, and the data access instruction is processed through the target data processing engine. A technician does not need to select a data processing engine manually, which improves the efficiency of processing data access instructions and reduces the difficulty of data service development.
Fig. 2 is a flowchart of a data access method according to another embodiment of the present invention. This embodiment refines the foregoing embodiment; for details not described in this embodiment, refer to the foregoing embodiment. Referring to fig. 2, the data access method provided in this embodiment may include:
S210, writing a candidate instruction rule corresponding to the candidate data processing engine according to the functional characteristics of the candidate data processing engine.
Different data processing engines may have different functional characteristics. For example, if the select clause contains no function, the HBase data processing engine may be selected for processing; if it contains a function, the HBase data processing engine cannot process the instruction and another data processing engine must be selected. Ordinarily, a technician needs to know the functional characteristics of every data processing engine and judge from them whether a given engine can process the data access instruction: if it can, that engine is selected; if not, the technician goes on to check the functional characteristics of the other engines. In the embodiment of the application, the candidate instruction rules corresponding to the candidate data processing engines are written according to the functional characteristics of those engines and are stored, so technicians do not need a deep understanding of each engine's functional characteristics. This reduces the demands on technicians, allows the target data processing engine to be determined quickly, and allows data access instructions to be processed in time.
S220, establishing the association relationship between the candidate data processing engine and the corresponding candidate instruction rule.
To make it easier to determine the target data processing engine corresponding to a target instruction rule later, in the embodiment of the present application the association relationship between each candidate data processing engine and its candidate instruction rule is established and stored once the candidate instruction rules have been determined. Because the target instruction rule is one of the candidate instruction rules, the target data processing engine corresponding to it can be determined from the association relationship between the candidate instruction rules and the candidate data processing engines, so that the data processing engine is selected adaptively without manual selection.
S230, determining whether the data access instruction hits the candidate instruction rule, and determining the candidate instruction rule hit by the data access instruction as a target instruction rule.
In an embodiment of the present application, determining whether a data access instruction hits a candidate instruction rule, and determining the candidate instruction rule hit by the data access instruction as a target instruction rule, includes: determining whether the data access instruction hits the candidate instruction rule corresponding to the HBase data processing engine; if not, determining whether the data access instruction hits the candidate instruction rule corresponding to the HBase + ElasticSearch data processing engine; if not, determining whether the data access instruction hits the candidate instruction rule corresponding to the GreenPlum data processing engine; and if not, determining whether the data access instruction hits the candidate instruction rule corresponding to the Hadoop Parquet data processing engine.
Illustratively, because the HBase data processing engine has the fastest processing speed but the narrowest range of instructions it can process, whether the data access instruction can be processed by the HBase data processing engine is determined first. Ordered by processing speed from high to low, the data processing engines are: HBase, HBase + ElasticSearch, GreenPlum, Hadoop Parquet. The candidate instruction rule hit by the data access instruction may therefore be determined in this priority order. Of course, the data processing engines may also be ordered differently according to actual needs, which is not specifically limited here.
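The priority order can be read as a first-match cascade: rules are tried from the fastest engine to the slowest, and the first rule that is hit decides the engine. The sketch below assumes the query has already been reduced to a few boolean features. The `QueryFeatures` type and the predicates are hypothetical and cover only a subset of the conditions listed in the description, so this illustrates the ordering rather than the complete rule set.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

AGGREGATES = {"count", "sum", "aver", "max", "min"}


@dataclass
class QueryFeatures:
    """Hypothetical, heavily simplified summary of a parsed SQL query."""
    select_functions: List[str] = field(default_factory=list)
    where_primary_key_only: bool = True
    has_group_by: bool = False
    join_kind: str = "none"  # "none", "semi" or "multi"
    has_union_all: bool = False


def hits_hbase(q: QueryFeatures) -> bool:
    return (not q.select_functions and q.where_primary_key_only
            and not q.has_group_by and q.join_kind != "multi"
            and not q.has_union_all)


def hits_hbase_es(q: QueryFeatures) -> bool:
    return (set(q.select_functions) <= AGGREGATES
            and q.join_kind in ("none", "semi")
            and not q.has_union_all)


def hits_greenplum(q: QueryFeatures) -> bool:
    return q.join_kind != "multi"


Rule = Tuple[str, Callable[[QueryFeatures], bool]]

# Ordered from fastest engine to slowest; the last engine is the fallback.
ORDERED_RULES: Sequence[Rule] = (
    ("HBase", hits_hbase),
    ("HBase + ElasticSearch", hits_hbase_es),
    ("GreenPlum", hits_greenplum),
)


def choose_engine(q: QueryFeatures,
                  ordered_rules: Sequence[Rule] = ORDERED_RULES,
                  fallback: str = "Hadoop Parquet") -> str:
    """Return the first engine whose rule is hit, else the fallback engine."""
    for engine, rule_hits in ordered_rules:
        if rule_hits(q):
            return engine
    return fallback
```

Because the order is just a sequence passed to `choose_engine`, reordering the engines "according to actual needs" only requires supplying a different sequence.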
Determining whether the data access instruction hits the candidate instruction rule corresponding to the HBase data processing engine includes: if the data access instruction hits the data scanning operation in the TableScan rule, and/or hits the data selection operation in the Project rule and includes no sub-function, and/or hits the data sorting operation in the Sort rule and has only one sorting field, and/or hits the data filtering operation in the Filter rule and the condition of the data filtering operation includes only a primary key value or no condition, and/or hits the data semi-join operation in the Join rule and has no multi-table Join operation, and/or hits the data merging operation in the Union rule, determining that the data access instruction hits the candidate instruction rule corresponding to the HBase data processing engine.
Determining whether the data access instruction hits the candidate instruction rule corresponding to the HBase + ElasticSearch data processing engine includes: if the data access instruction does not hit the candidate instruction rule corresponding to the HBase data processing engine, and the data access instruction hits the data selection operation of the Project rule and either has no sub-function or has a function that is at least one of count, sum, aver, max and min, and/or hits the data filtering operation of the Filter rule, contains no function and contains a fuzzy query, and/or hits the data semi-join operation of the Join rule and the joined data volume is smaller than a preset number threshold, determining that the data access instruction hits the candidate instruction rule corresponding to the HBase + ElasticSearch data processing engine.
Determining whether the data access instruction hits the candidate instruction rule corresponding to the GreenPlum data processing engine includes: if the data access instruction hits neither the candidate instruction rule corresponding to the HBase data processing engine nor the candidate instruction rule corresponding to the HBase + ElasticSearch data processing engine, and the data access instruction hits the data semi-join operation of the Join rule with no multi-table Join operation, determining that the data access instruction hits the candidate instruction rule corresponding to the GreenPlum data processing engine.
Determining whether the data access instruction hits the candidate instruction rule corresponding to the Hadoop Parquet data processing engine includes: if the data access instruction hits none of the candidate instruction rules corresponding to the HBase data processing engine, the HBase + ElasticSearch data processing engine and the GreenPlum data processing engine, determining that the data access instruction hits the candidate instruction rule corresponding to the Hadoop Parquet data processing engine.
Specifically, the SQL query statement is parsed into AST by the call Core SQL, and is processed for Project, TableScan, Join, Filter, Sort, Union operations:
1) TableScan rule: and (3) matching data scanning operation, namely a TableScan node, replacing the TableScan object matched into the statement with HBaseTablescan, namely, selecting to use an HBase data processing engine for processing a data access instruction by default.
2) Project rule: and matching data selection operation, namely a select clause in SQL, judging whether the Project comprises a function or not when the rule is satisfied, if so, judging whether the function comprises at least one of count, sum, aver, max and min, if so, copying a replacing Project subtree into a new equivalent subtree OptProject, and replacing the tableCan object with an ElasticSearchBaseTableCan object, namely, selecting to use an HBase + ElasticSearch data processing engine for data processing. If not, traversing the Project subtree nodes, copying and replacing the Project subtree to be a new equivalent subtree OptProject, and replacing the TableScan object with the ParquetTableScan object, namely, selecting to use a Hadoop partial data processing engine to process the data access instruction. And if the function is not contained, selecting to use the HBase data processing engine for processing the data access instruction.
3) The Sort rule: and (3) matching the data sorting operation, judging whether the sorting field has only a unique key or not when the rule is met, and selecting to use the HBase data processing engine to process the data access instruction if the sorting field has only the unique key. If the order is other, copying the replacing Sort subtree to be a new equivalent subtree OptSort, and replacing the TableScan with an elastic SearchBaseTableScan, namely, selecting to use an HBase + elastic Search data processing engine for data processing.
4) Filter rule: and matching data filtering operation, namely corresponding to the where condition in the SQL, and judging whether the query condition comprises a function or not when the rule is satisfied: if the function is included, copying the replacement Filter subtree into a new equivalent subtree OptFilter, and replacing the TableScan with GreePlum TableScan, namely, selecting to use a GreePlum data processing engine to process the data access instruction. If the function is not included, judging whether fuzzy query exists, if so, copying and replacing a Filter subtree to be a new equivalent subtree OptFilter, and replacing the TableScan with an elastic SearchBaseTablescan, namely, selecting to use an HBase + elastic search data processing engine for data processing. If the fuzzy query does not exist, if the query condition only has the unique ID, the HBase data processing engine is selected to process the data access instruction, otherwise, the Filter subtree is copied and replaced to be a new equivalent subtree OptFilter, and the TableScan is replaced by the GreePlum TableScan, namely, the GreePlum data processing engine is selected to process the data access instruction. If the LIKE operation is included, if the LIKE operation is the left LIKE operation, the HBase data processing engine is selected to process the data access instruction, otherwise, the Filter subtree is replaced to be a new equivalent subtree OptFilter, and the TableScan is replaced by the GreePlum TableScan, namely, the GreePlum data processing engine is selected to process the data access instruction.
5) The Join rule: matching Join operations between data tables, such as left Join, right Join, innerjoin or outer Join, when the rule is satisfied, checking the matched Join object and the left and right subtrees thereof, determining whether the Join object is a SEMI Join operation, namely in, exit and the like, if the Join object is the SEMI Join operation, acquiring a Filter condition of a query on the right side of the SEMI Join operation, and calling an ElasticSearch native API according to the condition to query the total number meeting the condition. If the total number is larger than the preset number threshold value, copying the replacement Join subtree into a new equivalent subtree Optjoin, and replacing the ElasticSearchHBaseTableScan on the left side and the right side of the SEMI JOIN operation with ParquetTableScan, namely selecting to use a Hadoop partial data processing engine for processing a data access instruction. And if the sum is not greater than the threshold value, selecting to use the HBase data processing engine for processing the data access instruction. If the JOIN statement is of other types, the replacing JOIN subtree is copied to be a new equivalent subtree Optjoin, and the ElasticSearchHBaseTableSescan on the left side and the right side of the JOIN statement is replaced by the ParquetTableScan, namely, the Hadoop partial data processing engine is selected to process the data access instruction.
6) Union rule: matches the UNION and UNION ALL operations. When the rule is satisfied, it is checked whether the operation is a UNION ALL operation. If it is, the Union subtree is copied and replaced with a new equivalent subtree OptUnion, and all TableScan nodes in the subtree are replaced with ParquetTableScan; that is, the Hadoop partial data processing engine is selected to process the data access instruction. If it is an ordinary UNION operation, the HBase data processing engine is selected to process the data access instruction.
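The branching described for the Filter, Join, and Union rules can be sketched as three small decision functions. This is only an illustrative sketch: the function names, the `FilterInfo` fields, and the `count_matches` callable (a stand-in for the ElasticSearch count query) are assumptions and do not come from the application. The left LIKE branch of the Filter rule is left out because the text does not state how it is ordered relative to the fuzzy-query check.

```python
from dataclasses import dataclass
from typing import Callable

# Engine labels as used in the description.
HBASE = "HBase"
HBASE_ES = "HBase+ElasticSearch"
GREENPLUM = "GreenPlum"
HADOOP = "Hadoop partial"


@dataclass
class FilterInfo:
    """Hypothetical summary of a parsed WHERE condition."""
    has_function: bool = False     # the condition calls a function
    has_fuzzy_query: bool = False  # the condition contains a fuzzy match
    only_unique_id: bool = False   # the only condition is the unique ID


def choose_for_filter(f: FilterInfo) -> str:
    # Function present -> GreenPlum, checked before anything else.
    if f.has_function:
        return GREENPLUM
    # No function, fuzzy query present -> HBase + ElasticSearch.
    if f.has_fuzzy_query:
        return HBASE_ES
    # No fuzzy query: unique-ID lookups go to HBase, the rest to GreenPlum.
    return HBASE if f.only_unique_id else GREENPLUM


def choose_for_join(is_semi_join: bool,
                    count_matches: Callable[[], int],
                    threshold: int) -> str:
    # Joins other than SEMI JOIN always go to the Hadoop-based engine.
    if not is_semi_join:
        return HADOOP
    # SEMI JOIN: count the right-side matches and compare with the threshold.
    return HADOOP if count_matches() > threshold else HBASE


def choose_for_union(is_union_all: bool) -> str:
    # UNION ALL -> Hadoop-based engine; ordinary UNION -> HBase.
    return HADOOP if is_union_all else HBASE
```

Passing the count as a callable means the count query is issued only when the join is actually a SEMI JOIN.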
S240, determining a target data processing engine according to the target instruction rule.
S250, processing the data access instruction through the target data processing engine.
Before the data access instruction is processed by the target data processing engine, the method further comprises: performing standardized packaging on the interfaces corresponding to the candidate data processing engines, and determining a standard interface for receiving the data access instruction.
The interfaces of the candidate data processing engines are packaged so that a single standard interface is exposed externally, and each data processing engine is presented externally as a database. A user only needs to create a data table and does not need to consider how data is queried under the different data storage schemes, so that unified and convenient processing is achieved.
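One way to picture the standardized packaging is an adapter layer in which every engine implements the same small interface and callers only see a single entry point. The sketch below is an assumption for illustration: `DataEngine`, `InMemoryEngine`, and `StandardInterface` are hypothetical names, and the in-memory engine merely records what it receives instead of contacting a real database.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class DataEngine(ABC):
    """Hypothetical common interface that every wrapped engine exposes."""

    @abstractmethod
    def execute(self, instruction: str) -> List[str]:
        ...


class InMemoryEngine(DataEngine):
    """Stand-in for a real engine client; it only records its input."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: List[str] = []

    def execute(self, instruction: str) -> List[str]:
        self.received.append(instruction)
        return [f"{self.name} handled: {instruction}"]


class StandardInterface:
    """Single entry point; callers never address an engine directly."""

    def __init__(self, engines: Dict[str, DataEngine],
                 selector: Callable[[str], str]) -> None:
        self._engines = engines    # engine name -> wrapped engine
        self._selector = selector  # instruction -> engine name

    def submit(self, instruction: str) -> List[str]:
        engine_name = self._selector(instruction)
        return self._engines[engine_name].execute(instruction)
```

The caller submits an instruction and receives a result; which engine served it is decided by the injected selector.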
According to the technical solution of this embodiment of the application, the candidate instruction rules corresponding to the candidate data processing engines are written according to the functional characteristics of those engines and are then stored, so that technicians do not need an in-depth understanding of the functional characteristics of each data processing engine. This lowers the requirements on technicians, allows the target data processing engine to be determined quickly, and allows the data access instruction to be processed in time. Based on the association relation between the candidate instruction rules and the candidate data processing engines, the target data processing engine corresponding to the target instruction rule can be determined, which achieves intelligent, adaptive selection of the data processing engine without manual selection and improves processing efficiency.
Fig. 3 is a schematic structural diagram of a data access apparatus according to an embodiment of the present invention. The apparatus is applicable to situations in which data access instructions are processed by different data processing engines; typically, the embodiment of the application is applicable to situations in which a suitable target data processing engine is automatically selected according to the data access instruction so that the instruction is processed by that engine. The apparatus may be implemented in software and/or hardware, and may be integrated in a data access device. Referring to fig. 3, the apparatus specifically includes:
a target instruction rule determining module 310, configured to determine whether a data access instruction hits in a candidate instruction rule, and determine the candidate instruction rule hit by the data access instruction as a target instruction rule;
a target data processing engine determining module 320, configured to determine a target data processing engine according to the target instruction rule;
the processing module 330 is configured to process the data access instruction through the target data processing engine.
In this embodiment, the target instruction rule determining module 310 includes:
the access type matching unit is used for matching the access type in the data access instruction with the access type in the candidate instruction rule;
and the access condition matching unit is used for determining whether the access condition in the data access instruction hits the access condition in the candidate instruction rule or not if the matching is successful.
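The two matching units can be read as a two-stage check: the access type is compared first, and the access condition is evaluated only when the types match. A minimal sketch follows, in which the `Instruction` and `Rule` shapes and the dictionary used for the parsed condition are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Instruction:
    access_type: str  # e.g. "Filter", "Sort", "Join"
    condition: dict   # hypothetical parsed form of the access condition


@dataclass
class Rule:
    access_type: str
    condition_hits: Callable[[dict], bool]  # predicate over the condition


def hits(instruction: Instruction, rule: Rule) -> bool:
    # Stage 1: the access types must match.
    if instruction.access_type != rule.access_type:
        return False
    # Stage 2: only then is the access condition evaluated.
    return rule.condition_hits(instruction.condition)
```

For example, a rule of type "Sort" whose predicate requires exactly one sort field is never evaluated against a "Filter" instruction.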
In this embodiment, the target instruction rule determining module 310 includes:
the first hit unit is used for determining whether the data access instruction hits a candidate instruction rule corresponding to the HBase data processing engine;
the second hit unit is used for determining, if the candidate instruction rule corresponding to the HBase data processing engine is not hit, whether the data access instruction hits the candidate instruction rules corresponding to the HBase and ElasticSearch data processing engines;
the third hit unit is used for determining, if those candidate instruction rules are not hit either, whether the data access instruction hits a candidate instruction rule corresponding to the GreenPlum data processing engine;
and the fourth hit unit is used for determining, if that candidate instruction rule is not hit either, whether the data access instruction hits the candidate instruction rule corresponding to the Hadoop partial data processing engine.
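The four hit units form an ordered cascade in which each unit is consulted only when the previous ones did not hit. The sketch below illustrates that ordering only; the predicates in `ORDERED_RULES` are simplified placeholders rather than the conditions of the embodiment, and the dictionary keys are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Predicate = Callable[[dict], bool]


def first_hit(instruction: dict,
              ordered_rules: Sequence[Tuple[str, Predicate]]) -> Optional[str]:
    """Return the engine of the first rule that the instruction hits."""
    for engine, rule_hits in ordered_rules:
        if rule_hits(instruction):
            return engine
    return None


# Order follows the description: HBase, HBase + ElasticSearch, GreenPlum,
# then the Hadoop-based engine as the final fallback.
ORDERED_RULES = [
    ("HBase", lambda i: i.get("only_primary_key", False)),
    ("HBase+ElasticSearch", lambda i: i.get("fuzzy_query", False)),
    ("GreenPlum", lambda i: i.get("has_function", False)),
    ("Hadoop partial", lambda i: True),
]
```

Because evaluation stops at the first hit, an instruction that satisfies several predicates is always routed to the engine listed earliest.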
In an embodiment of the present application, the first hit unit is specifically configured to:
if the data access instruction hits the TableScan rule, and/or,
if the data access instruction hits in a data select operation in the Project rule and does not contain a subfunction, and/or,
if the data access instruction hits in the data Sort operation of the Sort rule, and there is only one Sort field, and/or,
if the data access instruction hits in the data filtering operation of the Filter rule, and the condition of the data filtering operation only includes the primary key value or unconditional, and/or,
if the data access instruction hits in the Join rule's data half Join operation and there is no multi-table Join operation, and/or,
and if the data access instruction hits the data merging operation of the Union rule, determining that the data access instruction hits a candidate instruction rule corresponding to the HBase data processing engine.
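The conditions listed for the first hit unit can be expressed as one predicate over a single matched operator. This is a hedged sketch: the `Op` fields and the default primary key name `id` are assumptions, and the function evaluates one operator at a time, whereas the text allows the conditions to be combined with "and/or" across a whole instruction.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """Hypothetical description of one matched operator."""
    rule: str                      # "TableScan", "Project", "Sort", ...
    has_subfunction: bool = False  # Project: a subfunction is present
    sort_fields: int = 0           # Sort: number of sort fields
    filter_columns: tuple = ()     # Filter: columns used; empty = unconditional
    is_semi_join: bool = False     # Join: the operation is a half (semi) join
    multi_table_join: bool = False # Join: more than two tables are joined


def hits_hbase_rule(op: Op, primary_key: str = "id") -> bool:
    if op.rule == "TableScan":
        return True
    if op.rule == "Project":
        return not op.has_subfunction
    if op.rule == "Sort":
        return op.sort_fields == 1
    if op.rule == "Filter":
        # True when every referenced column is the primary key,
        # which also covers the unconditional (empty) case.
        return all(col == primary_key for col in op.filter_columns)
    if op.rule == "Join":
        return op.is_semi_join and not op.multi_table_join
    return op.rule == "Union"
```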
In an embodiment of the present application, the second hit unit is specifically configured to:
if the candidate instruction rule corresponding to the HBase data processing engine is not hit by the data access instruction, and,
if the data access instruction hits the data selection operation of the Project rule, and either contains no subfunction or contains a function that is at least one of count, sum, aver, max, and min, and/or,
if the data access instruction hits the data filtering operation of the Filter rule, does not contain a function, and contains a fuzzy query, and/or,
and if the data access instruction hits the data half-connection operation of the Join rule and the connected data amount is smaller than a preset number threshold, determining that the data access instruction hits candidate instruction rules corresponding to the HBase and the ElasticSearch data processing engine.
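The second hit unit adds a precondition (the HBase rule was not hit) to three alternative conditions. The sketch below is illustrative only: the dictionary keys such as `functions`, `fuzzy_query`, and `joined_rows` are hypothetical, and the Project condition is read as "every function used is one of count, sum, aver, max, min".

```python
AGGREGATES = {"count", "sum", "aver", "max", "min"}


def hits_hbase_es_rule(op: dict, hbase_hit: bool, threshold: int) -> bool:
    # Precondition: the HBase rule must not have been hit already.
    if hbase_hit:
        return False
    rule = op["rule"]
    if rule == "Project":
        # No function at all, or only the listed aggregate functions.
        return set(op.get("functions", ())) <= AGGREGATES
    if rule == "Filter":
        # No function in the condition, and a fuzzy query is present.
        return not op.get("has_function", False) and op.get("fuzzy_query", False)
    if rule == "Join":
        # Half (semi) join whose joined data amount is below the threshold.
        return op.get("is_semi_join", False) and op.get("joined_rows", 0) < threshold
    return False
```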
In an embodiment of the present application, the third hit unit is specifically configured to:
and if the data access instruction does not hit the candidate instruction rule corresponding to the HBase data processing engine or the candidate instruction rules corresponding to the HBase and ElasticSearch data processing engines, and when the data access instruction hits the data half-Join operation of the Join rule, no multi-table Join operation exists, determining that the data access instruction hits the candidate instruction rule corresponding to the GreenPlum data processing engine.
In an embodiment of the present application, the fourth hit unit is specifically configured to:
and if the data access instruction does not hit the candidate instruction rule corresponding to the HBase data processing engine, the candidate instruction rules corresponding to the HBase and ElasticSearch data processing engines or the candidate instruction rule corresponding to the GreenPlum data processing engine, determining that the data access instruction hits the candidate instruction rule corresponding to the Hadoop partial data processing engine.
In this embodiment of the application, the target data processing engine determining module 320 includes:
and the association determining unit is used for determining the target data processing engine from the candidate data processing engines according to the target instruction rule and the association relation between the candidate instruction rule and the candidate data processing engines.
In an embodiment of the present application, the apparatus further includes:
the target instruction rule compiling module is used for compiling candidate instruction rules corresponding to the candidate data processing engines according to the functional characteristics of the candidate data processing engines;
and the association relation establishing module is used for establishing the association relation between the candidate data processing engine and the corresponding candidate instruction rule.
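The association relation between candidate instruction rules and candidate data processing engines can be pictured as a simple lookup table that is filled once and consulted when a target instruction rule has been determined. The class and the rule names below are hypothetical and serve only to illustrate the idea.

```python
class RuleRegistry:
    """Hypothetical store for rule-to-engine associations."""

    def __init__(self) -> None:
        self._engine_by_rule = {}

    def associate(self, rule_name: str, engine_name: str) -> None:
        # Establish the association between a rule and its engine.
        self._engine_by_rule[rule_name] = engine_name

    def engine_for(self, target_rule: str) -> str:
        # Look up the engine associated with the target instruction rule.
        # Raises KeyError if the rule was never registered.
        return self._engine_by_rule[target_rule]


registry = RuleRegistry()
registry.associate("hbase_rule", "HBase")
registry.associate("hbase_es_rule", "HBase+ElasticSearch")
registry.associate("greenplum_rule", "GreenPlum")
registry.associate("hadoop_rule", "Hadoop partial")
```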
In an embodiment of the present application, the apparatus further includes:
and the standard interface determining module is used for carrying out standardized packaging on the interface corresponding to the candidate data processing engine and determining a standard interface for receiving the data access instruction.
The data access device provided by the embodiment of the application can execute the data access method provided by any embodiment of the application, and has corresponding functional modules and beneficial effects of the execution method.
Fig. 4 is a schematic structural diagram of a data access device according to an embodiment of the present invention. FIG. 4 illustrates a block diagram of an exemplary data access device 412 suitable for use in implementing embodiments of the present application. The data access device 412 shown in fig. 4 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present application.
As shown in fig. 4, data access device 412 may include: one or more processors 416; and a memory 428 configured to store one or more programs, where, when the one or more programs are executed by the one or more processors 416, the one or more processors 416 implement the data access method provided in the embodiments of the present application, including:
determining whether a data access instruction hits a candidate instruction rule, and determining the candidate instruction rule hit by the data access instruction as a target instruction rule;
determining a target data processing engine according to the target instruction rule;
and processing the data access instruction through the target data processing engine.
The components of data access device 412 may include, but are not limited to: one or more processors 416, a memory 428, and a bus 418 that couples the various device components including the memory 428 and the processors 416.
Bus 418 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Data access device 412 typically includes a variety of computer device readable storage media. These storage media may be any available storage media that can be accessed by data access device 412 and includes both volatile and nonvolatile storage media, removable and non-removable storage media.
Memory 428 can include computer-device readable storage media in the form of volatile memory, such as Random Access Memory (RAM)430 and/or cache memory 432. The data access device 412 may further include other removable/non-removable, volatile/nonvolatile computer device storage media. By way of example only, storage system 434 may be used to read from and write to non-removable, nonvolatile magnetic storage media (not shown in FIG. 4, commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical storage medium) may be provided. In these cases, each drive may be connected to bus 418 by one or more data storage media interfaces. Memory 428 can include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 440 having a set (at least one) of program modules 442 may be stored, for instance, in memory 428. Such program modules 442 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may comprise an implementation of a network environment. The program modules 442 generally perform the functions and/or methodologies of the described embodiments of the invention.
The data access device 412 may also communicate with one or more external devices 414 (e.g., keyboard, pointing device, display 426, etc.), with one or more devices that enable a user to interact with the data access device 412, and/or with any devices (e.g., network card, modem, etc.) that enable the data access device 412 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 422. Also, data access device 412 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) through network adapter 420. As shown in FIG. 4, network adapter 420 communicates with the other modules of data access device 412 via bus 418. It should be appreciated that although not shown in FIG. 4, other hardware and/or software modules may be used in conjunction with data access device 412, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID devices, tape drives, and data backup storage devices, among others.
The processor 416 executes various functional applications and data processing, such as implementing the data access method provided by embodiments of the present application, by executing at least one program stored in the memory 428.
One embodiment of the present invention provides a storage medium containing computer-executable instructions that when executed by a computer processor perform a data access method, comprising:
determining whether a data access instruction hits a candidate instruction rule, and determining the candidate instruction rule hit by the data access instruction as a target instruction rule;
determining a target data processing engine according to the target instruction rule;
and processing the data access instruction through the target data processing engine.
The computer storage medium of the embodiments of the present application may be any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the present application, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (13)

1. A method of data access, the method comprising:
determining whether a data access instruction hits a candidate instruction rule, and determining the candidate instruction rule hit by the data access instruction as a target instruction rule;
determining a target data processing engine according to the target instruction rule;
and processing the data access instruction through the target data processing engine.
2. The method of claim 1, wherein determining whether a data access instruction hits in a candidate instruction rule comprises:
matching the access type in the data access instruction with the access type in the candidate instruction rule;
if the matching is successful, determining whether the access condition in the data access instruction hits the access condition in the candidate instruction rule.
3. The method of claim 1, wherein determining whether a data access instruction hits in a candidate instruction rule and determining the candidate instruction rule that the data access instruction hits in as a target instruction rule comprises:
determining whether the data access instruction hits a candidate instruction rule corresponding to the HBase data processing engine;
if not, determining whether the data access instruction hits candidate instruction rules corresponding to the HBase and the ElasticSearch data processing engine;
if not, determining whether the data access instruction hits a candidate instruction rule corresponding to the GreenPlum data processing engine;
and if not, determining whether the data access instruction hits a candidate instruction rule corresponding to the Hadoop partial data processing engine.
4. The method of claim 3, wherein determining whether the data access instruction hits in a candidate instruction rule corresponding to the HBase data processing engine comprises:
if the data access instruction hits the TableScan rule, and/or,
if the data access instruction hits in a data select operation in the Project rule and does not contain a subfunction, and/or,
if the data access instruction hits in the data Sort operation of the Sort rule, and there is only one Sort field, and/or,
if the data access instruction hits in the data filtering operation of the Filter rule, and the condition of the data filtering operation only includes the primary key value or unconditional, and/or,
if the data access instruction hits in the Join rule's data half Join operation and there is no multi-table Join operation, and/or,
and if the data access instruction hits the data merging operation of the Union rule, determining that the data access instruction hits a candidate instruction rule corresponding to the HBase data processing engine.
5. The method of claim 3, wherein determining whether the data access instruction hits in candidate instruction rules corresponding to the HBase and the ElasticSearch data processing engines comprises:
if the candidate instruction rule corresponding to the HBase data processing engine is not hit by the data access instruction, and,
if the data access instruction hits the data selection operation of the Project rule, and either contains no subfunction or contains a function that is at least one of count, sum, aver, max, and min, and/or,
if the data access instruction hits the data filtering operation of the Filter rule, does not contain a function, and contains a fuzzy query, and/or,
and if the data access instruction hits the data half-connection operation of the Join rule and the connected data amount is smaller than a preset number threshold, determining that the data access instruction hits candidate instruction rules corresponding to the HBase and the ElasticSearch data processing engine.
6. The method of claim 3, wherein determining whether the data access instruction hits in a candidate instruction rule corresponding to a GreenPlum data processing engine comprises:
and if the data access instruction does not hit the candidate instruction rule corresponding to the HBase data processing engine or the candidate instruction rules corresponding to the HBase and ElasticSearch data processing engines, and when the data access instruction hits the data half-Join operation of the Join rule, no multi-table Join operation exists, determining that the data access instruction hits the candidate instruction rule corresponding to the GreenPlum data processing engine.
7. The method of claim 3, wherein determining whether the data access instruction hits in a candidate instruction rule corresponding to the Hadoop partial data processing engine comprises:
and if the data access instruction does not hit the candidate instruction rule corresponding to the HBase data processing engine, the candidate instruction rules corresponding to the HBase and ElasticSearch data processing engines or the candidate instruction rule corresponding to the GreenPlum data processing engine, determining that the data access instruction hits the candidate instruction rule corresponding to the Hadoop partial data processing engine.
8. The method of claim 1, wherein determining a target data processing engine based on the target instruction rules comprises:
and determining the target data processing engine from the candidate data processing engines according to the target instruction rule and the association relation between the candidate instruction rule and the candidate data processing engines.
9. The method of claim 8, wherein prior to determining a target data processing engine from the candidate data processing engines based on the target instruction rule and the association of the candidate instruction rule with the candidate data processing engines, the method further comprises:
writing a candidate instruction rule corresponding to the candidate data processing engine according to the functional characteristics of the candidate data processing engine;
and establishing the association relation between the candidate data processing engine and the corresponding candidate instruction rule.
10. The method of any of claims 1-9, wherein prior to processing the data access instruction by the target data processing engine, the method further comprises:
and carrying out standardized packaging on the interfaces corresponding to the candidate data processing engines, and determining a standard interface for receiving the data access instruction.
11. A data access apparatus, characterized in that the apparatus comprises:
the target instruction rule determining module is used for determining whether the data access instruction hits the candidate instruction rule or not and determining the candidate instruction rule hit by the data access instruction as the target instruction rule;
the target data processing engine determining module is used for determining a target data processing engine according to the target instruction rule;
and the processing module is used for processing the data access instruction through the target data processing engine.
12. A data access device, characterized in that the device comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data access method as claimed in any one of claims 1-10.
13. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the data access method according to any one of claims 1 to 10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010592659.6A CN113836175A (en) 2020-06-24 2020-06-24 Data access method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113836175A (en) 2021-12-24

Family

ID=78965027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010592659.6A Pending CN113836175A (en) 2020-06-24 2020-06-24 Data access method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113836175A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106293887A (en) * 2015-05-21 2017-01-04 中兴通讯股份有限公司 Data base processing method and device
CN106897467A (en) * 2017-04-24 2017-06-27 成都四方伟业软件股份有限公司 A kind of database adaptation method of big data analysis engine
CN109033123A (en) * 2018-05-31 2018-12-18 康键信息技术(深圳)有限公司 Querying method, device, computer equipment and storage medium based on big data
US20190026335A1 (en) * 2017-07-23 2019-01-24 AtScale, Inc. Query engine selection
CN109376988A (en) * 2018-09-11 2019-02-22 阿里巴巴集团控股有限公司 A kind for the treatment of method and apparatus of business datum
CN109492053A (en) * 2018-11-08 2019-03-19 北京百度网讯科技有限公司 Method and apparatus for accessing data
CN109614427A (en) * 2018-10-23 2019-04-12 平安科技(深圳)有限公司 The access method and device of Various database, storage medium and electronic equipment
CN110297840A (en) * 2019-05-22 2019-10-01 平安银行股份有限公司 Data processing method, device, equipment and the storage medium of rule-based engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination