CN111309751A - Big data processing method and device - Google Patents

Publication number
CN111309751A
Authority
CN
China
Prior art keywords
query
query statement
engine
storage engine
statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811428193.5A
Other languages
Chinese (zh)
Inventor
刘思源
朱海龙
李铭
徐胜国
徐皓
李铮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN201811428193.5A
Publication of CN111309751A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big data processing method and device. The method comprises the following steps: receiving an input query statement in a specific language format; performing syntax analysis on the query statement, and determining a computing engine and/or a storage engine corresponding to the query statement; determining whether at least one storage engine supports query processing in the specific language format; if so, routing the query statement to the computing engine and/or the storage engine; if not, converting and adapting the query statement and then routing it to the computing engine and/or the storage engine; and executing, by the computing engine and/or the storage engine, the corresponding query processing according to the received query statement to obtain and output a query result. With this scheme, a user can process big data simply by inputting a query statement in the specific language format, without writing query code tailored to the syntax of the required computing engine and storage engine, which reduces the user's learning cost and improves the user experience.

Description

Big data processing method and device
Technical Field
The invention relates to the technical field of computers, in particular to a big data processing method and device.
Background
With the continuous development of science, technology and society, the volume of data of all kinds is growing explosively. At present, to process massive data, a user must first select the computing engine or storage engine required for an operation such as a data query, and then write the corresponding execution code according to the engine characteristics, syntax rules and the like of the selected computing engine or storage engine in order to query the data.
However, the existing big data processing approach requires the user to learn a great deal about computing engines and storage engines, which greatly increases the user's learning cost and degrades the user experience. Moreover, because the technology develops rapidly and computing engines and storage engines are updated quickly, existing big data platforms not only increase the learning cost but also make it likely that, owing to the user's insufficient understanding of the engines, the selected computing engine or storage engine does not match the actual business logic, which reduces data processing efficiency. In addition, because the data processing logic of existing big data platforms is too tightly coupled to the computing engine or storage engine, the code must be rewritten whenever the computing engine or storage engine is replaced, which increases the maintenance cost of the business.
Disclosure of Invention
In view of the above, the present invention has been made to provide a big data processing method and apparatus that overcomes or at least partially solves the above-mentioned problems.
According to an aspect of the present invention, there is provided a big data processing method, including:
receiving a query statement in a specific language format input by using any external calling mode;
parsing the query statement to determine at least one computing engine and/or at least one storage engine corresponding to the query statement;
determining whether the at least one storage engine supports query processing in the particular language format;
if so, routing the query statement to the at least one compute engine and/or at least one storage engine; if not, the query statement is converted and adapted and then is routed to the at least one computing engine and/or the at least one storage engine; and the at least one computing engine and/or the at least one storage engine execute corresponding query processing according to the received query statement to obtain and output a query result.
According to another aspect of the present invention, there is provided a big data processing apparatus including:
the receiving module is suitable for receiving the query statement in a specific language format input by any external calling mode;
the determining module is suitable for performing syntax analysis on the query statement and determining at least one computing engine and/or at least one storage engine corresponding to the query statement;
the judging module is suitable for judging whether the at least one storage engine supports query processing in the specific language format;
the routing module is suitable for routing the query statement to the at least one computing engine and/or the at least one storage engine if the judgment result of the judgment module is positive; if the judgment result of the judgment module is negative, the query statement is converted and adapted and then is routed to the at least one calculation engine and/or the at least one storage engine;
and the plurality of computing engines and/or the plurality of storage engines are suitable for executing corresponding query processing according to the received query statement, and obtaining and outputting a query result.
According to yet another aspect of the present invention, there is provided a computing device comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the big data processing method.
According to still another aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the big data processing method.
According to the big data processing method and device provided by the invention, an input query statement in a specific language format is received; syntax analysis is performed on the query statement, and a computing engine and/or a storage engine corresponding to the query statement is determined; whether at least one storage engine supports query processing in the specific language format is determined; if so, the query statement is routed to the computing engine and/or the storage engine; if not, the query statement is converted and adapted and then routed to the computing engine and/or the storage engine; and the computing engine and/or the storage engine executes the corresponding query processing according to the received query statement to obtain and output a query result. With this scheme, a user can process big data simply by inputting a query statement in the specific language format, without writing query code tailored to the syntax of the required computing engine and storage engine, which reduces the user's learning cost and improves the user experience.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart illustrating a big data processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a big data processing method according to another embodiment of the present invention;
FIG. 3 is a functional block diagram of a big data processing apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a flowchart illustrating a big data processing method according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step S110, receiving a query statement in a specific language format input by using any external calling method.
The invention provides users with a unified query language, thereby decoupling the query statement input by the user from the syntactic structure of any particular computing engine or storage engine. Optionally, the query statement in the specific language format input by using any external calling mode is specifically an SQL statement.
The external calling modes include: a command line calling mode, a JDBC calling mode, and/or a proprietary API calling mode. Optionally, to further improve the user experience, this embodiment may provide different external calling modes for different user groups. For example, a command line calling mode can be provided for end users; for developers, a JDBC (Java DataBase Connectivity) calling mode and/or a proprietary API calling mode may be provided.
Step S120, parsing the query statement to determine at least one computing engine and/or at least one storage engine corresponding to the query statement.
Specifically, the query statement may be parsed to generate a corresponding logical query plan. Based on the logical query plan, at least one compute engine and/or at least one storage engine corresponding to the query statement is determined.
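The parsing step above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration only: the `CATALOG` mapping from table names to storage engines, the `LogicalPlan` structure, the regular-expression "parser", and the choice of `"spark"` as the computing engine are hypothetical and are not specified by this disclosure.

```python
import re
from dataclasses import dataclass

# Hypothetical metadata catalog: table name -> storage engine holding that table.
CATALOG = {
    "orders": "mysql",
    "clicks": "hive",
    "profiles": "hbase",
}


@dataclass
class LogicalPlan:
    tables: list          # tables referenced by the query, in order of appearance
    needs_compute: bool   # True when results from several engines must be combined


def parse(sql: str) -> LogicalPlan:
    """Very rough stand-in for syntax analysis: collect table names after FROM / JOIN."""
    tables = re.findall(
        r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", sql, flags=re.IGNORECASE
    )
    engines = {CATALOG[t] for t in tables}
    return LogicalPlan(tables=tables, needs_compute=len(engines) > 1)


def engines_for(plan: LogicalPlan):
    """Derive the storage engines (and, if needed, a computing engine) from the plan."""
    storage = sorted({CATALOG[t] for t in plan.tables})
    compute = ["spark"] if plan.needs_compute else []  # assumed computing engine name
    return storage, compute


plan = parse("SELECT * FROM orders o JOIN clicks c ON o.id = c.order_id")
storage_engines, compute_engines = engines_for(plan)
```

A real implementation would use a full SQL parser and a metadata service rather than a regular expression and a dictionary; the sketch only shows how a plan can drive engine selection.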
Step S130, judging whether the storage engine supports the query processing in a specific language format; if yes, go to step S140; otherwise, step S150 is executed.
Since the query statement irrelevant to the syntactic structure of the calculation engine or the storage engine can be input by the user in the application, in order to ensure the normal processing of the query statement, whether the storage engine corresponding to the query statement supports the query processing in a specific language format is further determined in the step, and if so, the step S140 is executed; otherwise, step S150 is executed.
Step S140, routing the query statement to at least one computing engine and/or at least one storage engine.
If the at least one storage engine supports query processing in a specific language format, the query statement which is not converted and adapted is routed to the at least one computing engine and/or the at least one storage engine, so that the at least one computing engine and/or the at least one storage engine executes corresponding query processing according to the received query statement, and a query result is obtained and output. The query statement routed to the at least one computing engine and/or the at least one storage engine may be an unprocessed query statement itself or a translated query statement. Preferably, in order to facilitate the query processing by the computing engine and/or the storage engine, the query statement may be translated into a query statement in a corresponding engine language, and the translated query statement may be routed to the at least one computing engine and/or the at least one storage engine.
Step S150, the query statement is converted and adapted, and then routed to at least one computing engine and/or at least one storage engine.
If the at least one storage engine includes a storage engine that does not support query processing in the specific language format, the query statement is first converted and adapted and then routed to the at least one computing engine and/or the at least one storage engine, so that the at least one computing engine and/or the at least one storage engine executes the corresponding query processing according to the received query statement, and a query result is obtained and output.
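The decision made in steps S130 to S150 might be sketched as follows. The set `SQL_CAPABLE`, the engine names, and the `adapt` placeholder are hypothetical; the per-engine decision is one possible reading of the steps, and the disclosure does not prescribe a particular conversion method.

```python
# Hypothetical set of storage engines that accept the unified SQL format directly.
SQL_CAPABLE = {"mysql", "hive"}


def adapt(sql: str, engine: str) -> str:
    """Placeholder for conversion and adaptation; a real adapter would emit engine-native calls."""
    return f"[adapted for {engine}] {sql}"


def route(sql: str, storage_engines):
    """Return a mapping of engine -> statement to dispatch to that engine."""
    dispatched = {}
    for engine in storage_engines:
        if engine in SQL_CAPABLE:
            dispatched[engine] = sql                  # step S140: route unchanged
        else:
            dispatched[engine] = adapt(sql, engine)   # step S150: adapt, then route
    return dispatched


routed = route("SELECT * FROM t", ["mysql", "hbase"])
```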
Therefore, the embodiment receives the query statement in a specific language format input by using any external calling mode; performing syntax analysis on the query statement, and determining at least one computing engine and/or at least one storage engine corresponding to the query statement; determining whether at least one storage engine supports query processing in a particular language format; if so, routing the query statement to at least one compute engine and/or at least one storage engine; if not, the query statement is converted and adapted and then is routed to at least one computing engine and/or at least one storage engine; and the at least one computing engine and/or the at least one storage engine execute corresponding query processing according to the received query statement to obtain and output a query result. By adopting the scheme, the user can realize the processing of the big data only through the input query statement with the specific language format, and the user does not need to compile the corresponding query code according to the grammatical characteristics of the required calculation engine and the required storage engine, so that the learning cost of the user is reduced, and the user experience is improved.
Fig. 2 is a flowchart illustrating a big data processing method according to another embodiment of the present invention. As shown in fig. 2, the method includes:
step S210, receiving a query statement in a specific language format input by using any external calling method.
Wherein, the query statement in the specific language format is an SQL statement. Specifically, the query statement in the specific language format can be based on the SQL99 standard and be compatible with the SQL syntax of engines such as the MySQL engine and the Hive engine. When inputting a query statement, the user does not need to write it according to the syntax of the required computing engine or storage engine, and may use different syntactic structures in different sub-statements of the unified query statement.
Optionally, to guarantee data query efficiency and avoid wasting system resources, this embodiment may further perform syntax checking on the query statement. For example, the syntax of the query statement in the specific language format may be checked while the statement is being received, or the entire query statement may be checked after it has been received. The subsequent steps are executed only after the check passes.
Step S220, performing syntax analysis on the query statement, and judging whether the query statement is a mixed query statement; if not, go to step S230; if yes, go to step S250.
Specifically, in order to improve the accuracy of the query result, the step further determines whether the query statement is a mixed query statement, so that the subsequent steps can process the query statement in a corresponding manner based on the determination result.
A mixed query statement is a query statement for which at least two data sources in the corresponding data source information correspond to storage engines of different classes; and/or at least two data sources in the corresponding data source information correspond to different clusters; and/or at least two data sources in the corresponding data source information correspond to different service connections.
To determine whether the query statement is a mixed query statement, the corresponding meta information can be queried according to the data table information corresponding to the query statement, and the data source information corresponding to the query statement is determined from it, so that whether the query statement is a mixed query statement can be decided. In a specific implementation, the query statement may be converted into a corresponding logic tree, the logic tree may be split to obtain at least one logic sub-tree, and whether the query statement is a mixed query statement is then determined according to the splitting result of the logic tree.
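The three criteria for a mixed query statement can be expressed as a small predicate. The `DataSource` record and its field names are assumptions introduced for illustration; the disclosure only states that the data source information is obtained from meta information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    # Hypothetical fields derived from the meta information of each referenced table.
    engine_class: str   # class of storage engine, e.g. "mysql" or "hbase"
    cluster: str        # cluster hosting the data source
    connection: str     # service connection used to reach it


def is_mixed(sources) -> bool:
    """A query is mixed if its data sources differ in engine class, cluster, or connection."""
    return (
        len({s.engine_class for s in sources}) > 1
        or len({s.cluster for s in sources}) > 1
        or len({s.connection for s in sources}) > 1
    )


same = [DataSource("mysql", "c1", "conn-a"), DataSource("mysql", "c1", "conn-a")]
cross_cluster = [DataSource("mysql", "c1", "conn-a"), DataSource("mysql", "c2", "conn-a")]
cross_engine = [DataSource("mysql", "c1", "conn-a"), DataSource("hbase", "c1", "conn-a")]
```

Under this sketch, `same` yields a single data query plan, while `cross_cluster` and `cross_engine` would each lead to a mixed query plan.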
Step S230, generating a single data query plan, and determining at least one storage engine corresponding to the single data query plan according to the single data query plan.
Step S240, judging whether at least one storage engine corresponding to the single data query plan supports query processing in a specific language format, if so, routing the query statement to at least one storage engine corresponding to the single data query plan; if not, the query statement is converted and adapted and then routed to at least one storage engine corresponding to the single data query plan.
Finally, the query statement is routed to at least one storage engine corresponding to the single data query plan, and the query statement is executed by the at least one storage engine to obtain a query result. Specifically, the query statement is translated into a query statement in an engine language corresponding to the at least one storage engine, the translated query statement is routed to the at least one storage engine corresponding to the query statement, the at least one storage engine executes the translated query statement, and a query result is obtained.
To ensure normal processing of the query statement, it is further determined whether the storage engine supports query processing in the specific language format before the query statement is routed to the at least one storage engine corresponding to the single data query plan. If not, the query statement is converted and adapted and then routed to the at least one storage engine corresponding to the single data query plan. For example, if the storage engine corresponding to the single data query plan is determined to be a non-SQL storage engine, such as an HBase engine or a Redis engine, the query statement is converted into a statement that the HBase engine or the Redis engine can recognize and process. This embodiment does not limit the specific conversion and adaptation method.
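Because the conversion and adaptation method is left open, the following is only one hedged illustration of what an adapter for a non-SQL engine might do. It handles a single query shape (a point lookup by key) and produces a Redis `GET` command or an HBase shell `get` command as plain data. The `table:key` naming convention for Redis keys and the function name `adapt_point_lookup` are assumptions, not part of the disclosure.

```python
import re

# Matches only: SELECT * FROM <table> WHERE <column> = '<value>'
_POINT_LOOKUP = re.compile(
    r"^\s*SELECT\s+\*\s+FROM\s+(\w+)\s+WHERE\s+(\w+)\s*=\s*'([^']*)'\s*;?\s*$",
    re.IGNORECASE,
)


def adapt_point_lookup(sql: str, engine: str):
    """Convert a simple SQL point lookup into an engine-native command (illustrative only)."""
    match = _POINT_LOOKUP.match(sql)
    if match is None:
        raise ValueError("only simple point lookups are handled in this sketch")
    table, _key_column, key = match.groups()
    if engine == "redis":
        # Assumed key layout "<table>:<key>"; GET is the standard Redis read command.
        return ["GET", f"{table}:{key}"]
    if engine == "hbase":
        # HBase shell syntax for reading one row by row key.
        return f"get '{table}', '{key}'"
    raise ValueError(f"no adapter for engine {engine!r}")


redis_cmd = adapt_point_lookup("SELECT * FROM profiles WHERE id = 'u42'", "redis")
hbase_cmd = adapt_point_lookup("SELECT * FROM profiles WHERE id = 'u42'", "hbase")
```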
Step S250, generating a mixed query plan, and determining at least one storage engine and at least one calculation engine corresponding to the mixed query plan according to the mixed query plan.
Step S260, judging whether at least one storage engine corresponding to the mixed query plan supports query processing in a specific language format, if so, routing the query statement to at least one storage engine and at least one calculation engine corresponding to the mixed query plan; if not, after conversion and adaptation are carried out on the query statement, the query statement is routed to at least one storage engine and at least one calculation engine corresponding to the mixed query plan.
The query statement is routed to at least one storage engine and at least one calculation engine corresponding to the mixed query plan, so that the at least one storage engine corresponding to the mixed query plan performs query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine corresponding to the mixed query plan performs calculation processing according to the intermediate query result to obtain a final query result. Specifically, the query statement is translated into a query statement in an engine language corresponding to the at least one storage engine and the at least one computing engine, and then the translated query statement is routed to the corresponding at least one storage engine and at least one computing engine, and the translated query statement is executed by the at least one storage engine and the at least one computing engine, and a query result is obtained.
To ensure normal processing of the query statement, it is further determined whether the storage engine supports query processing in a particular language format before routing the query statement to the at least one storage engine and the at least one compute engine corresponding to the hybrid query plan. And if not, converting and adapting the query statement and then routing the query statement to at least one storage engine and at least one calculation engine corresponding to the hybrid query plan.
In a specific implementation process, in order to improve query efficiency, reduce system overhead, and save system computing resources, the present embodiment may split the query statement, and split the query statement into a plurality of query clauses with complete semantics, where the plurality of query clauses may be executed by at least one storage engine and/or at least one computing engine, respectively. Determining whether the query clause corresponds to a storage engine which does not support query processing in a specific language format or not for each query clause, if so, converting and adapting the query clause, and then routing the query clause to the storage engine which does not support query processing in the specific language format so that the storage engine executes the query clause to obtain an intermediate processing result; if not, the query clause is routed to the corresponding storage engine so that the storage engine can execute the corresponding query clause to obtain an intermediate processing result, and finally the calculation engine performs calculation processing according to the intermediate processing result fed back by each storage engine to obtain a final query result.
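The clause-splitting flow described above can be sketched with in-memory stand-ins. The `STORAGE` dictionary, the `Clause` record, and the join performed by `compute_join` are hypothetical simplifications: each storage engine returns an intermediate result for its clause, and a computing step combines those results into the final result.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory stand-ins for two storage engines and their tables.
STORAGE = {
    "mysql": {"orders": [{"user": "u1", "total": 30}, {"user": "u2", "total": 5}]},
    "hbase": {"profiles": [{"user": "u1", "city": "Beijing"},
                           {"user": "u2", "city": "Shanghai"}]},
}


@dataclass
class Clause:
    engine: str                        # storage engine that executes this clause
    table: str                         # table read by the clause
    predicate: Callable[[dict], bool]  # filter the storage engine applies


def execute_clause(clause: Clause):
    """Storage engine side: run one query clause and return an intermediate result."""
    rows = STORAGE[clause.engine][clause.table]
    return [row for row in rows if clause.predicate(row)]


def compute_join(left, right, key):
    """Computing engine side: combine intermediate results on a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


clauses = [
    Clause("mysql", "orders", lambda r: r["total"] > 10),
    Clause("hbase", "profiles", lambda r: True),
]
orders, profiles = (execute_clause(c) for c in clauses)
result = compute_join(orders, profiles, "user")
```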
In an optional implementation manner, to improve data query efficiency, this embodiment may further optimize the generated logical query plan according to optimization rules. At least one storage engine and/or at least one computing engine corresponding to the logical query plan can then be determined according to the optimized logical query plan, and the query statement is processed accordingly. Specifically, during optimization the storage engine is not limited to executing only the data-acquisition sub-statements of the query statement; those sub-statements may be merged with other computation sub-statements that the storage engine supports (such as filtering and selection), and the merged sub-statements are executed by the storage engine together. This makes full use of the computing power of the storage engine, reduces the number of interactions between the storage engine and the computing engine, and avoids the increased storage overhead that arises, before optimization, when the computing engine must read in a large amount of unprocessed source data from the storage engine and then process it. The specific optimization method can be set by a person skilled in the art, and the present invention is not limited thereto.
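The effect of merging a filter into the data-acquisition step can be shown with a toy comparison. The data set, the `storage_scan` function, and the row counts are invented for illustration; the sketch only demonstrates that both plans return the same result while the optimized plan transfers fewer rows from the storage engine to the computing engine.

```python
# Hypothetical source data held by a storage engine: 100 rows, a quarter of them "paid".
ROWS = [{"id": i, "status": "paid" if i % 4 == 0 else "open"} for i in range(100)]


def storage_scan(pushed_filter=None):
    """Storage engine: return all rows, or only rows passing a filter merged into the scan."""
    if pushed_filter is None:
        return list(ROWS)
    return [row for row in ROWS if pushed_filter(row)]


def run(pushdown: bool):
    """Return (final result, number of rows transferred to the computing engine)."""
    def is_paid(row):
        return row["status"] == "paid"

    if pushdown:
        transferred = storage_scan(is_paid)      # filter executed inside the storage engine
        result = transferred
    else:
        transferred = storage_scan()             # raw data shipped to the computing engine
        result = [row for row in transferred if is_paid(row)]
    return result, len(transferred)


plain_result, plain_rows = run(pushdown=False)
optimized_result, optimized_rows = run(pushdown=True)
```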
Therefore, this embodiment receives the query statement in a specific language format input by using any external calling mode; further performs syntax analysis on the query statement to generate a single data query plan or a mixed query plan, determines a storage engine and/or a computing engine corresponding to the query statement based on the single data query plan or the mixed query plan, and judges whether at least one storage engine supports query processing in the specific language format; if not, the query statement is converted and adapted and then routed to at least one computing engine and/or at least one storage engine. With this scheme, a user can process big data simply by inputting a query statement in the specific language format, without writing query code tailored to the syntax of the required computing engine and storage engine, which reduces the user's learning cost and improves the user experience.
Fig. 3 is a schematic structural diagram of a big data processing apparatus according to an embodiment of the present invention. As shown in fig. 3, the apparatus includes: a receiving module 31, a determining module 32, a judging module 33, a routing module 34, a plurality of computing engines 35 and a plurality of storage engines 36.
A receiving module 31, adapted to receive a query statement in a specific language format input by using any external calling manner;
a determining module 32 adapted to parse the query statement and determine at least one computing engine and/or at least one storage engine corresponding to the query statement;
a judging module 33 adapted to judge whether the at least one storage engine supports query processing in the specific language format;
the routing module 34 is adapted to route the query statement to the at least one computing engine and/or the at least one storage engine if the judgment result of the judgment module is yes; if the judgment result of the judgment module is negative, the query statement is converted and adapted and then is routed to the at least one calculation engine and/or the at least one storage engine;
and the plurality of calculation engines 35 and the plurality of storage engines 36 are adapted to execute corresponding query processing according to the received query statement, and obtain and output a query result.
Optionally, the determining module 32 is further adapted to: performing syntax analysis on the query statement to generate a logic query plan; determining at least one compute engine and/or at least one storage engine corresponding to the query statement according to the logical query plan.
Optionally, the determining module 32 is further adapted to: judging whether the query statement is a mixed query statement; if not, generating a single data query plan; and if so, generating a mixed query plan.
Optionally, if the query statement is not a mixed query statement; the determination module 32 is further adapted to: determining at least one storage engine corresponding to a single data query plan according to the single data query plan;
the routing module 34 is further adapted to: if the judgment result of the judgment module is yes, routing the query statement to at least one storage engine corresponding to the single data query plan; if the judgment result of the judgment module is negative, the query statement is converted and adapted and then is routed to at least one storage engine corresponding to the single data query plan.
Optionally, if the query statement is a mixed query statement; the determination module 32 is further adapted to: determining at least one storage engine and at least one computation engine corresponding to a hybrid query plan according to the hybrid query plan;
the routing module 34 is further adapted to: if the judgment result of the judgment module is yes, routing the query statement to at least one storage engine and at least one calculation engine corresponding to the mixed query plan so that the at least one storage engine corresponding to the mixed query plan can perform query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine corresponding to the mixed query plan can perform calculation processing according to the intermediate query result to obtain a final query result; if the judgment result is negative, after conversion and adaptation are carried out on the query statement, the query statement is routed to at least one storage engine and at least one calculation engine corresponding to the mixed query plan, so that the at least one storage engine corresponding to the mixed query plan carries out query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine corresponding to the mixed query plan carries out calculation processing according to the intermediate query result to obtain a final query result;
Optionally, the apparatus further comprises: a checking module (not shown in the figure) adapted to perform syntax checking on the query statement before syntax parsing on the query statement.
Optionally, the apparatus further comprises: an optimization module (not shown in the figure) adapted to optimize the logical query plan.
Optionally, the query statement is an SQL statement.
In this embodiment, reference may be made to the description of corresponding steps in the method embodiment shown in fig. 1 and/or fig. 2 for specific implementation of each module of the big data processing apparatus, which is not described in detail in this embodiment.
Therefore, the embodiment receives the query statement in a specific language format input by using any external calling mode; performing syntax analysis on the query statement, and determining at least one computing engine and/or at least one storage engine corresponding to the query statement; determining whether at least one storage engine supports query processing in a particular language format; if so, routing the query statement to at least one compute engine and/or at least one storage engine; if not, the query statement is converted and adapted and then is routed to at least one computing engine and/or at least one storage engine; and the at least one computing engine and/or the at least one storage engine execute corresponding query processing according to the received query statement to obtain and output a query result. By adopting the scheme, the user can realize the processing of the big data only through the input query statement with the specific language format, and the user does not need to compile the corresponding query code according to the grammatical characteristics of the required calculation engine and the required storage engine, so that the learning cost of the user is reduced, and the user experience is improved.
According to an embodiment of the present invention, a non-volatile computer storage medium is provided. The storage medium stores at least one computer-executable instruction, and the computer-executable instruction can perform the big data processing method in any of the above method embodiments.
Fig. 4 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device.
As shown in Fig. 4, the computing device may include: a processor 402, a communication interface 404, a memory 406, and a communication bus 408.
The processor 402, the communication interface 404, and the memory 406 communicate with one another via the communication bus 408.
The communication interface 404 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 402 is configured to execute the program 410, and may specifically perform relevant steps in the above-described embodiment of the big data processing method.
In particular, program 410 may include program code comprising computer operating instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present invention. The computing device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used to store the program 410. The memory 406 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
The program 410 may specifically be configured to cause the processor 402 to perform the following operations:
receiving a query statement in a specific language format input by using any external calling mode;
parsing the query statement to determine at least one computing engine and/or at least one storage engine corresponding to the query statement;
determining whether the at least one storage engine supports query processing in the particular language format;
if so, routing the query statement to the at least one computing engine and/or the at least one storage engine; if not, converting and adapting the query statement and then routing it to the at least one computing engine and/or the at least one storage engine; and executing, by the at least one computing engine and/or the at least one storage engine, corresponding query processing according to the received query statement to obtain and output a query result.
In an alternative embodiment, the program 410 may be specifically configured to cause the processor 402 to perform the following operations:
performing syntax analysis on the query statement to generate a logical query plan;
determining at least one compute engine and/or at least one storage engine corresponding to the query statement according to the logical query plan.
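As a rough illustration of the two operations above, the following sketch builds a very small logical query plan from an SQL statement and looks up the engines it needs. Everything named here is an assumption made for the example: the `CATALOG` mapping, the engine labels `store_a`, `store_b` and `compute_a`, and the regex-based "parsing" are hypothetical and stand in for a real parser and metadata service.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical catalog: which storage engine holds each table (assumption).
CATALOG: Dict[str, str] = {"orders": "store_a", "clicks": "store_b"}


@dataclass
class LogicalPlan:
    statement: str
    tables: List[str] = field(default_factory=list)
    needs_compute: bool = False  # joins or aggregation need a computing engine


def build_plan(statement: str) -> LogicalPlan:
    """Very rough stand-in for syntax analysis: find tables after FROM/JOIN."""
    tables = re.findall(r"\b(?:FROM|JOIN)\s+(\w+)", statement, flags=re.IGNORECASE)
    needs_compute = bool(
        re.search(r"\b(JOIN|GROUP\s+BY)\b", statement, flags=re.IGNORECASE)
    )
    return LogicalPlan(statement, tables, needs_compute)


def engines_for(plan: LogicalPlan) -> Dict[str, List[str]]:
    """Determine the storage and computing engines the plan refers to."""
    storage = sorted({CATALOG[t.lower()] for t in plan.tables})
    compute = ["compute_a"] if plan.needs_compute else []
    return {"storage": storage, "compute": compute}
```

A table missing from the catalog raises `KeyError` in this sketch; a real system would more likely report an unknown-table error to the caller.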
In an alternative embodiment, the program 410 may be specifically configured to cause the processor 402 to perform the following operations:
judging whether the query statement is a mixed query statement;
if not, generating a single data query plan; and if so, generating a mixed query plan.
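The judgment above can be illustrated with a short sketch. It assumes, for the purpose of the example only, that a statement counts as a mixed query statement when the tables it references live in more than one storage engine; this disclosure does not fix that criterion at this point, and `PlanKind`, `classify`, and the table-to-engine mapping are hypothetical names.

```python
from enum import Enum
from typing import Dict, Iterable


class PlanKind(Enum):
    SINGLE = "single data query plan"
    HYBRID = "mixed query plan"


def classify(tables: Iterable[str], table_to_engine: Dict[str, str]) -> PlanKind:
    """Pick the plan kind from the set of engines the tables belong to.

    Assumption: more than one distinct engine means a mixed query.
    """
    engines = {table_to_engine[t] for t in tables}
    return PlanKind.HYBRID if len(engines) > 1 else PlanKind.SINGLE
```

For example, with a mapping of `orders` to one engine and `clicks` to another, a statement touching both tables would be classified as `PlanKind.HYBRID`, while a statement touching only `orders` would be `PlanKind.SINGLE`.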
In an alternative embodiment, if the query statement is not a hybrid query statement;
the program 410 may specifically be configured to cause the processor 402 to perform the following operations:
determining at least one storage engine corresponding to a single data query plan according to the single data query plan;
said routing said query statement to said at least one compute engine and/or at least one storage engine further comprises: routing the query statement to at least one storage engine corresponding to the single data query plan;
the routing the query statement after the conversion adaptation to the at least one computing engine and/or at least one storage engine further comprises: and after conversion and adaptation are carried out on the query statement, the query statement is routed to at least one storage engine corresponding to the single data query plan.
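To make the conversion and adaptation step for a single data query more concrete, the sketch below rewrites a simple point-lookup SQL statement into a request for a storage engine that does not accept SQL. The request shape, the function name `sql_to_kv_get`, and the restriction to one `WHERE column = 'value'` condition are all assumptions chosen for illustration; they are not taken from this disclosure.

```python
import re
from typing import Optional

# Matches: SELECT <cols> FROM <table> WHERE <column> = '<value>'
_SIMPLE_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>[\w\s,*]+?)\s+FROM\s+(?P<table>\w+)"
    r"\s+WHERE\s+(?P<key_col>\w+)\s*=\s*'(?P<key>[^']*)'\s*;?\s*$",
    re.IGNORECASE,
)


def sql_to_kv_get(statement: str) -> Optional[dict]:
    """Convert a point-lookup SELECT into a hypothetical key-value 'get' request.

    Returns None when the statement does not fit the simple pattern, so the
    caller can fall back to another adapter or report an error.
    """
    m = _SIMPLE_SELECT.match(statement)
    if m is None:
        return None
    cols = [c.strip() for c in m.group("cols").split(",")]
    return {
        "op": "get",
        "table": m.group("table"),
        "key_column": m.group("key_col"),
        "key": m.group("key"),
        "columns": None if cols == ["*"] else cols,  # None means all columns
    }
```

Statements with joins or multiple conditions fall outside the pattern and return `None`, which keeps the adapter honest about what it can translate.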
In an alternative embodiment, if the query statement is a hybrid query statement;
the program 410 may specifically be configured to cause the processor 402 to perform the following operations:
determining at least one storage engine and at least one computation engine corresponding to a hybrid query plan according to the hybrid query plan;
said routing said query statement to said at least one compute engine and/or at least one storage engine further comprises: routing the query statement to at least one storage engine and at least one calculation engine corresponding to the mixed query plan, so that the at least one storage engine corresponding to the mixed query plan performs query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine corresponding to the mixed query plan performs calculation processing according to the intermediate query result to obtain a final query result;
the routing the query statement after the conversion adaptation to the at least one computing engine and/or at least one storage engine further comprises: after the query statement is converted and adapted, routing the query statement to at least one storage engine and at least one calculation engine corresponding to the mixed query plan so that the at least one storage engine corresponding to the mixed query plan performs query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine corresponding to the mixed query plan performs calculation processing according to the intermediate query result to obtain a final query result;
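The two-stage execution of a mixed query plan described above, in which storage engines produce intermediate query results and a computing engine derives the final query result from them, may be sketched as follows. The callables standing in for engines, the per-engine sub-queries, and the names `run_hybrid` and `join_on_user` are hypothetical and serve only to show the data flow.

```python
from typing import Callable, Dict, List

Rows = List[dict]


def run_hybrid(
    sub_queries: Dict[str, str],
    storage: Dict[str, Callable[[str], Rows]],
    compute: Callable[[Dict[str, Rows]], Rows],
) -> Rows:
    """Stage 1: each storage engine answers its part of the query.
    Stage 2: the computing engine combines the intermediate results."""
    intermediate = {name: storage[name](query) for name, query in sub_queries.items()}
    return compute(intermediate)


def join_on_user(intermediate: Dict[str, Rows]) -> Rows:
    """Example computing step: join profile rows with order totals by user."""
    totals = {row["user"]: row["total"] for row in intermediate["orders_store"]}
    return [
        {**row, "total": totals[row["user"]]}
        for row in intermediate["profiles_store"]
        if row["user"] in totals
    ]
```

Keeping the computing step as a separate callable mirrors the separation in the text between query processing in the storage engines and calculation processing in the computing engine.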
In an alternative embodiment, the program 410 may be specifically configured to cause the processor 402 to perform the following operations:
performing a syntax check on the query statement before parsing the query statement.
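A lightweight pre-parse syntax check of the kind mentioned above might look like the following sketch. It is deliberately not a full SQL grammar: it only verifies the leading keyword, balanced parentheses, and terminated string literals, and the keyword set and the function name `check_syntax` are assumptions made for this example.

```python
from typing import List

# Assumed set of statement types the unified entry point accepts.
ALLOWED_START = {"SELECT", "INSERT", "UPDATE", "DELETE", "WITH"}


def check_syntax(statement: str) -> List[str]:
    """Return a list of problems found; an empty list means the check passed."""
    text = statement.strip().rstrip(";").strip()
    if not text:
        return ["empty statement"]

    errors: List[str] = []
    first = text.split()[0].upper()
    if first not in ALLOWED_START:
        errors.append(f"unknown leading keyword: {first}")

    depth = 0
    in_string = False
    for ch in text:
        if ch == "'":
            in_string = not in_string  # a doubled '' toggles twice, as intended
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    errors.append("unmatched ')'")
                    break
    if in_string:
        errors.append("unterminated string literal")
    if depth > 0:
        errors.append("unmatched '('")
    return errors
```

Rejecting malformed statements at this point avoids spending parsing and routing work on input that no engine could execute.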
In an alternative embodiment, the program 410 may be specifically configured to cause the processor 402 to perform the following operations:
optimizing the logical query plan.
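This disclosure does not specify here which optimizations are applied to the logical query plan. As one commonly used example, and purely as an assumption, the sketch below shows predicate pushdown: a filter placed above a join is moved down to the side of the join that owns the referenced table, so that less data leaves the storage engine. The node classes `Scan`, `Filter`, and `Join` and the function `push_down` are hypothetical.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    table: str       # table the predicate refers to
    condition: str
    child: "Plan"


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"


Plan = Union[Scan, Filter, Join]


def tables_of(plan: Plan) -> Set[str]:
    """Collect the tables scanned anywhere below this node."""
    if isinstance(plan, Scan):
        return {plan.table}
    if isinstance(plan, Filter):
        return tables_of(plan.child)
    return tables_of(plan.left) | tables_of(plan.right)


def push_down(plan: Plan) -> Plan:
    """Move each filter as close as possible to the scan of its table."""
    if isinstance(plan, Filter):
        child = push_down(plan.child)
        if isinstance(child, Join):
            if plan.table in tables_of(child.left):
                moved = push_down(Filter(plan.table, plan.condition, child.left))
                return Join(moved, child.right)
            if plan.table in tables_of(child.right):
                moved = push_down(Filter(plan.table, plan.condition, child.right))
                return Join(child.left, moved)
        return Filter(plan.table, plan.condition, child)
    if isinstance(plan, Join):
        return Join(push_down(plan.left), push_down(plan.right))
    return plan
```

Applied to a filter on `orders` sitting above a join of `orders` and `users`, the rewrite yields a join whose left input is the filtered scan of `orders`.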
In an alternative embodiment, the query statement is an SQL statement.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in a big data processing apparatus according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
The invention discloses: A1. A big data processing method, comprising:
receiving a query statement in a specific language format input through any external calling mode;
parsing the query statement to determine at least one computing engine and/or at least one storage engine corresponding to the query statement;
determining whether the at least one storage engine supports query processing in the specific language format;
if so, routing the query statement to the at least one computing engine and/or the at least one storage engine; if not, converting and adapting the query statement and then routing it to the at least one computing engine and/or the at least one storage engine; and executing, by the at least one computing engine and/or the at least one storage engine, corresponding query processing according to the received query statement to obtain and output a query result.
A2. The method of A1, wherein the parsing the query statement to determine at least one computing engine and/or at least one storage engine corresponding to the query statement further comprises:
performing syntax analysis on the query statement to generate a logical query plan;
determining at least one compute engine and/or at least one storage engine corresponding to the query statement according to the logical query plan.
A3. The method of A2, wherein the parsing the query statement to generate a logical query plan further comprises:
judging whether the query statement is a mixed query statement;
if not, generating a single data query plan; and if so, generating a mixed query plan.
A4. The method of A3, wherein, if the query statement is not a hybrid query statement, the parsing the query statement to determine at least one computing engine and/or at least one storage engine corresponding to the query statement further comprises:
determining at least one storage engine corresponding to a single data query plan according to the single data query plan;
said routing said query statement to said at least one compute engine and/or at least one storage engine further comprises: routing the query statement to at least one storage engine corresponding to the single data query plan;
the routing the query statement after the conversion adaptation to the at least one computing engine and/or at least one storage engine further comprises: and after conversion and adaptation are carried out on the query statement, the query statement is routed to at least one storage engine corresponding to the single data query plan.
A5. The method of A3, wherein, if the query statement is a hybrid query statement, the parsing the query statement to determine at least one computing engine and/or at least one storage engine corresponding to the query statement further comprises:
determining at least one storage engine and at least one computation engine corresponding to a hybrid query plan according to the hybrid query plan;
said routing said query statement to said at least one compute engine and/or at least one storage engine further comprises: routing the query statement to at least one storage engine and at least one calculation engine corresponding to the mixed query plan, so that the at least one storage engine corresponding to the mixed query plan performs query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine corresponding to the mixed query plan performs calculation processing according to the intermediate query result to obtain a final query result;
the routing the query statement after the conversion adaptation to the at least one computing engine and/or at least one storage engine further comprises: after the query statement is converted and adapted, routing the query statement to at least one storage engine and at least one calculation engine corresponding to the mixed query plan so that the at least one storage engine corresponding to the mixed query plan performs query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine corresponding to the mixed query plan performs calculation processing according to the intermediate query result to obtain a final query result;
A6. The method of any one of A1-A5, wherein, prior to the parsing the query statement, the method further comprises:
performing a syntax check on the query statement.
A7. The method of any one of A2-A5, wherein the method further comprises:
optimizing the logical query plan.
A8. The method of any one of A1-A7, wherein the query statement is an SQL statement.
The invention also discloses: B9. A big data processing apparatus, comprising:
a receiving module, adapted to receive a query statement in a specific language format input through any external calling mode;
a determination module, adapted to perform syntax analysis on the query statement and determine at least one computing engine and/or at least one storage engine corresponding to the query statement;
a judgment module, adapted to judge whether the at least one storage engine supports query processing in the specific language format;
a routing module, adapted to route the query statement to the at least one computing engine and/or the at least one storage engine if the judgment result of the judgment module is yes, and, if the judgment result of the judgment module is no, to convert and adapt the query statement and then route it to the at least one computing engine and/or the at least one storage engine; and
a plurality of computing engines and/or a plurality of storage engines, adapted to execute corresponding query processing according to the received query statement to obtain and output a query result.
B10. The apparatus of B9, wherein the determination module is further adapted to:
performing syntax analysis on the query statement to generate a logical query plan;
determining at least one compute engine and/or at least one storage engine corresponding to the query statement according to the logical query plan.
B11. The apparatus of B10, wherein the determination module is further adapted to:
judging whether the query statement is a mixed query statement;
if not, generating a single data query plan; and if so, generating a mixed query plan.
B12. The apparatus of B11, wherein if the query statement is not a hybrid query statement;
the determination module is further adapted to: determining at least one storage engine corresponding to a single data query plan according to the single data query plan;
the routing module is further adapted to: if the judgment result of the judgment module is yes, routing the query statement to at least one storage engine corresponding to the single data query plan; if the judgment result of the judgment module is negative, the query statement is converted and adapted and then is routed to at least one storage engine corresponding to the single data query plan.
B13. The apparatus of B11, wherein, if the query statement is a hybrid query statement;
the determination module is further adapted to: determining at least one storage engine and at least one computation engine corresponding to a hybrid query plan according to the hybrid query plan;
the routing module is further adapted to: if the judgment result of the judgment module is yes, routing the query statement to at least one storage engine and at least one calculation engine corresponding to the mixed query plan so that the at least one storage engine corresponding to the mixed query plan can perform query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine corresponding to the mixed query plan can perform calculation processing according to the intermediate query result to obtain a final query result; if the judgment result is negative, after conversion and adaptation are carried out on the query statement, the query statement is routed to at least one storage engine and at least one calculation engine corresponding to the mixed query plan, so that the at least one storage engine corresponding to the mixed query plan carries out query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine corresponding to the mixed query plan carries out calculation processing according to the intermediate query result to obtain a final query result;
B14. The apparatus of any one of B9-B13, wherein the apparatus further comprises:
a checking module, adapted to perform a syntax check on the query statement before the query statement is parsed.
B15. The apparatus of any one of B10-B13, wherein the apparatus further comprises:
an optimization module, adapted to optimize the logical query plan.
B16. The apparatus of any one of B9-B15, wherein the query statement is an SQL statement.
The invention also discloses: C17. A computing device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another via the communication bus;
the memory is used for storing at least one executable instruction, which causes the processor to perform the operations corresponding to the big data processing method of any one of A1-A8.
The invention also discloses: D18. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the big data processing method of any one of A1-A8.

Claims (10)

1. A big data processing method, comprising:
receiving a query statement in a specific language format input through any external calling mode;
parsing the query statement to determine at least one computing engine and/or at least one storage engine corresponding to the query statement;
determining whether the at least one storage engine supports query processing in the specific language format;
if so, routing the query statement to the at least one computing engine and/or the at least one storage engine; if not, converting and adapting the query statement and then routing it to the at least one computing engine and/or the at least one storage engine; and executing, by the at least one computing engine and/or the at least one storage engine, corresponding query processing according to the received query statement to obtain and output a query result.
2. The method of claim 1, wherein the parsing the query statement, determining at least one compute engine and/or at least one storage engine corresponding to the query statement, further comprises:
performing syntax analysis on the query statement to generate a logical query plan;
determining at least one compute engine and/or at least one storage engine corresponding to the query statement according to the logical query plan.
3. The method of claim 2, wherein the parsing the query statement to generate a logical query plan further comprises:
judging whether the query statement is a mixed query statement;
if not, generating a single data query plan; and if so, generating a mixed query plan.
4. The method of claim 3, wherein if the query statement is not a hybrid query statement; the parsing the query statement to determine at least one compute engine and/or at least one storage engine corresponding to the query statement further comprises:
determining at least one storage engine corresponding to a single data query plan according to the single data query plan;
said routing said query statement to said at least one compute engine and/or at least one storage engine further comprises: routing the query statement to at least one storage engine corresponding to the single data query plan;
the routing the query statement after the conversion adaptation to the at least one computing engine and/or at least one storage engine further comprises: and after conversion and adaptation are carried out on the query statement, the query statement is routed to at least one storage engine corresponding to the single data query plan.
5. The method of claim 3, wherein if the query statement is a hybrid query statement; the parsing the query statement to determine at least one compute engine and/or at least one storage engine corresponding to the query statement further comprises:
determining at least one storage engine and at least one computation engine corresponding to a hybrid query plan according to the hybrid query plan;
said routing said query statement to said at least one compute engine and/or at least one storage engine further comprises: routing the query statement to at least one storage engine and at least one calculation engine corresponding to the mixed query plan, so that the at least one storage engine corresponding to the mixed query plan performs query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine corresponding to the mixed query plan performs calculation processing according to the intermediate query result to obtain a final query result;
the routing the query statement after the conversion adaptation to the at least one computing engine and/or at least one storage engine further comprises: and after the query statement is converted and adapted, routing the query statement to at least one storage engine and at least one calculation engine corresponding to the mixed query plan so that the at least one storage engine corresponding to the mixed query plan performs query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine corresponding to the mixed query plan performs calculation processing according to the intermediate query result to obtain a final query result.
6. The method of any of claims 1-5, wherein prior to the parsing the query statement, the method further comprises:
performing a syntax check on the query statement.
7. The method according to any one of claims 2-5, wherein the method further comprises:
optimizing the logical query plan.
8. A big data processing apparatus, comprising:
a receiving module, adapted to receive a query statement in a specific language format input through any external calling mode;
a determination module, adapted to perform syntax analysis on the query statement and determine at least one computing engine and/or at least one storage engine corresponding to the query statement;
a judgment module, adapted to judge whether the at least one storage engine supports query processing in the specific language format;
a routing module, adapted to route the query statement to the at least one computing engine and/or the at least one storage engine if the judgment result of the judgment module is yes, and, if the judgment result of the judgment module is no, to convert and adapt the query statement and then route it to the at least one computing engine and/or the at least one storage engine; and
a plurality of computing engines and/or a plurality of storage engines, adapted to execute corresponding query processing according to the received query statement to obtain and output a query result.
9. A computing device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another via the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the big data processing method according to any one of claims 1-7.
10. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the big data processing method according to any one of claims 1 to 7.
CN201811428193.5A 2018-11-27 2018-11-27 Big data processing method and device Pending CN111309751A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811428193.5A CN111309751A (en) 2018-11-27 2018-11-27 Big data processing method and device

Publications (1)

Publication Number Publication Date
CN111309751A true CN111309751A (en) 2020-06-19

Family

ID=71157863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811428193.5A Pending CN111309751A (en) 2018-11-27 2018-11-27 Big data processing method and device

Country Status (1)

Country Link
CN (1) CN111309751A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767304A (en) * 2020-09-01 2020-10-13 北京安帝科技有限公司 Cross-database data query method, query device and readable medium
CN112214517A (en) * 2020-11-04 2021-01-12 微医云(杭州)控股有限公司 Stream data processing method and device, electronic device and storage medium
CN112632333A (en) * 2020-12-17 2021-04-09 杭州迪普科技股份有限公司 Query statement generation method, device, equipment and computer readable storage medium
CN113064914A (en) * 2021-04-22 2021-07-02 中国工商银行股份有限公司 Data extraction method and device
CN113704291A (en) * 2021-09-03 2021-11-26 北京火山引擎科技有限公司 Data query method and device, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090019000A1 (en) * 2007-07-12 2009-01-15 Mitchell Jon Arends Query based rule sets
CN102982075A (en) * 2012-10-30 2013-03-20 北京京东世纪贸易有限公司 Heterogeneous data source access supporting system and method thereof
CN103440303A (en) * 2013-08-21 2013-12-11 曙光信息产业股份有限公司 Heterogeneous cloud storage system and data processing method thereof
CN106777108A (en) * 2016-12-15 2017-05-31 贵州电网有限责任公司电力科学研究院 A kind of data query method and apparatus based on mixing storage architecture
CN108519914A (en) * 2018-04-09 2018-09-11 腾讯科技(深圳)有限公司 Big data computational methods, system and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
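向红 (Xiang Hong): "Research and Implementation of an Ontology-Based Heterogeneous Data Integration System" (基于本体的异构数据集成系统的研究与实现), China Master's Theses Full-text Database, Information Science and Technology series *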

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200619