CN117708136A

CN117708136A - Spark SQL processing method, device, storage medium and system

Info

Publication number: CN117708136A
Application number: CN202311828903.4A
Authority: CN
Inventors: 郝天龙; 韩明宵
Original assignee: Postal Savings Bank of China Ltd
Current assignee: Postal Savings Bank of China Ltd
Priority date: 2023-12-27
Filing date: 2023-12-27
Publication date: 2024-03-15

Abstract

The application provides a Spark SQL processing method, device, storage medium and system. In the method, a Scala script is used to perform logic extraction processing and then splitting processing on a Spark SQL development script, yielding a plurality of sub-logics. Each sub-logic is divided to obtain at least a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE, thereby refining the Spark SQL development script. Finally, all the extraction modules SELECT, source modules FROM and screening modules WHERE are loaded to obtain a plurality of loading results, and all the loading results are stored in the corresponding positions of a system code table to obtain the final system code table. This solves the problem in existing schemes that complex SQL carries complex logic, which takes a long time to run and makes jobs prone to failure.

Description

Spark SQL processing method, device, storage medium and system

Technical Field

The application relates to the technical field of SQL processing, in particular to a Spark SQL processing method, device, storage medium and system.

Background

With the continuous accumulation of bank data and the continuous improvement of information technology, big data applications have become increasingly widespread, and fast general-purpose computing engines designed for massive data processing, such as Spark, have emerged. Spark SQL simplifies development against Spark's native RDD (Resilient Distributed Dataset) interface, but problems such as coarse-grained parameter configuration, long running times for complex logic and job failures remain common.

That is, in existing schemes complex SQL carries complex logic, which takes a long time to run and makes jobs prone to failure.

Disclosure of Invention

The main purpose of the application is to provide a Spark SQL processing method, device, storage medium and system, so as to at least solve the problem in the prior art that complex SQL carries complex logic, which takes a long time to run and makes jobs prone to failure.

To achieve the above object, according to one aspect of the present application, there is provided a Spark SQL processing method, including: acquiring Spark SQL development scripts for characterizing the contents of a database; sequentially carrying out logic extraction processing and splitting processing on the Spark SQL development script by adopting a Scala script to obtain a plurality of sub logics; dividing each sub-logic to at least obtain a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE; and loading all the extraction modules SELECT, the source modules FROM and the screening modules WHERE to obtain a plurality of loading results, and storing all the loading results into corresponding positions of a system code table to obtain a final system code table.

Optionally, sequentially performing logic extraction processing and splitting processing on the Spark SQL development script by adopting a Scala script to obtain a plurality of sub-logics includes: performing logic extraction processing on the Spark SQL development script by adopting the Scala script to obtain an SQL logic overall; and splitting the SQL logic overall by adopting the Scala script to obtain a plurality of sub-logics.

Optionally, after storing the loading result in a corresponding position of the system code table to obtain a final system code table, the method further includes: storing all contents of the final system code table into an entity table under the condition that early warning identifiers exist in the final system code table, wherein the early warning identifiers are used for representing the importance degree of the contents of the final system code table; and under the condition that the early warning identification does not exist in the final system code table, storing all contents of the final system code table into the entity table or storing all contents of the final system code table into a temporary view according to the data quantity of the final system code table.

Optionally, storing all contents of the final system code table into the entity table or storing all contents of the final system code table into a temporary view according to the data amount of the final system code table, including: storing all contents of the final system code table into the entity table or storing all contents of the final system code table into the temporary view according to the content repetition degree of the final system code table under the condition that the data amount of the final system code table is greater than or equal to a data amount threshold; and storing all contents of the final system code table into the temporary view in the case that the data amount of the final system code table is smaller than the data amount threshold.

Optionally, storing all contents of the final system code table in the entity table or storing all contents of the final system code table in the temporary view according to the content repetition degree of the final system code table, including: storing all contents of the final system code table into the temporary view under the condition that the content repeatability of the final system code table is greater than or equal to a repeatability threshold; and storing all contents of the final system code table into the entity table under the condition that the content repetition degree of the final system code table is smaller than the repetition degree threshold value.

Optionally, dividing each sub-logic to obtain at least a plurality of extraction modules SELECT, a plurality of source modules FROM, and a plurality of screening modules WHERE, respectively, including: dividing each sub-logic to obtain a plurality of extraction modules SELECT, a plurality of source modules FROM, a plurality of screening modules WHERE, a plurality of grouping modules GROUP and a plurality of association modules JOIN, wherein the association modules JOIN are two sub-logics with association relations in all the sub-logics.

Optionally, after obtaining the plurality of extraction modules SELECT, the plurality of source modules FROM, the plurality of screening modules WHERE, the plurality of grouping modules GROUP, and the plurality of association modules JOIN, the method further includes: in the case that the type of the association module JOIN is left association, adding a master table flag at the rear end of the front end position sub-logic of the association module JOIN and adding a slave table flag at the rear end of the rear end position sub-logic of the association module JOIN, wherein the rear end position sub-logic is positioned behind the front end position sub-logic; in the case that the type of the association module JOIN is right association, adding a slave table flag at the rear end of the front end position sub-logic and adding a master table flag at the rear end of the rear end position sub-logic; and in the case that the type of the association module JOIN is inner association, adding master table flags at the rear end of the front end position sub-logic and at the rear end of the rear end position sub-logic respectively.

According to another aspect of the present application, there is provided a Spark SQL processing device, including:

the acquisition unit is used for acquiring Spark SQL development scripts for representing the contents of the database;

the first processing unit is used for sequentially carrying out logic extraction processing and splitting processing on the Spark SQL development script by adopting a Scala script to obtain a plurality of sub logics; dividing each sub-logic to at least obtain a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE;

the second processing unit is used for loading all the extraction module SELECT, the source module FROM and the screening module WHERE to obtain a plurality of loading results, and storing all the loading results into corresponding positions of a system code table to obtain a final system code table.

According to another aspect of the present application, there is provided a computer readable storage medium, where the computer readable storage medium includes a stored program, and when the program runs, the device in which the computer readable storage medium is located is controlled to execute any one of the Spark SQL processing methods.

According to another aspect of the present application, there is provided a Spark SQL processing system, the system comprising: one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for executing any one of the Spark SQL processing methods.

By applying the technical scheme of the application, a Scala script is used to perform logic extraction processing and then splitting processing on the Spark SQL development script, yielding a plurality of sub-logics. Each sub-logic is divided to obtain at least a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE, which refines the Spark SQL development script, improves the processing speed of Spark SQL and avoids processing the more complex logic directly. Finally, all the extraction modules SELECT, source modules FROM and screening modules WHERE are loaded to obtain a plurality of loading results, and all the loading results are stored in the corresponding positions of a system code table to obtain the final system code table, so that the system code table represents the key information in the data table. This solves the problem in existing schemes that complex SQL carries complex logic, which takes a long time to run and makes jobs prone to failure.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

FIG. 1 shows a flowchart of a Spark SQL processing method according to an embodiment of the present application;

FIG. 2 is a flow chart of another Spark SQL processing method according to an embodiment of the present application;

FIG. 3 shows a block diagram of a Spark SQL processing device according to an embodiment of the present application.

Detailed Description

It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

In order to make the present application solution better understood by those skilled in the art, the following description will be made in detail and with reference to the accompanying drawings in the embodiments of the present application, it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the present application described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

As introduced in the background art, with the continuous accumulation of bank data and the continuous improvement of information technology, big data applications have become increasingly widespread, and fast general-purpose computing engines designed for massive data processing, such as Spark, have emerged. Spark SQL simplifies development against Spark's native RDD (Resilient Distributed Dataset) interface, but problems such as coarse-grained parameter configuration, long running times for complex logic and job failures remain common. The embodiments of the application are provided to solve the problem in existing schemes that complex SQL carries complex logic, which takes a long time to run and makes jobs prone to failure.

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

In this embodiment, a Spark SQL processing method is provided, and it should be noted that the steps illustrated in the flowchart of the drawing may be performed in a computer system such as a set of computer executable instructions, and that although a logic order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different from that herein.

Fig. 1 is a flow chart of a Spark SQL processing method according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:

step S101, acquiring Spark SQL development scripts for representing the contents of a database;

step S102, adopting a Scala script to sequentially perform logic extraction processing and splitting processing on the Spark SQL development script to obtain a plurality of sub-logics; dividing each sub-logic to at least obtain a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE;

Specifically, the SQL script logic is obtained through a Scala script, which achieves the logic extraction processing. The SQL logic, which has a complex structure, is then split into multiple sections of sub-logic according to the SQL specification, and this splitting improves the SQL processing speed. Each sub-logic that is read is parsed into a syntax tree simpler than that of the overall SQL logic, and the syntax tree is divided into a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE, so that one sub-logic corresponds to a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE.

SELECT is used to select the fields to be extracted, FROM is used to determine the source table of the data, WHERE is used to describe the screening conditions of the data, and GROUP is used to determine the basis on which the data are grouped. These four modules can implement different functions through different combinations.
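The division of one sub-logic into modules can be sketched as follows. This is a minimal Python illustration, not the Scala implementation of the application: the function name `divide_sub_logic`, the regular expression and the example statement are all hypothetical, and the sketch assumes a flat statement with no subqueries and no clause keywords inside string literals.

```python
import re

# Clause keywords treated as module boundaries (an illustrative choice).
CLAUSE_PATTERN = re.compile(r"\b(SELECT|FROM|WHERE|GROUP\s+BY)\b", re.IGNORECASE)


def divide_sub_logic(sub_logic: str) -> dict:
    """Divide one flat sub-logic into SELECT / FROM / WHERE / GROUP modules."""
    modules = {}
    matches = list(CLAUSE_PATTERN.finditer(sub_logic))
    for i, match in enumerate(matches):
        # A module body runs from the end of its keyword to the next keyword.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(sub_logic)
        name = match.group(1).upper().split()[0]  # "GROUP BY" becomes "GROUP"
        modules[name] = sub_logic[match.end():end].strip().rstrip(";").strip()
    return modules


# Hypothetical sub-logic used only for illustration.
example = "SELECT cust_type, SUM(balance) FROM loans WHERE balance > 0 GROUP BY cust_type"
modules = divide_sub_logic(example)
print(modules)
```

A real divider would work on the syntax tree mentioned above rather than on raw text; the regular expression is used here only to keep the sketch short.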

In step S102, sequentially performing logic extraction processing and splitting processing on the Spark SQL development script by adopting a Scala script to obtain a plurality of sub-logics includes: performing logic extraction processing on the Spark SQL development script by adopting the Scala script to obtain an SQL logic overall; and splitting the SQL logic overall by adopting the Scala script to obtain a plurality of sub-logics.

Specifically, the SQL script logic is obtained through the Scala script, which achieves the logic extraction processing. The SQL logic, which is complex, is then split into multiple sections of sub-logic according to the SQL specification, and this splitting improves the SQL processing speed.
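The two stages can be sketched as follows. The application does not specify what the extraction stage removes or where the split points are, so this Python sketch makes two loud assumptions: extraction means dropping `--` line comments and blank lines, and splitting means cutting at statement-terminating semicolons. The function names and the sample script are hypothetical, and the sketch does not handle semicolons or `--` inside string literals.

```python
def extract_logic(script: str) -> str:
    """Logic extraction (assumed): keep SQL text, drop `--` comments and blank lines."""
    kept = []
    for line in script.splitlines():
        code = line.split("--", 1)[0].strip()
        if code:
            kept.append(code)
    return "\n".join(kept)


def split_logic(sql_logic: str) -> list:
    """Splitting (assumed): one sub-logic per semicolon-terminated statement."""
    return [part.strip() for part in sql_logic.split(";") if part.strip()]


# Hypothetical development script used only for illustration.
script = """
-- stage 1: filter loans
CREATE TEMPORARY VIEW active_loans AS
SELECT * FROM loans WHERE balance > 0;

-- stage 2: aggregate
SELECT cust_type, SUM(balance) FROM active_loans GROUP BY cust_type;
"""

sub_logics = split_logic(extract_logic(script))
for number, sub_logic in enumerate(sub_logics, start=1):
    print(number, sub_logic.replace("\n", " "))
```

Each resulting sub-logic is small enough to be divided into modules and tuned on its own, which is the point of the splitting step.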

In step S102, the division processing is performed on each of the sub-logics to at least obtain a plurality of extraction modules SELECT, a plurality of source modules FROM, and a plurality of screening modules WHERE, respectively, including: dividing each sub-logic to obtain a plurality of extraction modules SELECT, a plurality of source modules FROM, a plurality of screening modules WHERE, a plurality of grouping modules GROUP and a plurality of association modules JOIN, wherein the association modules JOIN are two sub-logics with association relations in all the sub-logics;

Specifically, the association module JOIN is used to characterize the association relation between two sub-logics.

In the case that the type of the association module JOIN is left association, a master table flag is added at the rear end of the front end position sub-logic of the association module JOIN and a slave table flag is added at the rear end of the rear end position sub-logic of the association module JOIN, wherein the rear end position sub-logic is positioned behind the front end position sub-logic; in the case that the type of the association module JOIN is right association, a slave table flag is added at the rear end of the front end position sub-logic and a master table flag is added at the rear end of the rear end position sub-logic; and in the case that the type of the association module JOIN is inner association, master table flags are added at the rear end of the front end position sub-logic and at the rear end of the rear end position sub-logic respectively.

Specifically, these flags let subsequent staff see at a glance which is the master table and which is the slave table, and modify the corresponding master table or slave table directly.
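The flagging rule for the three association types can be sketched as a small lookup. This is a Python illustration under stated assumptions: the application does not say what a flag looks like, so rendering it as a trailing SQL comment is hypothetical, as are the names `mark_join`, `MASTER` and `SLAVE`.

```python
# Assumed flag format: a trailing SQL comment appended to each sub-logic.
MASTER = "/* master table */"
SLAVE = "/* slave table */"

# (flag for front end position sub-logic, flag for rear end position sub-logic)
FLAG_RULES = {
    "left": (MASTER, SLAVE),
    "right": (SLAVE, MASTER),
    "inner": (MASTER, MASTER),
}


def mark_join(front: str, rear: str, join_type: str) -> tuple:
    """Append the master or slave flag at the rear end of each sub-logic."""
    front_flag, rear_flag = FLAG_RULES[join_type.lower()]
    return f"{front} {front_flag}", f"{rear} {rear_flag}"


print(mark_join("loans a", "customers b", "LEFT"))
print(mark_join("loans a", "customers b", "RIGHT"))
print(mark_join("loans a", "customers b", "INNER"))
```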

Step S103, loading all the extraction module SELECT, the source module FROM and the screening module WHERE to obtain a plurality of loading results, and storing all the loading results in the corresponding positions of the system code table to obtain a final system code table.

The SELECT, FROM and WHERE modules (labeled B) are loaded and spliced into the SQL statement "SELECT SUM(A.table name) FROM B WHERE A.table name = B.source table name AND B.constraint 1 AND B.constraint 2 ...;", which queries the content of the system code table.

Specifically, constraint 1 may be a customer type code IN ("retail", "business") and constraint 2 may be a loan balance > 0.
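Splicing loaded modules back into one statement can be sketched as follows. This Python illustration uses the two example constraints above; the function name `splice_query` and the column and table names (`loan_balance`, `customer_type_code`, `loans`) are hypothetical stand-ins, and the sketch does not reproduce the join against the system code table.

```python
def splice_query(select_module: str, from_module: str, where_modules: list) -> str:
    """Splice one SELECT module, one FROM module and several WHERE modules."""
    sql = f"SELECT {select_module} FROM {from_module}"
    if where_modules:
        # Parenthesise each constraint so that AND binds as intended.
        sql += " WHERE " + " AND ".join(f"({c})" for c in where_modules)
    return sql + ";"


query = splice_query(
    "SUM(loan_balance)",
    "loans",
    ["customer_type_code IN ('retail', 'business')", "loan_balance > 0"],
)
print(query)
```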

The WHERE module and the JOIN module are loaded to obtain the association mode (corresponding to the type of the association module JOIN).

A system code table (labeled A) is created to record the following information for each database table (taking a slice partition table as an example): table name, number of partitions (i.e. the total number of data time partitions), number of data rows in the latest partition, file size of the latest partition, high-frequency field name, high-frequency field repetition degree (i.e. the content repetition degree referred to below), and early warning identifier.
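One row of the system code table can be sketched as a record type. This Python illustration takes its seven fields from the list above; the field names, the types, the byte unit for file size and the representation of repetition as a fraction between 0 and 1 are assumptions, and the sample values are invented.

```python
from dataclasses import dataclass, asdict


@dataclass
class SystemCodeRecord:
    """One row of the system code table (field names are illustrative)."""
    table_name: str
    partition_count: int                    # total number of data time partitions
    latest_partition_rows: int              # data rows in the latest partition
    latest_partition_bytes: int             # latest partition file size (unit assumed)
    high_frequency_field: str
    high_frequency_field_repetition: float  # assumed to be a fraction in [0, 1]
    early_warning: bool = False             # early warning identifier


# Invented values, for illustration only.
record = SystemCodeRecord(
    table_name="loans",
    partition_count=365,
    latest_partition_rows=2_500_000,
    latest_partition_bytes=512 * 1024 * 1024,
    high_frequency_field="customer_type_code",
    high_frequency_field_repetition=0.92,
)
print(asdict(record))
```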

In this step, a Scala script is used to perform logic extraction processing and then splitting processing on the Spark SQL development script, yielding a plurality of sub-logics. Each sub-logic is divided to obtain at least a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE, which refines the Spark SQL development script, improves the processing speed of Spark SQL and avoids processing the more complex logic directly. Finally, all the extraction modules SELECT, source modules FROM and screening modules WHERE are loaded to obtain a plurality of loading results, and all the loading results are stored in the corresponding positions of a system code table to obtain the final system code table, so that the system code table represents the key information in the data table. This solves the problem in existing schemes that complex SQL carries complex logic, which takes a long time to run and makes jobs prone to failure.

Specifically, depending on the data scenario, it is determined whether the data are stored in an entity table or in a temporary view. The approach combines the simplicity of SQL development with the processing efficiency of Scala operators. In addition, independent Spark optimization parameters can be set for each split sub-logic, which significantly improves operating efficiency and provides strong support for data requirements in various business scenarios.
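Setting independent parameters per sub-logic can be sketched as emitting Spark SQL `SET` statements ahead of each sub-logic. The application does not name the parameters it tunes, so the two configuration keys below (`spark.sql.shuffle.partitions` and `spark.sql.autoBroadcastJoinThreshold`, both standard Spark SQL settings) and their values are chosen purely for illustration, and the helper `with_spark_settings` is hypothetical.

```python
def with_spark_settings(sub_logic: str, settings: dict) -> list:
    """Return the SET statements for one sub-logic followed by the sub-logic itself."""
    statements = [f"SET {key}={value}" for key, value in sorted(settings.items())]
    statements.append(sub_logic)
    return statements


# Illustrative values only; suitable settings depend on the data and the cluster.
plan = with_spark_settings(
    "SELECT cust_type, SUM(balance) FROM loans GROUP BY cust_type",
    {
        "spark.sql.shuffle.partitions": 400,
        "spark.sql.autoBroadcastJoinThreshold": -1,
    },
)
for statement in plan:
    print(statement)
```

Because each sub-logic carries its own settings, a heavy aggregation and a small lookup need not share one coarse, script-wide configuration.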

After step S103, that is, after storing the loading result in the corresponding position of the system code table to obtain the final system code table, the method further includes: storing all contents of the final system code table into an entity table under the condition that early warning marks exist in the final system code table, wherein the early warning marks are used for representing the importance degree of the contents of the final system code table; and under the condition that the early warning mark does not exist in the final system code table, storing all contents of the final system code table into the entity table or storing all contents of the final system code table into a temporary view according to the data volume of the final system code table.

Specifically, the early warning identifier is set manually, which makes up for the otherwise missing ability to accommodate high-priority business adjustments.

In one embodiment of the present application, storing all contents of the final system code table in the entity table or storing all contents of the final system code table in the temporary view according to the data amount of the final system code table includes:

storing all contents of the final system code table into the entity table or storing all contents of the final system code table into the temporary view according to the content repetition degree of the final system code table when the data amount of the final system code table is greater than or equal to the data amount threshold;

In one embodiment of the present application, storing all contents of the final system code table in the entity table or storing all contents of the final system code table in the temporary view according to the content repetition degree of the final system code table includes: storing all contents of the final system code table into the temporary view under the condition that the content repetition degree of the final system code table is greater than or equal to a repetition degree threshold; and storing all contents of the final system code table into the entity table when the content repetition degree of the final system code table is smaller than the repetition degree threshold.

Specifically, by setting the repetition degree threshold, all contents of a final system code table with a higher repetition degree are stored in the temporary view (the temporary view is constructed before storage, is used only temporarily and is not permanently stored), so that contents with a higher repetition degree do not occupy a large amount of entity table space, while all contents of a final system code table with a lower repetition degree are stored in the entity table.

All contents of the final system code table are stored in the temporary view when the data amount of the final system code table is smaller than the data amount threshold.

Specifically, the data amount is used as the judgment criterion so that all contents of a final system code table whose data amount is smaller than 100W (1,000,000) are placed in the temporary view, which improves the processing speed.
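The storage decision described above (early warning identifier first, then data amount, then content repetition degree) can be sketched as one function. This is a Python illustration: the default data amount threshold of 1,000,000 follows the 100W figure above, while the repetition degree threshold of 0.8 and all names are assumptions made for this sketch.

```python
ENTITY_TABLE = "entity_table"
TEMPORARY_VIEW = "temporary_view"


def choose_storage(
    early_warning: bool,
    data_amount: int,
    repetition: float,
    data_amount_threshold: int = 1_000_000,  # 100W, as in the text above
    repetition_threshold: float = 0.8,       # assumed value, not from the source
) -> str:
    """Decide where all contents of the final system code table are stored."""
    if early_warning:
        return ENTITY_TABLE      # flagged content always goes to the entity table
    if data_amount < data_amount_threshold:
        return TEMPORARY_VIEW    # small data amount: temporary view
    if repetition >= repetition_threshold:
        return TEMPORARY_VIEW    # large but highly repetitive: temporary view
    return ENTITY_TABLE          # large with low repetition: entity table


print(choose_storage(True, 10, 0.99))
print(choose_storage(False, 500_000, 0.10))
print(choose_storage(False, 5_000_000, 0.90))
print(choose_storage(False, 5_000_000, 0.10))
```

Both thresholds use greater-than-or-equal comparisons on the upper branch, matching the wording of the embodiments.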

In order to enable those skilled in the art to more clearly understand the technical solutions of the present application, the implementation process of the Spark SQL processing method of the present application will be described in detail below with reference to specific embodiments.

The embodiment relates to a specific Spark SQL processing method, as shown in fig. 2, comprising the following steps:

step S1: acquiring Spark SQL development scripts for characterizing the contents of a database;

step S2: performing logic extraction processing on the Spark SQL development script by adopting a Scala script to obtain an SQL logic overall; splitting the SQL logic overall by adopting Scala scripts to obtain a plurality of sub-logics;

step S3: loading all the extraction modules SELECT, the source modules FROM and the screening modules WHERE to obtain a plurality of loading results, and storing all the loading results into corresponding positions of a system code table to obtain a final system code table;

Step S4: storing all contents of the final system code table into an entity table under the condition that the early warning identifier exists in the final system code table, wherein the early warning identifier is used for representing the importance degree of the contents of the final system code table; and performing step S5 under the condition that the early warning identifier does not exist in the final system code table;

step S5: step S6 is performed when the data volume of the final system code table is greater than or equal to the data volume threshold; storing all contents of the final system code table into the temporary view under the condition that the data amount of the final system code table is smaller than the data amount threshold value;

step S6: storing all contents of the final system code table into the temporary view under the condition that the content repeatability of the final system code table is greater than or equal to a repeatability threshold value; and storing all contents of the final system code table into an entity table under the condition that the content repetition degree of the final system code table is smaller than a repetition degree threshold value.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.

The embodiment of the application also provides a Spark SQL processing device. It should be noted that the Spark SQL processing device of the embodiment of the application can be used for executing the Spark SQL processing method provided by the embodiment of the application. The device is used for realizing the above embodiments and preferred embodiments, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.

The following describes a Spark SQL processing device provided in an embodiment of the present application.

Fig. 3 is a block diagram of a Spark SQL processing device according to an embodiment of the present application. As shown in fig. 3, the apparatus includes:

an obtaining unit 31, configured to obtain Spark SQL development scripts for characterizing contents of the database;

a first processing unit 32, configured to sequentially perform logic extraction processing and splitting processing on the Spark SQL development script by using a Scala script, so as to obtain a plurality of sub-logics; dividing each sub-logic to at least obtain a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE;

The second processing unit 33 is configured to load all the extraction module SELECT, the source module FROM, and the screening module WHERE to obtain a plurality of loading results, and store all the loading results into corresponding positions of a system code table to obtain a final system code table.

In the device, a Scala script is used to perform logic extraction processing and then splitting processing on the Spark SQL development script, yielding a plurality of sub-logics. Each sub-logic is divided to obtain at least a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE, which refines the Spark SQL development script, improves the processing speed of Spark SQL and avoids processing the more complex logic directly. Finally, all the extraction modules SELECT, source modules FROM and screening modules WHERE are loaded to obtain a plurality of loading results, and all the loading results are stored in the corresponding positions of a system code table to obtain the final system code table, so that the system code table represents the key information in the data table. This solves the problem in existing schemes that complex SQL carries complex logic, which takes a long time to run and makes jobs prone to failure.

In an embodiment of the present application, the first processing unit includes a first processing module and a second processing module, where the first processing module is configured to perform logic extraction processing on the Spark SQL development script by using the Scala script to obtain an SQL logic overall; and the second processing module is used for splitting the SQL logic overall by adopting the Scala script to obtain a plurality of sub-logics.

In an embodiment of the present application, the apparatus further includes a third processing module and a fourth processing module, where after storing the loading result in a corresponding position of a system code table to obtain a final system code table, the third processing module is configured to store, in a case where an early warning identifier exists in the final system code table, all contents of the final system code table into an entity table, where the early warning identifier is used to characterize an importance degree of the contents of the final system code table; and the fourth processing module is used for storing all contents of the final system code table into the entity table or storing all contents of the final system code table into a temporary view according to the data volume of the final system code table under the condition that the early warning mark does not exist in the final system code table.

In an embodiment of the present application, the fourth processing module includes a first processing sub-module and a second processing sub-module, where the first processing sub-module is configured to store, in the case where the data size of the final system code table is greater than or equal to the data size threshold, all contents of the final system code table into the entity table or store all contents of the final system code table into the temporary view according to the content repetition degree of the final system code table; the second processing sub-module is configured to store all contents of the final system code table in the temporary view when the data amount of the final system code table is less than the data amount threshold.

In an embodiment of the present application, the first processing sub-module includes a third processing sub-module and a fourth processing sub-module, where the third processing sub-module is configured to store all contents of the final system code table in the temporary view when the content repetition degree of the final system code table is greater than or equal to a repetition degree threshold; and the fourth processing sub-module is used for storing all contents of the final system code table into the entity table when the content repetition degree of the final system code table is smaller than the repetition degree threshold.
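The storage decision described in the preceding three embodiments can be summarized as a single decision function. The following is a minimal sketch in Python (the embodiment itself uses a Scala script); the function name `choose_storage`, its parameter names, and the return labels are hypothetical, and the thresholds are left as inputs because the present application does not specify their values.

```python
def choose_storage(has_warning: bool,
                   data_amount: int,
                   repetition_degree: float,
                   data_amount_threshold: int,
                   repetition_threshold: float) -> str:
    """Return 'entity_table' or 'temporary_view' for the final system code table."""
    if has_warning:
        # An early warning identifier marks important content: always persist it.
        return "entity_table"
    if data_amount < data_amount_threshold:
        # Small tables go to a temporary view.
        return "temporary_view"
    if repetition_degree >= repetition_threshold:
        # Large tables with highly repeated content go to a temporary view.
        return "temporary_view"
    # Large tables with low content repetition go to the entity table.
    return "entity_table"
```

The early warning check comes first, so it overrides both thresholds, matching the order in which the third and fourth processing modules are described.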

In an embodiment of the present application, the first processing unit includes a fifth processing module, where the fifth processing module is configured to perform division processing on each of the sub-logics to obtain a plurality of extraction modules SELECT, a plurality of source modules FROM, a plurality of screening modules WHERE, a plurality of grouping modules GROUP, and a plurality of association modules JOIN, where an association module JOIN is two sub-logics, among all the sub-logics, that have an association relationship.
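As a non-limiting illustration, the division of one sub-logic into modules can be sketched as a keyword-based clause splitter. The sketch is in Python rather than the Scala script of the embodiment, and it makes two simplifying assumptions that are not stated in the present application: each module is the text following its SQL keyword, and an association module JOIN is represented by the JOIN clause inside a single statement. The names `CLAUSE_RE` and `divide` are hypothetical.

```python
import re
from typing import Dict, List

# Clause keywords that open a module; optional join-type prefixes are
# captured together with JOIN so the join type is preserved.
CLAUSE_RE = re.compile(
    r"\b(SELECT|FROM|WHERE|GROUP\s+BY|"
    r"(?:(?:LEFT|RIGHT|INNER|FULL)(?:\s+OUTER)?\s+)?JOIN)\b",
    re.IGNORECASE,
)


def divide(sub_logic: str) -> Dict[str, List[str]]:
    """Split one sub-logic into SELECT / FROM / WHERE / GROUP / JOIN modules."""
    modules: Dict[str, List[str]] = {
        name: [] for name in ("SELECT", "FROM", "WHERE", "GROUP", "JOIN")
    }
    hits = list(CLAUSE_RE.finditer(sub_logic))
    for i, hit in enumerate(hits):
        # A module body runs from the end of its keyword to the next keyword.
        end = hits[i + 1].start() if i + 1 < len(hits) else len(sub_logic)
        keyword = " ".join(hit.group(1).upper().split())
        body = sub_logic[hit.end():end].strip()
        if keyword.endswith("JOIN"):
            modules["JOIN"].append(f"{keyword} {body}")  # keep the join type
        elif keyword == "GROUP BY":
            modules["GROUP"].append(body)
        else:
            modules[keyword].append(body)
    return modules
```

The splitter does not handle keywords inside string literals or nested subqueries; those cases would require a proper SQL parser.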

In an embodiment of the present application, the first processing unit includes a sixth processing module, a seventh processing module, and an eighth processing module. After the plurality of extraction modules SELECT, the plurality of source modules FROM, the plurality of screening modules WHERE, the plurality of grouping modules GROUP, and the plurality of association modules JOIN are obtained, the sixth processing module is configured to, when the type of the association module JOIN is left association, add a master table flag at the rear end of the front-end-position sub-logic of the association module JOIN and add a slave table flag at the rear end of the rear-end-position sub-logic of the association module JOIN, where the rear-end-position sub-logic is located behind the front-end-position sub-logic; the seventh processing module is configured to add a slave table flag at the rear end of the front-end-position sub-logic and add a master table flag at the rear end of the rear-end-position sub-logic when the type of the association module JOIN is right association; the eighth processing module is configured to add a master table flag at the rear end of the front-end-position sub-logic and at the rear end of the rear-end-position sub-logic, respectively, when the type of the association module JOIN is inner association.
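The flagging rule for the three association types can be sketched as a small lookup. This Python sketch (the embodiment uses a Scala script) assumes that a flag is a trailing SQL comment appended to the sub-logic text; the flag strings and the function name `flag_join` are hypothetical, since the present application does not specify the form of the flags.

```python
from typing import Tuple

MASTER = "/* master */"  # hypothetical master table flag
SLAVE = "/* slave */"    # hypothetical slave table flag

# (flag for front-end-position sub-logic, flag for rear-end-position sub-logic)
FLAGS = {
    "left": (MASTER, SLAVE),
    "right": (SLAVE, MASTER),
    "inner": (MASTER, MASTER),
}


def flag_join(join_type: str, front: str, rear: str) -> Tuple[str, str]:
    """Append the master/slave flag to the rear end of each sub-logic."""
    key = join_type.strip().lower()
    if key not in FLAGS:
        raise ValueError(f"unsupported association type: {join_type!r}")
    front_flag, rear_flag = FLAGS[key]
    return f"{front} {front_flag}", f"{rear} {rear_flag}"
```

For example, `flag_join("right", a, b)` marks `a` as the slave table and `b` as the master table, mirroring the seventh processing module.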

The Spark SQL processing device comprises a processor and a memory, wherein the acquisition unit, the first processing unit, the second processing unit and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions. The modules are all located in the same processor; alternatively, the above modules may be located in different processors in any combination.

The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and by adjusting kernel parameters the problem in the prior art that SQL with complex logic takes a long time to run and is therefore prone to job failure is solved.

The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.

The embodiment of the invention provides a computer readable storage medium, which comprises a stored program, wherein the program is used for controlling a device where the computer readable storage medium is located to execute the Spark SQL processing method.

The embodiment of the invention provides a processor, which is used for running a program, wherein the Spark SQL processing method is executed when the program runs.

The embodiment of the invention provides a device, which comprises a processor, a memory and a program stored in the memory and capable of running on the processor, wherein the processor implements at least the following steps when executing the program: acquiring Spark SQL development scripts for characterizing the contents of a database; sequentially carrying out logic extraction processing and splitting processing on the Spark SQL development script by adopting a Scala script to obtain a plurality of sub-logics; dividing each sub-logic to at least obtain a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE; and loading all the extraction modules SELECT, the source modules FROM and the screening modules WHERE to obtain a plurality of loading results, and storing all the loading results into corresponding positions of a system code table to obtain a final system code table. The device herein may be a server, PC, PAD, cell phone, etc.
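The final storing step can be sketched as follows, under a strong assumption that is not stated in the present application: the system code table is modeled as a mapping whose "corresponding position" is the pair (sub-logic index, module name), and "loading" a module simply reads its text. The sketch is in Python rather than the Scala script of the embodiment, and the names `CodeTable` and `store_results` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed layout: (sub-logic index, module name) -> loaded module text.
CodeTable = Dict[Tuple[int, str], str]


def store_results(modules_per_sub_logic: List[Dict[str, str]]) -> CodeTable:
    """Store each loaded SELECT / FROM / WHERE module at its position."""
    table: CodeTable = {}
    for index, modules in enumerate(modules_per_sub_logic):
        for name in ("SELECT", "FROM", "WHERE"):
            if name in modules:  # a sub-logic may lack a WHERE module
                table[(index, name)] = modules[name]
    return table
```

Keying by sub-logic index keeps the modules of different sub-logics separate, so the original script structure can be recovered from the table.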

The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with at least the following method steps: acquiring Spark SQL development scripts for characterizing the contents of a database; sequentially carrying out logic extraction processing and splitting processing on the Spark SQL development script by adopting a Scala script to obtain a plurality of sub-logics; dividing each sub-logic to at least obtain a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE; and loading all the extraction modules SELECT, the source modules FROM and the screening modules WHERE to obtain a plurality of loading results, and storing all the loading results into corresponding positions of a system code table to obtain a final system code table.

The application also provides a Spark SQL processing system, which comprises one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for executing any one of the Spark SQL processing methods described above. In this system, a Scala script is used to sequentially perform logic extraction processing and splitting processing on the Spark SQL development script to obtain a plurality of sub-logics, and each sub-logic is divided to obtain at least a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE. This refines the Spark SQL development script, improves the processing speed of Spark SQL, and avoids processing more complex logic directly. Finally, all the extraction modules SELECT, source modules FROM and screening modules WHERE are loaded to obtain a plurality of loading results, and all the loading results are stored in the corresponding positions of the system code table to obtain the final system code table, so that the system code table represents the key information in the data table. This solves the problem in the prior art that SQL with complex logic takes a long time to run and is therefore prone to job failure.

It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device. They may be concentrated on a single computing device or distributed across a network of computing devices, and they may be implemented in program code executable by computing devices, so that they may be stored in a storage device for execution by computing devices. In some cases, the steps shown or described may be performed in a different order than that shown or described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element.

From the above description, it can be seen that the above embodiments of the present application achieve the following technical effects:

1) In the Spark SQL processing method of the present application, a Scala script is used to sequentially perform logic extraction processing and splitting processing on the Spark SQL development script to obtain a plurality of sub-logics, and each sub-logic is divided to obtain at least a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE. This refines the Spark SQL development script, improves the processing speed of Spark SQL, and avoids processing more complex logic directly. Finally, all the extraction modules SELECT, source modules FROM and screening modules WHERE are loaded to obtain a plurality of loading results, and all the loading results are stored in the corresponding positions of the system code table to obtain the final system code table, so that the system code table represents the key information in the data table. This solves the problem in the prior art that SQL with complex logic takes a long time to run and is therefore prone to job failure.

2) In the Spark SQL processing device of the present application, a Scala script is used to sequentially perform logic extraction processing and splitting processing on the Spark SQL development script to obtain a plurality of sub-logics, and each sub-logic is divided to obtain at least a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE. This refines the Spark SQL development script, improves the processing speed of Spark SQL, and avoids processing more complex logic directly. Finally, all the extraction modules SELECT, source modules FROM and screening modules WHERE are loaded to obtain a plurality of loading results, and all the loading results are stored in the corresponding positions of the system code table to obtain the final system code table, so that the system code table represents the key information in the data table. This solves the problem in the prior art that SQL with complex logic takes a long time to run and is therefore prone to job failure.

The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims

1. A Spark SQL processing method, comprising:

acquiring Spark SQL development scripts for characterizing the contents of a database;

sequentially carrying out logic extraction processing and splitting processing on the Spark SQL development script by adopting a Scala script to obtain a plurality of sub-logics;

dividing each sub-logic to at least obtain a plurality of extraction modules SELECT, a plurality of source modules FROM and a plurality of screening modules WHERE;

and loading all the extraction modules SELECT, the source modules FROM and the screening modules WHERE to obtain a plurality of loading results, and storing all the loading results into corresponding positions of a system code table to obtain a final system code table.

2. The method of claim 1, wherein the sequentially performing logic extraction processing and splitting processing on the Spark SQL development script by using a Scala script to obtain a plurality of sub-logics comprises:

performing logic extraction processing on the Spark SQL development script by adopting the Scala script to obtain the overall SQL logic;

and splitting the overall SQL logic by adopting the Scala script to obtain a plurality of sub-logics.

3. The method of claim 1, wherein after storing the loading result in a corresponding location of a system code table, resulting in a final system code table, the method further comprises:

storing all contents of the final system code table into an entity table under the condition that early warning identifiers exist in the final system code table, wherein the early warning identifiers are used for representing the importance degree of the contents of the final system code table;

and under the condition that the early warning identification does not exist in the final system code table, storing all contents of the final system code table into the entity table or storing all contents of the final system code table into a temporary view according to the data quantity of the final system code table.

4. A method according to claim 3, wherein storing all contents of the final system code table into the entity table or storing all contents of the final system code table into a temporary view according to the data amount of the final system code table comprises:

storing all contents of the final system code table into the entity table or storing all contents of the final system code table into the temporary view according to the content repetition degree of the final system code table under the condition that the data amount of the final system code table is greater than or equal to a data amount threshold;

and storing all contents of the final system code table into the temporary view in the case that the data amount of the final system code table is smaller than the data amount threshold.

5. The method of claim 4, wherein storing all contents of the final system code table into the entity table or storing all contents of the final system code table into the temporary view according to the content repetition degree of the final system code table comprises:

storing all contents of the final system code table into the temporary view under the condition that the content repetition degree of the final system code table is greater than or equal to a repetition degree threshold;

and storing all contents of the final system code table into the entity table under the condition that the content repetition degree of the final system code table is smaller than the repetition degree threshold value.

6. The method of claim 1, wherein dividing each of the sub-logics at least respectively obtains a plurality of extraction modules SELECT, a plurality of source modules FROM, and a plurality of screening modules WHERE, comprising:

dividing each sub-logic to obtain a plurality of extraction modules SELECT, a plurality of source modules FROM, a plurality of screening modules WHERE, a plurality of grouping modules GROUP and a plurality of association modules JOIN, wherein the association modules JOIN are two sub-logics with association relations in all the sub-logics.

7. The method of claim 6, wherein after obtaining the plurality of extraction modules SELECT, the plurality of source modules FROM, the plurality of screening modules WHERE, the plurality of grouping modules GROUP, and the plurality of association modules JOIN, the method further comprises:

adding a master table mark at the rear end of front end position sub-logic of the association module JOIN under the condition that the type of the association module JOIN is left association, and adding a slave table mark at the rear end of rear end position sub-logic of the association module JOIN, wherein the rear end position sub-logic is positioned at the rear end of the front end position sub-logic;

adding a slave table mark at the rear end of the front end position sub-logic and adding a master table mark at the rear end of the rear end position sub-logic under the condition that the type of the association module JOIN is right association;

and under the condition that the type of the association module JOIN is inner association, adding master table marks at the rear end of the front end position sub-logic and the rear end of the rear end position sub-logic respectively.

8. A Spark SQL processing device, comprising:

9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program when run controls a device in which the computer readable storage medium is located to perform the Spark SQL processing method according to any one of claims 1 to 7.

10. A Spark SQL processing system, comprising: one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for executing the Spark SQL processing method of any one of claims 1 to 7.