CN117632911A

CN117632911A - Database process language migration method and device

Info

Publication number: CN117632911A
Application number: CN202311658358.9A
Authority: CN
Inventors: 林舒杨
Original assignee: China Construction Bank Corp; CCB Finetech Co Ltd
Current assignee: China Construction Bank Corp; CCB Finetech Co Ltd
Priority date: 2023-12-05
Filing date: 2023-12-05
Publication date: 2024-03-01

Abstract

The invention discloses a database process language migration method and device in the field of big data. The method comprises: abstracting the syntax of original procedural code into five types of code blocks; parsing the original procedural code and converting it into an original syntax tree; assigning each node a code block type to obtain a labeled language structure tree; traversing the structure tree breadth-first downward from the root node until all control blocks are eliminated, to obtain a preliminarily adjusted language structure tree; eliminating the definition blocks and loop blocks of the preliminarily adjusted structure tree according to pre-configured conversion structure identifiers and their relation to conversion templates, to obtain a language structure tree expressed in SQL form; and converting that tree into the language of the target database for output. When procedural code is migrated to a distributed database that must be developed in SQL, the invention uses a flexible configuration method and conversion templates to convert it efficiently and safely into standard SQL statements.

Description

Database process language migration method and device

Technical Field

The invention relates to the technical field of big data, in particular to a database process language migration method and device.

Background

This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.

An original database, for example an OLTP database such as Oracle, is mainly used for transactional business. When business processing requires some batch statistics (for example, end-of-day checking and counting of the day's transaction amounts) and the data volume is small, the common practice, to simplify development logic, is to implement the processing as an Oracle or MySQL stored procedure. However, as the business develops, business logic keeps growing and the complexity between systems keeps increasing, and the stored procedures in some transactional databases become overloaded by excessive data volume. In particular, OLTP databases are generally not distributed, so processing performance cannot be scaled by adding servers; the only solution is to migrate this statistical logic to a target database, such as an OLAP database, and complete the computation quickly using the multi-server computing power of a distributed database.

The prior art mainly converts an Oracle stored procedure into a Java + SQL form for execution. The approach is logically straightforward: judgment, loop and similar statements in the stored procedure are simply translated into Java. The problem, however, is that stored procedures in an OLTP database are usually written in a style close to procedural languages such as C and Java and often process data in cursor mode (that is, one record is fetched and processed at a time); most of the logic reads, evaluates, produces a result and writes to a result table row by row.

The technical disadvantage of the prior art is as follows. In a database such as Oracle, stored procedures can be compiled into native code and perform relatively well, but when the same code is carried over directly to an OLAP database, the converted logic still processes records one by one. This becomes a major bottleneck at large data volumes, and performance may even be worse than on the original database. Therefore, when processing on an OLAP database, the original logic should generally be consolidated so that, as far as possible, a single SQL statement processes a whole batch of data. SQL written with set-theoretic thinking can exploit the distributed computing of the OLAP database at execution time and effectively improves performance.

FIG. 1 shows how OLTP and OLAP databases execute stored procedures in this manner. An OLTP database generally processes on a single node with a high-performance configuration, so its efficiency is relatively high. An OLAP database generally uses multi-node parallel processing to gain economies of scale, with each node typically having only an ordinary configuration. However, when a stored procedure is executed on the processing units shown in the figure, one processing unit usually finishes before the next one starts; only one unit executes at a time while the others wait, so the advantage of distributed parallel execution cannot be exploited.

The prior art mainly extracts complex syntax from Oracle stored procedures, rewrites it as external Java code, and submits individual SQL statements to the database for execution. This solves the incompatibility between Oracle and Greenplum syntax but not the performance problem: when a single query returns an excessive amount of data, pulling it to the client can cause serious performance problems and even outages, so the efficiency and safety of database procedural language migration cannot be guaranteed.

Disclosure of Invention

The embodiment of the invention provides a database process language migration method, which is used to convert procedural code efficiently and safely into standard SQL statements, using a flexible configuration method and conversion templates, when the code is migrated to a distributed database that must be developed in SQL. The method comprises the following steps:

abstracting the syntax of original procedural code in the original database into five types of code blocks: a stored procedure block, a definition block, an execution block, a control block and a loop block, wherein the stored procedure block represents the stored procedure code of the database, the definition block represents intermediate variables in the original procedural code, the execution block represents statements that perform the final result operation, the control block represents statements supporting logical judgment, and the loop block represents loop statements;

parsing the original procedural code and converting it into an original syntax tree;

labeling each node on the original syntax tree and assigning each node one of the five code block types according to a preset determination policy, to obtain a labeled language structure tree;

traversing the language structure tree breadth-first from the root node, finding each control block in the tree, and pushing the judgment condition of the control block down to all child nodes under it until all control blocks are eliminated, to obtain a preliminarily adjusted language structure tree;

according to pre-configured conversion structure identifiers, the relation between each conversion structure identifier and a structure tree conversion template, and structure tree conversion functions, progressively eliminating the definition blocks and loop blocks in the loop structures of the preliminarily adjusted language structure tree, to obtain a language structure tree expressed in SQL (Structured Query Language) form;

converting the language structure tree expressed in SQL form into the language of the target database and outputting the SQL text.

The embodiment of the invention also provides a database process language migration device, which is used to convert procedural code efficiently and safely into standard SQL statements, using a flexible configuration method and conversion templates, when the code is migrated to a distributed database that must be developed in SQL. The device comprises:

a code block abstraction unit, configured to abstract the syntax of original procedural code in the original database into five types of code blocks: a stored procedure block, a definition block, an execution block, a control block and a loop block, wherein the stored procedure block represents the stored procedure code of the database, the definition block represents intermediate variables in the original procedural code, the execution block represents statements that perform the final result operation, the control block represents statements supporting logical judgment, and the loop block represents loop statements;

a parsing unit, configured to parse the original procedural code and convert it into an original syntax tree;

a labeling unit, configured to label each node on the original syntax tree and assign each node one of the five code block types according to a preset determination policy, to obtain a labeled language structure tree;

a preliminary adjustment unit, configured to traverse the language structure tree breadth-first from the root node, find each control block in the tree, and push the judgment condition of the control block down to all child nodes under it until all control blocks are eliminated, to obtain a preliminarily adjusted language structure tree;

a conversion unit, configured to progressively eliminate the definition blocks and loop blocks in the loop structures of the preliminarily adjusted language structure tree according to pre-configured conversion structure identifiers, the relation between each conversion structure identifier and a structure tree conversion template, and structure tree conversion functions, to obtain a language structure tree expressed in SQL form;

and an output unit, configured to convert the language structure tree expressed in SQL form into the language of the target database and output the SQL text.

The embodiment of the invention also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the database process language migration method when executing the computer program.

The embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the database process language migration method when being executed by a processor.

The embodiment of the invention also provides a computer program product, which comprises a computer program, wherein the computer program realizes the database process language migration method when being executed by a processor.

In the embodiment of the invention, the database process language migration scheme works as follows: the syntax of the original procedural code in the original database is abstracted into five types of code blocks (stored procedure blocks, definition blocks, execution blocks, control blocks and loop blocks); the original procedural code is parsed into an original syntax tree; each node of that tree is labeled with one of the five code block types according to a preset determination policy, yielding a labeled language structure tree; the tree is traversed breadth-first from the root node and the judgment condition of each control block is pushed down to all child nodes under it until all control blocks are eliminated, yielding a preliminarily adjusted language structure tree; the definition blocks and loop blocks in the loop structures of this tree are then progressively eliminated according to pre-configured conversion structure identifiers, the relation between each identifier and a structure tree conversion template, and structure tree conversion functions, yielding a language structure tree expressed in SQL form; finally, this tree is converted into the language of the target database and SQL text is output. As a result, when procedural code is migrated to a distributed database that must be developed in SQL, it can be converted efficiently and safely into standard SQL statements using a flexible configuration method and conversion templates.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:

FIG. 1 is a schematic diagram showing how an OLTP database and an OLAP database execute stored procedures according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method for migrating a database process language according to an embodiment of the present invention;

FIG. 3 is a diagram of Oracle stored procedure code that processes data in an embodiment of the invention;

FIG. 4 is a schematic diagram of SQL that is logically equivalent to the original procedural code, written according to the set-based processing approach, in an embodiment of the invention;

FIG. 5 is a schematic diagram of stored procedures in two databases according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of 5 types of code blocks in an embodiment of the present invention;

FIG. 7 is a schematic diagram of a language structure tree in an embodiment of the present invention;

FIG. 8 is a schematic diagram of an original syntax tree in an embodiment of the present invention;

FIG. 9 is a schematic diagram of control block elimination in an embodiment of the present invention;

FIG. 10 is a diagram illustrating the definition block elimination according to the embodiment of the present invention;

FIG. 11 is a schematic diagram of loop block elimination in an embodiment of the invention;

FIG. 12 is a schematic diagram of a matching sequence of a template in an embodiment of the invention;

FIG. 13 is a schematic diagram of matching and replacement cases in an embodiment of the present invention;

FIG. 14 is a schematic diagram of database process language migration in an embodiment of the present invention;

FIG. 15 is a schematic diagram of a database process language migration apparatus according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.

Before the scheme provided by the embodiment of the invention is described, the related terms are explained in detail.

Database process language:

A database procedural language is a class of programming languages dedicated to operating on databases; programs written in it can be stored in the database and invoked and executed by the database management system. Common database procedural languages include Oracle PL/SQL (the procedural programming language of Oracle's transactional database) and MySQL stored procedures (the procedural programming language of the open-source transactional database MySQL). They are typically used to implement database objects such as stored procedures, triggers and functions, which can be called to accomplish specific tasks such as querying, updating or deleting data. Encapsulating complex business logic in stored procedures can improve the performance and security of applications.

OLTP (Online Transaction Processing) database:

A relational database for transaction processing, mainly used to support online transaction processing in enterprise-level application systems. OLTP databases typically have the following characteristics:

high concurrency: OLTP databases need to support access and manipulation of data by a large number of concurrent users, and thus need to have high concurrent processing capabilities.

Low latency: OLTP databases need to respond to user requests in a short period of time and thus need to have the ability to quickly read and write data.

Simple query: OLTP databases are typically used to perform simple create, read, update and delete operations and do not involve complex statistical analysis.

In summary, an OLTP database is a relational database designed specifically to support online transaction processing (e.g., accounting) in enterprise-level application systems, with high requirements for concurrency, low latency, data consistency and the like. Examples include Oracle (the transactional database of Oracle Corporation) and MySQL (an open-source transactional database).

Distributed OLAP (Online Analytical Processing) database:

A relational database for large-scale data sets that adopts a distributed architecture and parallel processing technology and can process large amounts of data on multiple nodes simultaneously. It generally has the following characteristics:

Distributed architecture: the OLAP database stores data across multiple nodes, enabling horizontal scaling.

Parallel processing: query and computation operations are executed on multiple nodes simultaneously, improving system throughput and response speed.

Large-scale data support: storage, management and querying of massive data are supported, and complex queries can be answered quickly.

High-value business scenarios: owing to its high performance, high reliability and support for massive data, the distributed OLAP database is generally applied in business scenarios with demanding large-scale data processing requirements in fields such as finance, telecommunications and healthcare.

In short, an OLAP database is a relational database oriented to large-scale data sets, with strong advantages in distributed architecture, parallel processing, high reliability and the like. Examples include Greenplum (an OLAP database product of Pivotal) and Teradata (an OLAP database product of Teradata Corporation).

FIG. 2 is a flow chart of a method for migrating a database process language according to an embodiment of the present invention, as shown in FIG. 2, the method includes the following steps:

Step 101: abstracting the syntax of original procedural code in the original database into five types of code blocks: a stored procedure block, a definition block, an execution block, a control block and a loop block, wherein the stored procedure block represents the stored procedure code of the database, the definition block represents intermediate variables in the original procedural code, the execution block represents statements that perform the final result operation, the control block represents statements supporting logical judgment, and the loop block represents loop statements;

Step 102: parsing the original procedural code and converting it into an original syntax tree;

Step 103: labeling each node on the original syntax tree and assigning each node one of the five code block types according to a preset determination policy, to obtain a labeled language structure tree;

Step 104: traversing the language structure tree breadth-first from the root node, finding each control block in the tree, and pushing the judgment condition of the control block down to all child nodes under it until all control blocks are eliminated, to obtain a preliminarily adjusted language structure tree;

Step 105: according to pre-configured conversion structure identifiers, the relation between each conversion structure identifier and a structure tree conversion template, and structure tree conversion functions, progressively eliminating the definition blocks and loop blocks in the loop structures of the preliminarily adjusted language structure tree, to obtain a language structure tree expressed in SQL (Structured Query Language) form;

Step 106: converting the language structure tree expressed in SQL form into the language of the target database and outputting the SQL text.
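Step 104 can be pictured as a small tree rewrite. The following is a sketch under stated assumptions, not the implementation of the invention: blocks are a hypothetical Python `Block` class, a judgment condition is kept as a plain string, the conditions pushed onto one block are read as a conjunction (AND), an IF ... ELSE is assumed to be modeled as two control blocks with complementary conditions, and the root is assumed to be the stored procedure block rather than a control block.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    # Hypothetical block representation for illustration only.
    type: str                                   # "stored_procedure", "definition", "execution", "control", "loop"
    name: str = ""
    condition: str = ""                         # judgment condition, used only by control blocks
    conditions: List[str] = field(default_factory=list)  # conditions pushed down (implicitly ANDed)
    children: List["Block"] = field(default_factory=list)


def push_condition(block, cond):
    """Attach cond to block and to every block beneath it."""
    block.conditions.append(cond)
    for child in block.children:
        push_condition(child, cond)


def splice_control_children(node):
    """Replace each control-block child by its own children, carrying its condition down."""
    changed = True
    while changed:                              # repeat: a spliced-in child may itself be a control block
        changed = False
        spliced = []
        for child in node.children:
            if child.type == "control":
                for grandchild in child.children:
                    push_condition(grandchild, child.condition)
                    spliced.append(grandchild)
                changed = True
            else:
                spliced.append(child)
        node.children = spliced


def eliminate_control_blocks(root):
    """Walk breadth-first from the root until no control block remains; returns root."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        splice_control_children(node)
        queue.extend(node.children)
    return root
```

With an outer IF (`amt > 0`) wrapping an UPDATE and a nested IF (`flag = 'Y'`), both UPDATEs end up as direct children of the stored procedure block, the inner one carrying both conditions; a later step could turn such a condition list into a WHERE clause.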

The database process language migration method provided by the embodiment of the invention, whose steps 101 to 106 are listed above, converts procedural code efficiently and safely into standard SQL statements using a flexible configuration method and conversion templates when the code is migrated to a distributed database that must be developed in SQL. The method is described in detail below.

The embodiment of the invention converts procedural code that processes data record by record into set-based processing, which can make maximum use of the distributed computing capacity of an OLAP (online analytical processing) database. A detailed description follows.

For example, many Oracle stored procedures process data with code like that shown in FIG. 3: the code fetches records one by one, queries a mapping table to convert the value of flex_value_set_name into an ID, and then performs an associated update by that ID. Recast in terms of set-based processing, the whole logic can be written equivalently as the SQL shown in FIG. 4.

The statement in FIG. 4 replaces many lines of code with a single SQL statement, which, when executed in a distributed database, can exploit the parallel execution capability of the servers as far as possible and therefore runs faster.
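The difference between the two styles can be reproduced on a toy scale. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for the target database; apart from `flex_value_set_name`, the table and column names (`value_set_map`, `target`, `flex_value_set_id`, `row_id`) are hypothetical, since the code of FIG. 3 and FIG. 4 is not reproduced here. Both functions should leave the table in the same state, but the set-based version hands the whole batch to the database in one statement, which is what allows a distributed engine to parallelize it.

```python
import sqlite3

# Hypothetical schema and data for illustration.
SETUP = """
CREATE TABLE value_set_map (flex_value_set_name TEXT PRIMARY KEY, flex_value_set_id INTEGER);
CREATE TABLE target (row_id INTEGER PRIMARY KEY, flex_value_set_name TEXT, flex_value_set_id INTEGER);
INSERT INTO value_set_map VALUES ('REGION', 10), ('BRANCH', 20);
INSERT INTO target (row_id, flex_value_set_name) VALUES (1, 'REGION'), (2, 'BRANCH'), (3, 'UNKNOWN');
"""


def update_row_by_row(conn):
    """Cursor style: one lookup and one UPDATE per record."""
    rows = conn.execute("SELECT row_id, flex_value_set_name FROM target").fetchall()
    for row_id, name in rows:
        hit = conn.execute(
            "SELECT flex_value_set_id FROM value_set_map WHERE flex_value_set_name = ?", (name,)
        ).fetchone()
        if hit is not None:
            conn.execute("UPDATE target SET flex_value_set_id = ? WHERE row_id = ?", (hit[0], row_id))


def update_set_based(conn):
    """Set style: a single correlated UPDATE over the whole table."""
    conn.execute("""
        UPDATE target
        SET flex_value_set_id = (SELECT m.flex_value_set_id FROM value_set_map AS m
                                 WHERE m.flex_value_set_name = target.flex_value_set_name)
        WHERE EXISTS (SELECT 1 FROM value_set_map AS m
                      WHERE m.flex_value_set_name = target.flex_value_set_name)
    """)


def run(update):
    """Build a fresh in-memory database, apply one update style, return the resulting IDs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    update(conn)
    result = conn.execute("SELECT row_id, flex_value_set_id FROM target ORDER BY row_id").fetchall()
    conn.close()
    return result
```

SQLite runs on a single node, so this only demonstrates logical equivalence, not the parallel speed-up described for a distributed OLAP database.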

The core problem addressed by the embodiment of the invention is how to convert the various procedural code written by users into set-based processing code. The overall approach is as follows:

1. Code parsing and labeling (steps 101 to 103 above).

The language blocks to be converted are labeled; the syntax of a stored procedure is abstracted into five types of code blocks:

a) Stored procedure block: different databases use different names for stored procedures, such as functions or stored procedures, and their structures vary; FIG. 5 shows two databases as examples.

The boxed text in FIG. 5 marks the defining keywords of a function and the beginning and end of a stored procedure; the whole enclosed code can be labeled as a stored procedure block.

b) Definition block

A definition block mainly defines intermediate variables used in the overall code logic; the various code forms that can constitute a definition block are shown in Table 1 below:

TABLE 1

The code blocks listed in Table 1 (the relation among definition blocks, defined variables and definition value sources) can be labeled as definition blocks. The syntax and scope of actual definition blocks differ from language to language, so in a specific implementation the labeling must be configured per scenario. That is, in one embodiment, intermediate variables in the original procedural code are abstracted as definition blocks based on a relation among definition blocks, defined variables and definition value sources that is preconfigured for each database scenario.
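One way to hold such a per-scenario configuration is a lookup table keyed by database dialect. The entries below are illustrative assumptions only (the contents of Table 1 are not reproduced here), and node kind names such as `assignment_stmt` and `set_stmt` are hypothetical parser labels; the syntax in the comments is ordinary PL/SQL and MySQL.

```python
# Hypothetical per-dialect configuration: node kind that forms a definition
# block -> where the defined variable's value comes from.
DEFINITION_RULES = {
    "oracle": {
        "select_into_stmt": "query",        # SELECT ... INTO v FROM ...
        "assignment_stmt": "expression",    # v := expr;
        "cursor_decl": "query",             # CURSOR c IS SELECT ...
    },
    "mysql": {
        "select_into_stmt": "query",        # SELECT ... INTO v FROM ...
        "set_stmt": "expression",           # SET v = expr;
        "declare_cursor_stmt": "query",     # DECLARE c CURSOR FOR SELECT ...
    },
}


def definition_source(dialect, node_kind):
    """Return the configured value source if node_kind forms a definition block, else None."""
    return DEFINITION_RULES.get(dialect, {}).get(node_kind)
```

Supporting a new database scenario then means adding a configuration entry rather than changing the labeling code.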

c) Execution block

An execution block mainly contains DML (Data Manipulation Language) statements, i.e., statements that insert, update or delete records, as well as function calls. Besides the general UPDATE/DELETE/INSERT statements, these include database-specific syntax such as the MERGE statement (supported by Oracle, it automatically decides whether to INSERT or UPDATE a record), so the set of such statements must be determined by database type; that is, in one embodiment, the types of final result operation statements are determined according to the database type. An execution block is a block of statements that perform the final result operation, and the finally generated language must be based on it.
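The dependence on database type can be expressed as a per-dialect keyword set. This is a hypothetical sketch: it classifies a statement only by its leading keyword, the dialect names and keyword sets are illustrative (MySQL's REPLACE statement, for example, also writes records), and function calls, which the text also places in execution blocks, are not covered.

```python
# Hypothetical per-database sets of statements that perform the final result operation.
EXECUTION_KEYWORDS = {
    "oracle": {"INSERT", "UPDATE", "DELETE", "MERGE"},
    "mysql": {"INSERT", "UPDATE", "DELETE", "REPLACE"},
}


def is_execution_statement(db_type, statement):
    """True if the statement's leading keyword is a result-writing statement for db_type."""
    words = statement.strip().split(None, 1)
    return bool(words) and words[0].upper() in EXECUTION_KEYWORDS.get(db_type, set())
```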

d) Control block

Some stored procedure syntax supports logical decisions, such as IF ... ELSE IF ... END.

e) Loop block

A series of loop syntaxes such as LOOP ... END and FOR .... These loop constructs are the parts that should be eliminated as far as possible during processing.

Various block examples are shown in fig. 6.

The code is then parsed and converted into a tree structure (the original syntax tree shown in FIG. 8), and each node of the tree is labeled with its code block type and various attributes. In a specific implementation, the preset determination policy may be a relation between nodes and code block types: the current node is looked up in this relation to determine its code block type (one of stored procedure block, definition block, execution block, control block and loop block), and the node is labeled with that type.
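A minimal sketch of such a preset determination policy, assuming it is a plain mapping from node kind to code block type. The node kind names are taken from the construction algorithm in this description; representing nodes as dicts and letting unmapped kinds (for example expression nodes) inherit the enclosing block type are assumptions made for illustration.

```python
# Hypothetical preset determination policy: syntax-tree node kind -> code block type.
BLOCK_TYPE_BY_KIND = {
    "package_stmt": "stored_procedure",
    "select_into_stmt": "definition",
    "update_stmt": "execution",
    "insert_stmt": "execution",
    "merge_stmt": "execution",
    "if_stmt": "control",
    "for_stmt": "loop",
    "while_stmt": "loop",
}


def label(node, inherited=None):
    """Label node (a dict with "kind" and optional "children") and its subtree in place.

    Kinds missing from the policy inherit the block type of the enclosing node.
    """
    node["block_type"] = BLOCK_TYPE_BY_KIND.get(node["kind"], inherited)
    for child in node.get("children", []):
        label(child, node["block_type"])
    return node
```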

The code above is thus converted into the specially labeled language structure tree shown in FIG. 7.

Structure tree generation algorithm:

1. Use the ANTLR/Java tool (an open-source parser generator widely used in industry) with the stored procedure lexer and grammar configuration files as input, to obtain an abstract syntax tree parser.

2. Input a stored procedure to the parser to obtain its concrete abstract syntax tree (FIG. 8).

3. Call the following depth-first syntax tree traversal algorithm recursively, with the inputs tree and [] (an empty variable list); it returns the structure tree (FIG. 7).

Structure tree construction algorithm: input tree, var_list;

1) Initialize the variable block to empty;

2) If the root node of the tree is package_stmt, for_stmt, while_stmt or if_stmt:

i. If the current node is package_stmt, initialize block as a stored procedure block;

ii. If the current node is for_stmt or while_stmt:

1. initialize block as a loop block;

2. take the var variable and the select_stmt syntax tree from the syntax tree as the loop variable name and variable definition of the loop block;

3. add the var variable with an initial count of 0 to the var_list list, with a structure such as [{rec_line: [0, block_pointer]}], for subsequently counting references;

4. check whether any variable in var_list occurs in the select_stmt syntax tree, and if so increment its count by 1;

iii. Recursively call the structure tree construction algorithm on each lower-level child node (denoted child_node), with child_node and var_list as input, obtain the return value for each child node, and put the results one by one into the code block array of the stored procedure block or loop block;

iv. If a lower-level child node is of type select_into_stmt, look up its reference count in var_list, assign it to the corresponding definition block, and delete the variable;

v. Take the variables belonging to this block out of var_list, record their reference counts for the block, and delete them from var_list;

3) If the root node of the tree is update_stmt, insert_stmt, merge_stmt or a similar data modification statement:

i. initialize block as an execution block and take the tree as the execution syntax tree of the execution block;

ii. check whether any variable in var_list occurs in the tree; if so, increment its count by 1 and add a pointer at the variable to the variable's definition block;

4) If the root node of the tree is select_into_stmt:

i. initialize block as a definition block, extract the INTO + VAR nodes as the variable name, and take the remaining nodes as the definition syntax tree;

ii. check whether any variable in var_list occurs in the definition syntax tree; if so, increment its count by 1 and add a pointer to the variable's definition block.

After the above construction algorithm is executed, the original syntax tree of Fig. 8 is converted into the final language structure tree of Fig. 7.
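A minimal Python sketch of the construction algorithm follows. It is not the patent's implementation. The Node and Block classes, the "ident" node type for variable references, the child layouts assumed for for_stmt, if_stmt and select_into_stmt, and the registration of the INTO variable in var_list (so later sibling statements can reference it) are all hypothetical choices made so the example runs. A reference is counted once per statement that mentions a variable.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class BlockType(Enum):
    PROCEDURE = "stored procedure block"
    DEFINITION = "definition block"
    EXECUTION = "execution block"
    CONTROL = "control block"
    LOOP = "loop block"


@dataclass
class Node:
    """Hypothetical syntax tree node: grammar rule name, optional token text, children."""
    type: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


@dataclass(eq=False)  # identity comparison: blocks are referenced by pointer
class Block:
    kind: BlockType
    name: Optional[str] = None    # loop variable or INTO variable defined by this block
    tree: Optional[Node] = None   # loop query, IF condition, definition or execution tree
    children: list["Block"] = field(default_factory=list)
    ref_count: int = 0            # number of statements referencing the variable defined here
    refs: dict = field(default_factory=dict)  # variable name -> defining Block


COMPOSITE = {"package_stmt", "for_stmt", "while_stmt", "if_stmt"}
DATA_MODIFY = {"update_stmt", "insert_stmt", "merge_stmt"}


def count_refs(tree: Node, var_list: dict, block: Block) -> None:
    """Increment the count of every var_list variable that `tree` mentions (once per tree)."""
    found, stack = set(), [tree]
    while stack:
        node = stack.pop()
        if node.type == "ident" and node.text.split(".")[0] in var_list:
            found.add(node.text.split(".")[0])
        stack.extend(node.children)
    for name in found:
        var_list[name][0] += 1
        block.refs[name] = var_list[name][1]  # pointer to the defining block


def build(tree: Node, var_list: dict) -> Block:
    t = tree.type
    if t in COMPOSITE:
        if t == "package_stmt":
            block, body = Block(BlockType.PROCEDURE), tree.children
        elif t == "if_stmt":
            # Assumed layout: first child is the IF condition, the rest is the THEN body.
            block, body = Block(BlockType.CONTROL, tree=tree.children[0]), tree.children[1:]
        else:
            # Assumed layout: [loop variable, select_stmt, body statements...]
            var_node, select_tree = tree.children[0], tree.children[1]
            block = Block(BlockType.LOOP, name=var_node.text, tree=select_tree)
            count_refs(select_tree, var_list, block)   # outer variables used by the loop query
            var_list[var_node.text] = [0, block]       # e.g. {"rec_line": [0, <loop block>]}
            body = tree.children[2:]
        for child in body:
            block.children.append(build(child, var_list))
        # Close the scope: record counts for variables defined by this block or by a
        # definition block directly under it, then drop them from var_list.
        for name, (count, owner) in list(var_list.items()):
            if owner is block or any(owner is c for c in block.children):
                owner.ref_count = count
                del var_list[name]
        return block
    if t in DATA_MODIFY:
        block = Block(BlockType.EXECUTION, tree=tree)
        count_refs(tree, var_list, block)
        return block
    if t == "select_into_stmt":
        # Assumed layout: first child is the INTO variable, the rest defines its value.
        into, rest = tree.children[0], tree.children[1:]
        block = Block(BlockType.DEFINITION, name=into.text, tree=Node("select", children=rest))
        count_refs(block.tree, var_list, block)
        var_list[into.text] = [0, block]  # assumption: later siblings may reference it
        return block
    raise ValueError(f"unsupported node type: {t}")
```

Keeping var_list as a dictionary from variable name to [count, defining block] mirrors the [{rec_line: [0, block pointer]}] structure above. Each composite block removes its own variables when it returns, so the list holds only the variables that are in scope.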

As can be seen from the foregoing, in one embodiment, labeling each node on the original syntax tree and assigning each node one of the five code block types according to a preset decision policy, to obtain a labeled language structure tree, includes obtaining the labeled language structure tree according to the following structure tree construction algorithm:

acquiring an original syntax tree and an intermediate variable list;

initializing the variable block to empty;

if the root node of the original syntax tree is package_stmt, for_stmt, while_stmt or if_stmt:

if the current node is package_stmt, initializing block as a stored procedure block;

if the current node is for_stmt or while_stmt:

initializing block as a loop block;

taking the intermediate variable and the select_stmt syntax tree out of the syntax tree as the loop variable name and the variable definition of the loop block;

adding the intermediate variable with an initial count of 0 to the intermediate variable list for the later counting of references;

checking whether any variable in the intermediate variable list appears in the select_stmt syntax tree, and if so, incrementing its count by 1;

recursively calling the structure tree construction algorithm on each child node of the current node, with the child node and the intermediate variable list as inputs, and putting the return values one by one into the code block array of the stored procedure block or loop block;

if a select_into_stmt node exists among the child nodes, looking up its reference count in the intermediate variable list, assigning the count to the corresponding definition block, and deleting the variable;

taking the variables belonging to the block out of the intermediate variable list, recording their reference counts in the block, and deleting them from the intermediate variable list;

if the root node of the original syntax tree is update_stmt, insert_stmt, merge_stmt or another statement that modifies data:

initializing block as an execution block, and using the original syntax tree as the execution syntax tree of the execution block;

checking whether any variable in the intermediate variable list appears in the original syntax tree, and if so, incrementing its count by 1 and adding a pointer at the variable to its definition block;

if the root node of the original syntax tree is select_into_stmt:

initializing block as a definition block, extracting the INTO + VAR nodes as the variable name, and using the remaining nodes as the definition syntax tree;

checking whether any variable in the intermediate variable list appears in the definition syntax tree, and if so, incrementing its count by 1 and adding a pointer to the definition block of the variable, until the labeled language structure tree is obtained.

2. Second, the syntax tree is adjusted by replacement, progressively replacing its loop structures; that is, steps 104-105 are divided into the following sub-steps:

1. Define a new list, named code tree, to store the converted syntax tree, and initialize it to null;

2. Control block elimination: traverse breadth-first downward from the root node, and for each control block found, push its judgment condition down to all child nodes under it, as follows:

1) Find an if_stmt node and locate the IF-THEN judgment condition in its child nodes; in this example, rec_line.flag = 1;

2) Find the body_stmt_list under the if_stmt node and take out its sub-blocks one by one:

a) For a select_into_stmt/update_stmt/insert_stmt statement, add the condition to the statement;

b) For a nested if_stmt, perform this procedure recursively; for a loop block (while_stmt/for_stmt), perform a similar operation to push the condition down.

The effect of control block elimination is shown in Fig. 9.
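Control block elimination can be sketched in Python under simplifying assumptions. Blocks are modeled by a hypothetical Stmt class rather than real syntax tree nodes, and conditions are kept as SQL text strings joined with AND. A loop that receives a condition passes it on to the statements in its body, which is one reading of "a similar operation is performed". Writing the combined condition into each statement's WHERE clause is left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)
class Stmt:
    """Hypothetical simplified block: kind is "proc", "loop", "if" or "exec"."""
    kind: str
    text: str = ""                                  # SQL text (exec) or loop query (loop)
    test: str = ""                                  # IF condition, only for kind == "if"
    cond: list[str] = field(default_factory=list)   # conditions pushed down onto this block
    children: list["Stmt"] = field(default_factory=list)


def eliminate_controls(root: Stmt) -> Stmt:
    queue = deque([root])
    while queue:                                    # breadth-first from the root
        node = queue.popleft()
        if node.kind == "loop" and node.cond:
            # A loop that received conditions passes them on to its own body.
            for child in node.children:
                child.cond = node.cond + child.cond
            node.cond = []
        pending, kept = deque(node.children), []
        while pending:
            child = pending.popleft()
            if child.kind == "if":
                inherited = child.cond + [child.test]
                for sub in child.children:
                    sub.cond = inherited + sub.cond
                # Splice the IF body in place of the IF; nested IFs are re-examined here.
                pending.extendleft(reversed(child.children))
            else:
                kept.append(child)
        node.children = kept
        queue.extend(kept)
    return root


def where_clause(stmt: Stmt) -> str:
    """Conjunction of the conditions pushed onto an execution statement."""
    return " AND ".join(stmt.cond)
```

For example, an UPDATE inside `IF rec_line.flag = 1` keeps `rec_line.flag = 1` as its only pushed-down condition. An INSERT inside a further nested `IF rec_line.amt > 0` ends up with both conditions joined by AND. After the rewrite, both statements sit directly under the loop.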

3. Definition block elimination

1) Find each definition block, search downward through all code blocks at the same level, and call the structure tree conversion function to eliminate the definition block, as shown in Fig. 10.
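One way the structure tree conversion could eliminate a scalar definition block is to inline the defining query as a scalar subquery wherever the variable is used. The sketch below is an assumed example, not one of the patent's templates, which appear only in its figures. It works on SQL text with regular expressions instead of on syntax trees. The rewrite is equivalent only when the SELECT ... INTO returns exactly one row and the data it reads is not modified between the definition and the use.

```python
import re


def split_select_into(sql: str) -> tuple[str, str]:
    """Split 'SELECT cols INTO var FROM ...' into (var, 'SELECT cols FROM ...')."""
    m = re.match(r"^\s*SELECT\s+(.*?)\s+INTO\s+(\w+)\s+(FROM\b.*?)\s*;?\s*$",
                 sql, re.IGNORECASE | re.DOTALL)
    if m is None:
        raise ValueError("not a single-variable SELECT ... INTO statement")
    cols, var, rest = m.groups()
    return var, f"SELECT {cols} {rest}"


def eliminate_definition(definition_sql: str, statements: list[str]) -> tuple[str, list[str], int]:
    """Inline the defined variable as a scalar subquery in each statement.

    Returns the variable name, the rewritten statements and the number of
    statements whose references were eliminated (the reference-count decrement).
    """
    var, scalar = split_select_into(definition_sql)
    # Whole-word match that skips qualified names such as t.v_max.
    pattern = re.compile(rf"(?<![\w.]){re.escape(var)}(?!\w)", re.IGNORECASE)
    out, eliminated = [], 0
    for stmt in statements:
        new_stmt, n = pattern.subn(lambda _m: f"({scalar})", stmt)
        out.append(new_stmt)
        eliminated += 1 if n else 0
    return var, out, eliminated
```

When the returned count equals the definition block's reference count, no statement uses the variable any more, and the definition block can be dropped.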

4. Loop block elimination

1) Find each execution block and determine whether a loop variable exists in its execution syntax tree. In Fig. 11, select_into_stmt and update_stmt reference the variable rec_line, and update_stmt also references the variable defined by select_into_stmt; these references are what must be eliminated.

2) Starting from the execution block, find all variables it references and the definition/loop blocks in which they are defined. Call the structure tree conversion function, look up a matching conversion method in the conversion template list, and perform the structure conversion to eliminate the references to the defined variables. Each time a reference is eliminated, decrement the reference count in the definition/loop block by 1, until the execution block references no variables.
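For loop block elimination, consider one conversion that such a template could encode. This is an assumed example; the patent's own templates appear only in its figures. It rewrites an UPDATE that runs once per loop row into a single set-based UPDATE with an EXISTS subquery over the loop query, then decrements the loop block's reference count. The sketch works on SQL strings and hypothetical LoopBlock and UpdateBlock classes. The rewrite is safe only when the SET clause does not read the loop variable and applying the update for several matching loop rows has the same effect as applying it once. The code checks the first condition; the second holds for constant assignments as in the example and is not checked.

```python
import re
from dataclasses import dataclass


@dataclass
class LoopBlock:
    var: str        # loop variable, e.g. rec_line
    query: str      # query the loop iterates over
    ref_count: int  # statements that still reference the loop variable


@dataclass
class UpdateBlock:
    table: str
    set_clause: str
    conds: list[str]  # WHERE predicates, including conditions pushed down from IF blocks


def uses(var: str, text: str) -> bool:
    """True if text contains a qualified reference such as rec_line.id."""
    return re.search(rf"(?<![\w.]){re.escape(var)}\.\w", text) is not None


def eliminate_loop(loop: LoopBlock, upd: UpdateBlock) -> str:
    """Rewrite a row-by-row UPDATE inside a FOR loop as one set-based UPDATE."""
    if uses(loop.var, upd.set_clause):
        raise ValueError("template does not apply: SET clause reads the loop variable")
    where = " AND ".join(upd.conds) or "1 = 1"
    sql = (f"UPDATE {upd.table} SET {upd.set_clause} WHERE EXISTS "
           f"(SELECT 1 FROM ({loop.query}) {loop.var} WHERE {where})")
    if any(uses(loop.var, c) for c in upd.conds):
        loop.ref_count -= 1  # this execution block no longer references the loop variable
    return sql
```

When the loop block's reference count reaches 0, no execution block depends on the loop any more, so the loop block itself can be removed from the structure tree.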

As can be seen from the foregoing, in one embodiment, traversing the language structure tree breadth-first downward from its root node, finding each control block, and pushing the judgment conditions in the control block down to all child nodes under it until all control blocks are eliminated, to obtain the preliminarily adjusted language structure tree, includes performing control block elimination as follows:

traversing breadth-first downward from the root node, finding each control block, and pushing its judgment condition down to all child nodes under it, specifically:

finding an if_stmt node, and locating the IF-THEN judgment condition in its child nodes;

finding the body_stmt_list under the if_stmt node, and taking out its sub-blocks one by one:

for a select_into_stmt/update_stmt/insert_stmt statement, adding the condition to the statement;

for a nested if_stmt, executing this procedure recursively, and for a loop block (while_stmt/for_stmt), performing a similar operation to push the condition down.

As can be seen from the foregoing, in one embodiment, progressively performing code block elimination on the definition blocks and loop blocks in the loop structures of the preliminarily adjusted language structure tree, according to the pre-configured conversion structure identifiers, the relation between the conversion structure identifiers and the structure tree conversion templates, and the structure tree conversion function, to obtain a language structure tree expressed in SQL form, includes performing definition block elimination as follows:

finding the definition block, searching downward through all code blocks at the same level, and calling the structure tree conversion function to eliminate the definition block.

As can be seen from the foregoing, in one embodiment, progressively performing code block elimination on the definition blocks and loop blocks in the loop structures of the preliminarily adjusted language structure tree, according to the pre-configured conversion structure identifiers, the relation between the conversion structure identifiers and the structure tree conversion templates, and the structure tree conversion function, to obtain a language structure tree expressed in SQL form, includes performing loop block elimination as follows:

finding an execution block and determining whether a loop variable exists in the execution syntax tree of the execution block;

starting from the execution block, finding all variables it references and the definition/loop blocks in which they are defined, calling the structure tree conversion function, looking up a matching conversion method in the conversion template list, and performing the structure conversion to eliminate the references to the defined variables while decrementing the reference count in the definition/loop block by 1, until the execution block references no variables.

3. Structure tree conversion and variable elimination: this part describes how definition blocks and loop blocks are eliminated according to the conversion templates.

1. Conversion templates and conversion function

The conversion templates define a series of possible execution block syntax tree structures combined with conversion structure identifiers, as follows:

The matching identifiers (conversion structure identifiers) are shown in Table 2 below:

TABLE 2

The matching order of the templates is shown schematically in Fig. 12.

Template matching function:

Input: syntax tree, conversion template

a) Read the top node type from the template and check whether it matches the node type of the syntax tree root; if so, continue, otherwise return a mismatch.

b) Read the template lines in a loop from the second line to the end, processing each line as follows:

a) Pointer pTree points to the first child node below the root node.

b) Read the tokens one by one from the beginning of the line (a token is any element of the template, including the matching identifiers above and the various words; tokens are separated by spaces), and process each token as follows:

i. If the current token is a node type:

1. Match the token against the type of the node pointed to by pTree:

a) Mismatch: the function returns a mismatch;

b) Match: continue the loop.

ii. If the current token is × or +:

1. Take the next token;

2. For ×, pTree moves to the next node; for +, it moves forward two nodes (if there is no next node, or no node two positions ahead, return a mismatch);

3. Check whether the node at pTree matches the current token; if not, keep moving pTree to the next node until it matches; if pTree reaches the last node of the level without a match, return a mismatch.

iii. If the current token is:

1. Take the next token;

2. Repeat the downward matching: if the node does not match, keep moving pTree to the next node until it matches; if pTree reaches the last node of the level without a match, search the next level down, executing this step recursively until the matching position is found; if all nodes have been traversed without finding a match, return a mismatch.

iv. If the current token is one of the three hierarchy identifiers =>, -> or +>:

1. For => or ->: recursively call the template matching function with the parameters (1) the subtree under the child node pointed to by pTree, as the input syntax tree, and (2) the text from the next token up to the next level, as the conversion template;

2. For +>: insert the syntax tree of the structure given by the identifier.

v. If the current token starts with a left bracket "(":

1. Take the token within the complete brackets and extract the variable name and the matching template; recursively call the template matching function with the bracketed matching template and the tree at the current node pTree; if it matches, add the variable name and the node pointed to by pTree to the global variable list.
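The core of the matching procedure can be sketched in Python: steps a) and b), with node-type tokens and the × and + markers. The sketch simplifies under stated assumptions. × and + are read as "skip at least zero" and "skip at least one" sibling nodes before searching for the next token. pTree advances past each matched node, and children left over after the last token do not cause a mismatch. The downward-search token, the hierarchy identifiers =>, -> and +>, and bracketed variable captures are not implemented. The node type names in the usage example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    type: str
    children: list["Node"] = field(default_factory=list)


# Minimum number of sibling nodes skipped before searching for the next token
# (assumed reading of "next node" for × and "next two nodes" for +).
SKIP = {"×": 0, "+": 1}


def match_line(tokens: list[str], types: list[str]) -> bool:
    """Match one template line against the child node types of the root."""
    i = 0                                   # pTree: index of the current child node
    k = 0                                   # index of the current token
    while k < len(tokens):
        tok = tokens[k]
        if tok in SKIP:
            k += 1
            if k == len(tokens):
                return False                # a marker must be followed by a node type
            i += SKIP[tok]
            while i < len(types) and types[i] != tokens[k]:
                i += 1                      # first match on this level, no backtracking
            if i >= len(types):
                return False
        elif i >= len(types) or types[i] != tok:
            return False
        i += 1
        k += 1
    return True


def match_template(template: str, root: Node) -> bool:
    lines = [line.split() for line in template.strip().splitlines() if line.strip()]
    if not lines or lines[0] != [root.type]:
        return False                        # step a): top node type must match the root
    types = [child.type for child in root.children]
    return all(match_line(tokens, types) for tokens in lines[1:])
```

For a root `update_stmt` whose children are `UPDATE`, `tableview_name`, `update_set_clause` and `where_clause`, the template line `UPDATE + where_clause` matches. `UPDATE + tableview_name` does not, because + requires at least one node between the two.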

From the above, in one embodiment, the structure tree transformation function includes:

obtaining a structure tree and a conversion template;

reading the top node type from the template and checking whether it matches the root node type of the structure tree; if so, continuing, otherwise returning a mismatch;

reading the template lines in a loop from the second line to the end, processing each line as follows:

pointer pTree points to the first child node below the root node;

reading the tokens one by one from the beginning of the line, where a token is any element of the template, the elements including conversion structure identifiers and various words, and tokens are separated by spaces; each token is processed as follows:

if the current token is a node type:

matching the token against the type of the node pointed to by pTree:

if it does not match, the function returns a mismatch;

if it matches, continuing the loop;

if the current token is × or +:

taking the next token;

moving pTree to the next node for ×, or forward two nodes for +; if there is no next node, or no node two positions ahead, returning a mismatch;

checking whether the node at pTree matches the current token; if not, moving pTree to the next node until it matches; if pTree reaches the last node of the level without a match, returning a mismatch;

if the current token is:

taking the next token;

repeating the downward matching: if the node does not match, moving pTree to the next node until it matches;

if pTree reaches the last node of the level without a match, searching the next level down and executing this step recursively until the matching position is found; if all nodes have been traversed without finding a match, returning a mismatch.

As can be seen from the foregoing, in one embodiment, reading the tokens one by one from the beginning of a line, where a token is any element of the template, the elements including conversion structure identifiers and various words, and tokens are separated by spaces, and processing each token, further includes:

checking whether the current token is one of the three hierarchy identifiers =>, -> or +>:

for => or ->: recursively calling the template conversion function with the following parameters:

the subtree under the child node pointed to by pTree, as the input structure tree;

the text from the next token up to the next level, as the conversion template;

for +>: inserting the structure tree of the structure;

if the current token starts with a left bracket "(":

taking the token within the complete brackets, extracting the variable name and the matching template, and recursively calling the template conversion function with the bracketed matching template and the tree at the current node pTree; if it matches, adding the variable name and the node pointed to by pTree to the global variable list.

2. Variable elimination

Based on the above conversion function, the structure tree can be converted in combination with the conversion templates. The elimination of definition blocks shown in Fig. 13 can be performed with templates such as the following; the specific matches and replacements are shown in Fig. 13:

In this way, the templates are used to extract the variables and convert the tree structure one by one, finally generating the structurally adjusted structure tree, which is then restored to code text.

The embodiment of the invention mainly addresses how to quickly convert the procedural stored procedures of an original OLTP database into a form suited to an MPP database, with the converted statements rewritten so that they run efficiently on an OLAP database. That is, in one embodiment, the original database is an OLTP database and the target database is an OLAP database.

As shown in fig. 14, the embodiment of the present invention includes several modules:

1. The procedural language parsing module (which may be as described in the following embodiments) mainly converts the procedural code of the above example into an abstract syntax tree (AST) to facilitate subsequent modular processing, that is, it converts the SQL (Structured Query Language) of the original database into an abstract syntax tree; this module implements steps 101-103 above.

2. The statement block conversion module mainly adjusts the structure of the AST generated in the previous step according to the conversion templates to form a new AST, that is, it converts procedural language features such as loops and judgments in the abstract syntax tree into an abstract syntax tree expressed in SQL form and outputs it; this module implements steps 104-105.

3. The target language generation module generates code from the adjusted AST, that is, it converts the abstract syntax tree from the previous step into the language of the target database and outputs SQL text; this module implements step 106.

In summary, the embodiment of the invention uses user-customizable templates, a flexible configuration method and a corresponding set of template matching methods to convert procedural statements in database stored procedures into standard SQL statements. The invention aims to solve the problem of automatically converting the processing programs of the original database when stored procedure code is migrated to a distributed database and must be developed in standard SQL. The embodiment can, to the greatest extent possible, automatically convert constructs that execute slowly on a distributed database, such as step-by-step judgments, into SQL statements executed in batch, thereby improving computing performance; in addition, the automatic conversion minimizes manual effort and saves cost.

The embodiment of the invention also provides a database process language migration device, described in the following embodiment. Because the device solves the problem on a principle similar to that of the database process language migration method, its implementation can refer to the implementation of the method, and repeated details are omitted.

FIG. 15 is a schematic structural diagram of a database process language migration apparatus according to an embodiment of the present invention, as shown in FIG. 15, the apparatus includes:

a code block abstraction unit 01, configured to abstract the syntax of the original procedural code in an original database into five types of code blocks: a stored procedure block, a definition block, an execution block, a control block and a loop block, wherein the stored procedure block represents the stored procedure code of the database, the definition block represents an intermediate variable in the original procedural code, the execution block represents a statement that operates on the final result, the control block represents a statement supporting logical judgment, and the loop block represents a loop statement;

a parsing unit 02 for parsing the original procedural code and converting the original procedural code into an original syntax tree;

the labeling unit 03, configured to label each node on the original syntax tree and assign each node one of the five code block types according to a preset decision policy, to obtain a labeled language structure tree;

the preliminary adjustment unit 04, configured to traverse breadth-first downward from the root node of the language structure tree, find each control block in the tree, and push the judgment condition in the control block down to all child nodes under it until all control blocks are eliminated, to obtain a preliminarily adjusted language structure tree;

a conversion unit 05, configured to progressively perform code block elimination on the definition blocks and loop blocks in the loop structures of the preliminarily adjusted language structure tree according to pre-configured conversion structure identifiers, the relation between the conversion structure identifiers and the structure tree conversion templates, and a structure tree conversion function, to obtain a language structure tree expressed in SQL form;

and an output unit 06 for converting the language structure tree expressed in the form of SQL into the language of the target database and outputting SQL text.

In one embodiment, intermediate variables in the original procedural code are abstracted into definition blocks based on relationships, pre-configured for different database scenarios, between definition blocks, defined variables and definition value sources.

In one embodiment, the type of the final result operation statement is determined from a database type.

In one embodiment, the labeling unit is specifically configured to obtain a language structure tree with a label according to the following structure tree construction algorithm:

acquiring an original syntax tree and an intermediate variable list;

initializing the variable block to empty;

if the current node is for_stmt or while_stmt:

initializing block as a loop block;

checking whether any variable in the intermediate variable list appears in the original syntax tree, and if so, incrementing its count by 1 and adding a pointer at the variable to its definition block;

if the root node of the original syntax tree is select_into_stmt:

In one embodiment, the preliminary adjustment unit is specifically configured to perform control block elimination according to the following method to obtain a language structure tree after preliminary adjustment:

In one embodiment, the conversion unit is specifically configured to perform definition block elimination according to the following method to obtain a language structure tree expressed in SQL form:

In one embodiment, the conversion unit is specifically configured to perform loop block elimination according to the following method:

In one embodiment, the structure tree transformation function comprises:

obtaining a structure tree and a conversion template;

pointer pTree points to the first child node below the root node;

if the current token is of the node type:

matching token with the current type pointed to by pTree:

if not, the function returns a mismatch;

if it matches, continuing the loop;

if the current token is × or +:

taking the next token;

if the current token is:

taking the next token;

repeating the downward matching: if the node does not match, moving pTree to the next node until it matches; if pTree reaches the last node of the level without a match, searching the next level down and executing this step recursively until the matching position is found; if all nodes have been traversed without finding a match, returning a mismatch.

In one embodiment, reading the tokens one by one from the beginning of a line, where a token is any element of the template, the elements including conversion structure identifiers and various words, and tokens are separated by spaces, and processing each token, further includes:

checking whether the current token is one of the three hierarchy identifiers =>, -> or +>:

for +>, inserting the structure tree of the structure.

If the start character of the current token is left bracket "(":

In one embodiment, the original database is an OLTP database and the target database is an OLAP database.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing description of the embodiments illustrates the general principles of the invention and is not intended to limit its scope of protection; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A method for migrating a database process language, comprising:

2. The method of claim 1, wherein intermediate variables in the original procedural code are abstracted into definition blocks based on relationships between definition blocks, definition variables, and definition value sources that are pre-configured according to different database scenarios.

3. The method of claim 1, wherein the type of the final result operation statement is determined based on a database type.

4. The method of claim 1, wherein labeling each node on the original syntax tree, assigning each node to one of five types of code blocks according to a preset decision strategy, obtaining a labeled language structure tree, comprising obtaining a labeled language structure tree according to the following structure tree construction algorithm:

acquiring an original syntax tree and an intermediate variable list;

initializing the variable block to empty;

if the current node is for_stmt or while_stmt:

initializing block as a loop block;

checking whether any variable in the intermediate variable list appears in the select_stmt syntax tree, and if so, incrementing its count by 1; recursively calling the structure tree construction algorithm on each child node of the current node, with the child node and the intermediate variable list as inputs, and putting the return values one by one into the code block array of the stored procedure block or loop block;

if the root node of the original syntax tree is select_into_stmt:

checking whether any variable in the intermediate variable list appears in the definition syntax tree, and if so, incrementing its count by 1 and adding a pointer to the definition block of the variable, until the labeled language structure tree is obtained.

5. The method of claim 1, wherein traversing from the language structure tree root node down breadth first, finding each control block therein, pushing down judgment conditions in the control block to all child nodes under the control block until all control blocks are eliminated, obtaining a preliminary adjusted language structure tree, comprising: the control block elimination is carried out according to the following method, and a language structure tree after preliminary adjustment is obtained:

for a select_into_stmt/update_stmt/insert_stmt statement, adding the condition to the statement;

6. The method of claim 1, wherein the step of performing the code block elimination operation on the definition blocks and the loop blocks in the loop structure of the preliminarily adjusted language structure tree according to the pre-configured conversion structure identifier, the relation between the conversion structure identifier and the structure tree conversion template, and the structure tree conversion function, to obtain the language structure tree expressed in SQL form, comprises: performing definition block elimination according to the following method to obtain a language structure tree expressed in SQL form:

7. The method of claim 1, wherein the step of performing the code block elimination operation on the definition blocks and the loop blocks in the loop structure of the preliminarily adjusted language structure tree according to the pre-configured conversion structure identifier, the relation between the conversion structure identifier and the structure tree conversion template, and the structure tree conversion function, to obtain the language structure tree expressed in SQL form, comprises: performing loop block elimination as follows:

8. The method of claim 6 or 7, wherein the structure tree transformation function comprises:

obtaining a structure tree and a conversion template;

setting a pointer pTree to the first child node below the root node;

for each token read from the conversion template, if the current token is a node type:

matching the token against the type of the node currently pointed to by pTree:

if they do not match, the function returns a mismatch;

if they match, continuing the loop;

if the current token is × or +:

fetching the next token;

judging whether the node pointed to by pTree matches the current token; if not, advancing pTree to the next node until a match is found, and if the last node at this level still does not match, returning a mismatch;

if the current token is:

fetching the next token;
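The matching loop of claim 8 can be sketched for a single tree level as below, assuming the children of the root are reduced to a list of node-type names and assuming pTree moves past each matched node (the claim does not state this explicitly). The third token case is truncated in the source and is not modeled; whether unmatched trailing nodes count as a mismatch is also unstated, so this sketch ignores them. The function name `match_level` is hypothetical.

```python
def match_level(children, tokens):
    """Match template tokens against the node types of one tree level.

    children: node-type names of the root's child nodes, in order.
    tokens:   template tokens; a plain name must match the node at pTree,
              while "×" or "+" followed by a name lets pTree skip ahead
              to the next node of that type.
    """
    p_tree = 0  # pTree starts at the first child node below the root
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in ("×", "+"):
            i += 1  # fetch the next token
            if i >= len(tokens):
                return False  # malformed template: marker with no target
            target = tokens[i]
            # advance pTree until it matches; mismatch if the level runs out
            while p_tree < len(children) and children[p_tree] != target:
                p_tree += 1
            if p_tree == len(children):
                return False
        elif p_tree >= len(children) or children[p_tree] != token:
            return False  # node-type token does not match the current node
        p_tree += 1  # assumed: move pTree past the matched node
        i += 1
    return True


print(match_level(["declare", "update_stmt", "insert_stmt"], ["declare", "+", "insert_stmt"]))  # True
```

The source treats × and + identically in the step shown, so the sketch does too.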

9. The method of claim 8, wherein a token is each element of the template, read one at a time from the beginning of a line, the elements including conversion structure identifiers and various words and being separated by spaces; the method further comprises:

judging whether the current token is one of the three hierarchical conversion structure identifiers =>, -> and +>:

if it is the => or -> identifier, recursively calling the template conversion function and outputting the parameters:

if it is the +> identifier, inserting the structure tree corresponding to the identifier.
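The token dispatch of claim 9 can be sketched as a classifier that splits a template line on spaces and tags each token with the action the claim assigns to it. The template line in the usage example, the constant names and the tags "recurse", "insert" and "word" are hypothetical; the actual recursive call and tree insertion are not modeled.

```python
RECURSE_IDS = {"=>", "->"}  # handled by a recursive call of the conversion function
INSERT_ID = "+>"            # handled by inserting the identifier's structure tree


def classify_tokens(template_line):
    """Split a template line on whitespace and tag each token by its role."""
    tagged = []
    for token in template_line.split():
        if token in RECURSE_IDS:
            tagged.append((token, "recurse"))
        elif token == INSERT_ID:
            tagged.append((token, "insert"))
        else:
            tagged.append((token, "word"))  # ordinary template word
    return tagged


print(classify_tokens("update_stmt => set_clause +> where_clause"))
```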

10. The method of claim 9, wherein a token is each element of the template, read one at a time from the beginning of a line, the elements including conversion structure identifiers and various words and being separated by spaces; the method further comprises:

if the first character of the current token is a left parenthesis "(":

11. The method of claim 1, wherein the original database is an OLTP database and the target database is an OLAP database.

12. A database process language migration apparatus, comprising:

13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 11 when executing the computer program.

14. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the method of any of claims 1 to 11.

15. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the method of any of claims 1 to 11.