CN112307050B - Identification method and device for repeated correlation calculation and computer system - Google Patents


Info

Publication number
CN112307050B
Authority
CN
China
Prior art keywords
sql statement
association
query
data table
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010973509.XA
Other languages
Chinese (zh)
Other versions
CN112307050A (en)
Inventor
丁庆晏
徐伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suning Cloud Computing Co Ltd
Original Assignee
Suning Cloud Computing Co Ltd
Application filed by Suning Cloud Computing Co Ltd
Priority to CN202010973509.XA
Publication of CN112307050A
Priority to CA3130988A (CA3130988A1)
Application granted
Publication of CN112307050B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, an apparatus, and a computer system for identifying repeated association calculations. The method comprises: obtaining a first SQL statement and a second SQL statement to be identified; parsing the first SQL statement and identifying a first association query included in the first SQL statement, wherein an association query includes the association calculations between data tables that executing the SQL statement requires; parsing the second SQL statement and identifying a second association query included in the second SQL statement; and, when the first association query and the second association query contain a repeated association calculation, determining that the first SQL statement and the second SQL statement contain a repeated association calculation. Identifying whether a plurality of SQL statements include repeated association calculations allows those statements to be optimized and adjusted afterwards, which further improves the operating efficiency of the data platform.

Description

Identification method and device for repeated correlation calculation and computer system
Technical Field
The present invention relates to the field of data processing, and in particular, to a method, an apparatus, and a computer system for identifying duplicate association calculation.
Background
In data processing scenarios such as big data offline tasks, a large number of SQL statements need to be processed. While these statements execute, the same association calculation on the same two data tables is often performed repeatedly. Such repeated association calculations waste a large amount of computing and storage resources, seriously reduce the operating efficiency of the data platform, and increase its operating cost. A method capable of identifying the repeated association calculations included in a plurality of SQL statements is therefore highly desirable.
Disclosure of Invention
To overcome the shortcomings of the prior art, the main object of the present invention is to provide a method, an apparatus, and a computer system for identifying repeated association calculations included in SQL statements.
To achieve the above object, the present invention provides, in a first aspect, a method for identifying repeated association calculations, the method including:
acquiring a first SQL statement and a second SQL statement to be identified;
parsing the first SQL statement, and identifying a first association query included in the first SQL statement, wherein an association query includes the association calculations between data tables required to execute the SQL statement;
parsing the second SQL statement, and identifying a second association query included in the second SQL statement;
when the first association query and the second association query contain a repeated association calculation, determining that the first SQL statement and the second SQL statement contain a repeated association calculation.
In some embodiments, an association calculation includes the corresponding data tables and an association relation keyword, where the association relation keyword describes the association calculation to be performed between the data tables, and the parsing the first SQL statement and identifying a first association query included in the first SQL statement includes:
parsing the first SQL statement, and identifying a first association relation keyword contained in the first SQL statement and a first data table and a second data table corresponding to the first association relation keyword;
and determining a first association calculation included in the first association query according to the first data table, the second data table, and the first association relation keyword.
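As a minimal sketch of the embodiment above, one association calculation could be represented as a keyword plus the pair of tables it relates. The class name `AssociationCalc` and the helper `make_calc` are hypothetical, and treating keyword case and table order as irrelevant is an assumption made here for illustration, not something the text specifies.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AssociationCalc:
    """One association calculation: a keyword plus the two tables it relates."""
    keyword: str            # association relation keyword, e.g. "JOIN"
    tables: FrozenSet[str]  # the two data tables, stored without order


def make_calc(keyword: str, first_table: str, second_table: str) -> AssociationCalc:
    # Assumption: keyword case and table order do not affect identity.
    return AssociationCalc(keyword.upper(), frozenset((first_table, second_table)))


calc_1 = make_calc("join", "B", "C")
calc_2 = make_calc("JOIN", "C", "B")
print(calc_1 == calc_2)  # True
```

Because the dataclass is frozen, instances are hashable and can be placed in sets or used as dictionary keys when calculations from several statements are compared.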
In some embodiments, the parsing the first SQL statement and identifying a first association query included in the first SQL statement includes:
parsing the first SQL statement to generate JSON data corresponding to the first SQL statement;
and identifying the first association query included in the first SQL statement according to the JSON data.
In some embodiments, the second SQL statement includes a sub-query and an association query between the sub-query and a third data table, and the parsing the second SQL statement and identifying a second association query included in the second SQL statement includes:
parsing the second SQL statement, and identifying a second association relation keyword included in the sub-query and a fourth data table and a fifth data table corresponding to the second association relation keyword;
determining a second association calculation included in the second association query according to the second association relation keyword and the fourth data table and the fifth data table corresponding to the second association relation keyword;
identifying a third association relation keyword included in the second SQL statement and the third data table and the sub-query corresponding to the third association relation keyword;
determining a third association calculation included in the second association query according to the third association relation keyword, the third data table, and the fourth data table;
and determining a fourth association calculation included in the second association query according to the third association relation keyword, the third data table, and the fifth data table.
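The sub-query handling above can be sketched as follows: the keyword inside the sub-query yields one calculation, and the outer keyword is paired with each table inside the sub-query to yield two more. The function name `expand_subquery` and the tuple representation are hypothetical, chosen only for this illustration.

```python
from typing import List, Tuple

Calc = Tuple[str, frozenset]  # (association relation keyword, pair of tables)


def expand_subquery(outer_keyword: str, outer_table: str,
                    inner_keyword: str, inner_tables: Tuple[str, str]) -> List[Calc]:
    """Return the calculations implied by `outer_table <keyword> (sub-query)`."""
    fourth, fifth = inner_tables
    # Second calculation: the association inside the sub-query itself.
    calcs = [(inner_keyword, frozenset((fourth, fifth)))]
    # Third and fourth calculations: the outer keyword paired with each inner table.
    calcs.append((outer_keyword, frozenset((outer_table, fourth))))
    calcs.append((outer_keyword, frozenset((outer_table, fifth))))
    return calcs


for keyword, tables in expand_subquery("UNION", "A", "JOIN", ("B", "C")):
    print(keyword, sorted(tables))
```

With table A as the third data table and tables B and C inside the sub-query, this yields one JOIN calculation on B and C and two UNION calculations, on A and B and on A and C.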
In some embodiments, the first SQL statement and the second SQL statement include corresponding data tables to be processed, and the method further includes:
when a data table to be processed is a temporary table, replacing that data table with the corresponding entity table.
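A minimal sketch of the replacement step, assuming a lookup from temporary-table names to entity-table names is already available. The text does not say how that mapping is obtained; the dictionary `TEMP_TO_ENTITY` and the table names in it are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from temporary tables to the entity tables behind them.
TEMP_TO_ENTITY: Dict[str, str] = {"tmp_orders": "dw.orders"}


def resolve_tables(tables: Iterable[str], temp_to_entity: Dict[str, str]) -> List[str]:
    # Tables that are not temporary are returned unchanged.
    return [temp_to_entity.get(table, table) for table in tables]


print(resolve_tables(["tmp_orders", "dw.users"], TEMP_TO_ENTITY))
```

Resolving names before comparison means two statements that reach the same entity table through differently named temporary tables can still be recognized as touching the same data.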
In some embodiments, the determining that the first SQL statement and the second SQL statement contain a repeated association calculation when the first association query and the second association query contain a repeated association calculation includes:
grouping the association calculations according to their association relation keywords;
and when any group includes association calculations whose corresponding data tables are the same, determining that the first SQL statement and the second SQL statement contain a repeated association calculation.
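The grouping step could look like the sketch below: calculations are bucketed by keyword, and a repeat is reported when both statements contribute the same table pair to the same bucket. The names `group_by_keyword` and `has_repeated_calc` are hypothetical, and the tuple representation of a calculation is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Calc = Tuple[str, frozenset]  # (association relation keyword, pair of tables)


def group_by_keyword(calcs: Iterable[Calc]) -> Dict[str, Set[frozenset]]:
    groups: Dict[str, Set[frozenset]] = defaultdict(set)
    for keyword, tables in calcs:
        groups[keyword].add(tables)
    return groups


def has_repeated_calc(first: Iterable[Calc], second: Iterable[Calc]) -> bool:
    first_groups = group_by_keyword(first)
    second_groups = group_by_keyword(second)
    # A repeat exists when a keyword group holds the same table pair for both statements.
    shared_keywords = first_groups.keys() & second_groups.keys()
    return any(first_groups[kw] & second_groups[kw] for kw in shared_keywords)


first = [("UNION", frozenset({"A", "B"})), ("JOIN", frozenset({"B", "C"}))]
second = [("JOIN", frozenset({"B", "C"}))]
print(has_repeated_calc(first, second))  # True
```

Grouping by keyword first keeps a UNION of two tables from being confused with a JOIN of the same two tables.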
In a second aspect, the present application provides an apparatus for identifying repeated association calculations, the apparatus comprising:
an acquisition module, configured to acquire a first SQL statement and a second SQL statement to be identified;
a parsing module, configured to parse the first SQL statement and identify a first association query included in the first SQL statement, wherein an association query includes the association calculations between data tables required to execute the SQL statement; and to parse the second SQL statement and identify a second association query included in the second SQL statement;
and a processing module, configured to determine that the first SQL statement and the second SQL statement contain a repeated association calculation when the first association query and the second association query contain a repeated association calculation.
In some embodiments, the parsing module may be further configured to parse the first SQL statement, and identify a first association relation keyword included in the first SQL statement and a first data table and a second data table corresponding to the first association relation keyword; and to determine a first association calculation included in the first association query according to the first data table, the second data table, and the first association relation keyword.
In some embodiments, the second SQL statement includes a sub-query and an association query between the sub-query and a third data table, and the parsing module is further configured to parse the second SQL statement, and identify a second association relation keyword included in the sub-query and a fourth data table and a fifth data table corresponding to the second association relation keyword; determine a second association calculation included in the second association query according to the second association relation keyword and the fourth data table and the fifth data table corresponding to the second association relation keyword; identify a third association relation keyword included in the parsed second SQL statement, wherein the third association relation keyword describes the association query between the sub-query and the third data table; determine a third association calculation included in the second association query according to the third association relation keyword, the third data table, and the fourth data table; and determine a fourth association calculation included in the second association query according to the third association relation keyword, the third data table, and the fifth data table.
In a third aspect, the present application provides a computer system comprising:
one or more processors;
and memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform operations comprising:
acquiring a first SQL statement and a second SQL statement to be identified;
parsing the first SQL statement, and identifying a first association query included in the first SQL statement, wherein an association query includes the association calculations between data tables required to execute the SQL statement;
parsing the second SQL statement, and identifying a second association query included in the second SQL statement;
when the first association query and the second association query contain a repeated association calculation, determining that the first SQL statement and the second SQL statement contain a repeated association calculation.
The invention has the following beneficial effects:
The application provides a method for identifying repeated association calculations, which includes: obtaining a first SQL statement and a second SQL statement to be identified; parsing the first SQL statement and identifying a first association query included in the first SQL statement, wherein an association query includes the association calculations between data tables required to execute the SQL statement; parsing the second SQL statement and identifying a second association query included in the second SQL statement; and, when the first association query and the second association query contain a repeated association calculation, determining that the first SQL statement and the second SQL statement contain a repeated association calculation. Because the method identifies whether a plurality of SQL statements contain repeated association calculations, those statements can subsequently be optimized and adjusted, which further improves the operating efficiency of the data platform.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of identifying repeated association calculations provided by an embodiment of the present application;
FIG. 2 is a flow chart of identifying the repeated association calculations of tasks provided by an embodiment of the present application;
FIG. 3 is a flow chart of a method provided by an embodiment of the present application;
FIG. 4 is a structural diagram of an apparatus provided by an embodiment of the present application;
FIG. 5 is a structural diagram of a computer system provided by an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can obtain from these embodiments without creative effort fall within the protection scope of the present invention.
To solve the problems described in the Background, the present application provides a method for identifying repeated association calculations. As shown in FIG. 1 and FIG. 2, identifying repeated association calculations with the method includes:
Step one, acquiring the SQL statements to be analyzed;
A UDF (user-defined function) can be used to extract the original SQL query tasks from Hive tasks and SparkSQL tasks, and the SQL statements to be analyzed are then extracted from the original tasks.
Step two, parsing the SQL statements, and generating corresponding JSON data for each association query obtained by parsing;
The UDF used to parse the SQL statements can be developed with ANTLR.
Given an input SQL statement, the UDF can obtain by parsing the association relation keywords contained in the statement, the data tables to be queried that correspond to each keyword, the database in which those data tables are located, and the association conditions that correspond to each keyword, and it generates the corresponding JSON data from all the data obtained by parsing.
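As an illustration of what such JSON data might contain, the sketch below serializes one parsed association into a record holding the four items named above. The field names (`keyword`, `tables`, `database`, `condition`) and the sample condition are hypothetical; the text does not define a JSON schema.

```python
import json
from typing import List, Optional


def to_json_record(keyword: str, tables: List[str], database: str,
                   condition: Optional[str] = None) -> str:
    # Hypothetical field names; one record per association relation keyword.
    record = {
        "keyword": keyword,
        "tables": tables,
        "database": database,
        "condition": condition,
    }
    return json.dumps(record, ensure_ascii=False)


text = to_json_record("LEFT JOIN", ["B", "C"], "T1", "B.id = C.id")
print(text)
print(json.loads(text)["tables"])  # ['B', 'C']
```

Keeping one record per keyword makes the later grouping step a matter of reading the `keyword` and `tables` fields back out of each record.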
When the data tables to be queried include a temporary table, the temporary table can be replaced with the corresponding entity table.
The association relation keywords can include JOIN and UNION. JOIN covers JOIN, LEFT JOIN, INNER JOIN, RIGHT JOIN, CROSS JOIN, FULL JOIN, and NOT_JOIN. To ensure identification accuracy, parsing rules can be preset for each type of SQL statement, where the types are: SQL statements containing JOIN, SQL statements containing UNION, and SQL statements that contain both JOIN and UNION or contain sub-queries. Once the association relation keywords contained in an SQL statement have been obtained by parsing, the statement is further parsed according to the corresponding parsing rule to identify the data tables to be queried that correspond to each keyword, the database in which those data tables are located, the association conditions that correspond to each keyword, and so on.
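The two ideas in this paragraph, collapsing keyword variants into a family and choosing a parsing rule by statement type, can be sketched as below. This is a toy illustration using regular expressions rather than an ANTLR grammar; the function names and the type labels `"join"`, `"union"`, `"mixed"`, and `"none"` are hypothetical, and only the standard SQL join variants are covered.

```python
import re

JOIN_VARIANTS = ("JOIN", "LEFT JOIN", "INNER JOIN", "RIGHT JOIN", "CROSS JOIN", "FULL JOIN")


def keyword_family(keyword: str) -> str:
    """Map a concrete keyword such as 'left join' to its family, 'JOIN' or 'UNION'."""
    normalized = " ".join(keyword.upper().split())
    if normalized in JOIN_VARIANTS:
        return "JOIN"
    if normalized == "UNION":
        return "UNION"
    raise ValueError(f"unsupported association relation keyword: {keyword!r}")


def statement_type(sql: str) -> str:
    """Pick which preset parsing rule applies (keywords inside string literals are not handled)."""
    text = sql.upper()
    has_join = re.search(r"\bJOIN\b", text) is not None
    has_union = re.search(r"\bUNION\b", text) is not None
    has_subquery = re.search(r"\(\s*SELECT\b", text) is not None
    if has_subquery or (has_join and has_union):
        return "mixed"
    if has_join:
        return "join"
    if has_union:
        return "union"
    return "none"


print(keyword_family("left  join"))
print(statement_type("SELECT * FROM a JOIN (SELECT * FROM b) t ON a.id = t.id"))
```

A real implementation would make these decisions on the parse tree, where keywords inside comments or string literals cannot be mistaken for SQL keywords.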
Step three, determining the association query contained in each SQL statement according to the JSON data corresponding to that SQL statement;
To identify an association query, the one or more association calculations it includes need to be identified. The association relation keywords and corresponding data tables included in the SQL statement can be determined from the JSON data, and each association relation keyword together with its corresponding data tables forms one association calculation.
For example, the JSON data show that the first SQL statement includes an association query between table A of the T1 database and a sub-query, and that the sub-query includes an association query between table B and table C of the T1 database. The association relation keyword between table B and table C is LEFT JOIN; the association relation keyword between table A and the sub-query is UNION, and the association condition is a first association condition. It is therefore determined that the first association relation keyword contained in the SQL statement is JOIN, the corresponding data tables to be queried are table B and table C, and the database in which they are located is T1; the second association relation keyword is UNION, the corresponding data tables to be queried are table A and table B, and the association condition is the first association condition; the third association relation keyword is UNION, the corresponding data tables to be queried are table A and table C, and the association condition is the first association condition. The first SQL statement thus includes three association calculations: the UNION association calculation of table A and table B, the UNION association calculation of table A and table C, and the JOIN association calculation of table B and table C.
Likewise, the JSON data show that the second SQL statement includes an association query between table B and table C of the T1 database with the association relation keyword RIGHT JOIN, so the association calculation included in the second SQL statement is the JOIN association calculation of table B and table C.
Step four, grouping the JSON data by the association relation keywords they include, and counting whether JSON data with the same data tables to be processed appear within the same group, and how many times.
Step five, when the number of occurrences is not less than a preset threshold, determining that the JSON data with the same data tables to be processed include a repeated association calculation.
Preferably, when the number of occurrences is not less than 1, it is determined that the JSON data with the same data tables to be processed include a repeated association calculation.
Since the first SQL statement and the second SQL statement both include the JOIN association calculation of table B and table C, it can be determined that the first SQL statement and the second SQL statement contain a repeated association calculation.
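Steps four and five can be sketched on the example above with a counter keyed by keyword and table pair. The function name `find_repeated` is hypothetical, and reading "number of occurrences" as the number of appearances after the first one is an interpretation adopted here so that a threshold of 1 flags a calculation seen in two statements; the text does not define the count precisely.

```python
from collections import Counter
from typing import Dict, List, Tuple

Calc = Tuple[str, frozenset]  # (association relation keyword, pair of tables)


def find_repeated(statements: Dict[str, List[Calc]], threshold: int = 1) -> Dict[Calc, int]:
    counts: Counter = Counter()
    for calcs in statements.values():
        # Count each calculation at most once per statement.
        counts.update(set(calcs))
    # Interpretation (assumption): a repeat is every appearance after the first one.
    return {calc: n - 1 for calc, n in counts.items() if n - 1 >= threshold}


statements = {
    "first": [
        ("UNION", frozenset({"A", "B"})),
        ("UNION", frozenset({"A", "C"})),
        ("JOIN", frozenset({"B", "C"})),
    ],
    "second": [("JOIN", frozenset({"B", "C"}))],
}
repeated = find_repeated(statements)
print(repeated == {("JOIN", frozenset({"B", "C"})): 1})  # True
```

Only the JOIN calculation on table B and table C is reported, matching the conclusion drawn for the two example statements.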
A Hive table can be generated from the identified repeated association calculations and the corresponding SQL statements and provided to technical staff for reference, so that they can optimize and adjust the SQL statements and their query process and thereby improve the operating efficiency of the system.
Example two
Corresponding to the foregoing embodiment, the present application provides a method for identifying repeated association calculations. As shown in FIG. 3, the method includes:
310. Acquiring a first SQL statement and a second SQL statement to be identified;
Preferably, the first SQL statement and the second SQL statement include corresponding data tables to be processed, and the method includes:
311. When a data table to be processed is a temporary table, replacing that data table with the corresponding entity table.
320. Parsing the first SQL statement, and identifying a first association query included in the first SQL statement, wherein an association query includes the association calculations between data tables required to execute the SQL statement;
Preferably, an association calculation includes the corresponding data tables and an association relation keyword, where the association relation keyword describes the association calculation to be performed between the data tables, and the parsing the first SQL statement and identifying a first association query included in the first SQL statement includes:
321. Parsing the first SQL statement, and identifying a first association relation keyword contained in the first SQL statement and a first data table and a second data table corresponding to the first association relation keyword;
322. Determining a first association calculation included in the first association query according to the first data table, the second data table, and the first association relation keyword.
Preferably, the parsing the first SQL statement and identifying the first association query included in the first SQL statement includes:
323. Parsing the first SQL statement to generate JSON data corresponding to the first SQL statement;
324. Identifying the first association query included in the first SQL statement according to the JSON data.
330. Parsing the second SQL statement, and identifying a second association query included in the second SQL statement;
Preferably, the second SQL statement includes a sub-query and an association query between the sub-query and a third data table, and the parsing the second SQL statement and identifying the second association query included in the second SQL statement includes:
331. Parsing the second SQL statement, and identifying a second association relation keyword included in the sub-query and a fourth data table and a fifth data table corresponding to the second association relation keyword;
332. Determining a second association calculation included in the second association query according to the second association relation keyword and the fourth data table and the fifth data table corresponding to the second association relation keyword;
333. Identifying a third association relation keyword included in the second SQL statement and the third data table and the sub-query corresponding to the third association relation keyword;
334. Determining a third association calculation included in the second association query according to the third association relation keyword, the third data table, and the fourth data table;
335. Determining a fourth association calculation included in the second association query according to the third association relation keyword, the third data table, and the fifth data table.
340. When the first association query and the second association query contain a repeated association calculation, determining that the first SQL statement and the second SQL statement contain a repeated association calculation.
Preferably, the determining that the first SQL statement and the second SQL statement contain a repeated association calculation when the first association query and the second association query contain a repeated association calculation includes:
341. Grouping the association calculations according to their association relation keywords;
342. When any group includes association calculations whose corresponding data tables are the same, determining that the first SQL statement and the second SQL statement contain a repeated association calculation.
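Steps 310 to 340 can be tried end to end with the toy sketch below. It stands in a pair of regular expressions for the parser, which is a deliberate simplification: it handles only flat statements of the form `FROM t1 ... JOIN t2`, pairs every joined table with the `FROM` table, treats all JOIN variants as one JOIN family, and ignores sub-queries and UNION. The function names `parse_joins` and `statements_repeat` are hypothetical.

```python
import re
from typing import Set, Tuple

Calc = Tuple[str, frozenset]  # (association relation keyword family, pair of tables)

_TABLE = r"[A-Za-z_][\w.]*"
_FROM = re.compile(rf"\bFROM\s+({_TABLE})", re.IGNORECASE)
_JOIN = re.compile(rf"\bJOIN\s+({_TABLE})", re.IGNORECASE)


def parse_joins(sql: str) -> Set[Calc]:
    """Toy parser for flat JOIN statements; not a substitute for a real SQL grammar."""
    base = _FROM.search(sql)
    if base is None:
        return set()
    left = base.group(1)
    # Simplification: every joined table is paired with the FROM table.
    return {("JOIN", frozenset((left, match.group(1)))) for match in _JOIN.finditer(sql)}


def statements_repeat(first_sql: str, second_sql: str) -> bool:
    # Step 340: a shared calculation means the two statements repeat work.
    return bool(parse_joins(first_sql) & parse_joins(second_sql))


first_sql = "SELECT * FROM T1.B LEFT JOIN T1.C ON B.id = C.id"
second_sql = "SELECT C.x FROM T1.B RIGHT JOIN T1.C ON B.k = C.k"
print(statements_repeat(first_sql, second_sql))  # True
```

Both statements reduce to a JOIN calculation on `T1.B` and `T1.C`, so they are reported as repeating each other even though one uses LEFT JOIN and the other RIGHT JOIN.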
Example three
Corresponding to the above method, the present application provides an apparatus for identifying repeated association calculations. As shown in FIG. 4, the apparatus includes:
an obtaining module 410, configured to obtain a first SQL statement and a second SQL statement to be identified;
a parsing module 420, configured to parse the first SQL statement and identify a first association query included in the first SQL statement, where an association query includes the association calculations between data tables required to execute the SQL statement; and to parse the second SQL statement and identify a second association query included in the second SQL statement;
a processing module 430, configured to determine that the first SQL statement and the second SQL statement contain a repeated association calculation when the first association query and the second association query contain a repeated association calculation.
Preferably, the parsing module 420 may be further configured to parse the first SQL statement, and identify a first association relation keyword included in the first SQL statement and a first data table and a second data table corresponding to the first association relation keyword; and to determine a first association calculation included in the first association query according to the first data table, the second data table, and the first association relation keyword.
Preferably, the second SQL statement includes a sub-query and an association query between the sub-query and a third data table, and the parsing module 420 may be further configured to parse the second SQL statement, and identify a second association relation keyword included in the sub-query and a fourth data table and a fifth data table corresponding to the second association relation keyword; determine a second association calculation included in the second association query according to the second association relation keyword and the fourth data table and the fifth data table corresponding to the second association relation keyword; identify a third association relation keyword included in the parsed second SQL statement, wherein the third association relation keyword describes the association query between the sub-query and the third data table; determine a third association calculation included in the second association query according to the third association relation keyword, the third data table, and the fourth data table; and determine a fourth association calculation included in the second association query according to the third association relation keyword, the third data table, and the fifth data table.
Preferably, the parsing module 420 is further configured to parse the first SQL statement to generate JSON data corresponding to the first SQL statement; and to identify the first association query included in the first SQL statement according to the JSON data.
Preferably, the first SQL statement and the second SQL statement include corresponding data tables to be processed, and the obtaining module 410 is further configured to replace a data table to be processed with the corresponding entity table when that data table is a temporary table.
Preferably, the processing module 430 is further configured to group the association calculations according to their association relation keywords; and, when any group includes association calculations whose corresponding data tables are the same, to determine that the first SQL statement and the second SQL statement contain a repeated association calculation.
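The division into three modules could be wired together as in the sketch below. The class names mirror modules 410, 420, and 430 but are otherwise hypothetical, and the parser is injected as a plain callable because the text leaves its implementation open; the stub parser exists only to exercise the wiring.

```python
from typing import Callable, Iterable, Set, Tuple

Calc = Tuple[str, frozenset]  # (association relation keyword, pair of tables)
Parser = Callable[[str], Set[Calc]]


class AcquisitionModule:
    """Module 410: supplies the two SQL statements to be identified."""

    def acquire(self, source: Iterable[str]) -> Tuple[str, str]:
        first, second = list(source)[:2]
        return first.strip(), second.strip()


class ParsingModule:
    """Module 420: turns a statement into its association calculations."""

    def __init__(self, parser: Parser) -> None:
        self._parser = parser

    def parse(self, sql: str) -> Set[Calc]:
        return self._parser(sql)


class ProcessingModule:
    """Module 430: decides whether two statements share a calculation."""

    def has_repeat(self, first: Set[Calc], second: Set[Calc]) -> bool:
        return bool(first & second)


def identify(source: Iterable[str], parser: Parser) -> bool:
    first_sql, second_sql = AcquisitionModule().acquire(source)
    parsing = ParsingModule(parser)
    return ProcessingModule().has_repeat(parsing.parse(first_sql), parsing.parse(second_sql))


def stub_parser(sql: str) -> Set[Calc]:
    # Stand-in for a real parser: reports one fixed calculation for any JOIN statement.
    return {("JOIN", frozenset({"B", "C"}))} if "JOIN" in sql.upper() else set()


print(identify(["SELECT * FROM B JOIN C", "SELECT * FROM B join C"], stub_parser))  # True
```

Injecting the parser keeps the comparison logic independent of how statements are parsed, so a grammar-based parser can replace the stub without touching the other two modules.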
Example four
Corresponding to the above method, apparatus, and system, a fourth embodiment of the present application provides a computer system, including:
one or more processors; and memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform operations comprising:
acquiring a first SQL statement and a second SQL statement to be identified;
parsing the first SQL statement, and identifying a first association query included in the first SQL statement, wherein an association query includes the association calculations between data tables required to execute the SQL statement;
parsing the second SQL statement, and identifying a second association query included in the second SQL statement;
when the first association query and the second association query contain a repeated association calculation, determining that the first SQL statement and the second SQL statement contain a repeated association calculation.
Fig. 5 illustrates an architecture of a computer system 1500 that may include, in particular, a processor 1510, a video display adapter 1511, a disk drive 1512, an input/output interface 1513, a network interface 1514, and a memory 1520. The processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520 may be communicatively connected by a communication bus 1530.
The processor 1510 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solution provided by the present application.
The memory 1520 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1520 may store an operating system 1521 for controlling the operation of the computer system 1500 and a basic input/output system (BIOS) 1522 for controlling low-level operations of the computer system 1500. In addition, a web browser 1523, a data storage management system 1524, an icon font processing system 1525, and the like may also be stored. The icon font processing system 1525 may be an application program that implements the operations of the foregoing steps in this embodiment of the application. In summary, when the technical solution provided by the present application is implemented by software or firmware, the relevant program code is stored in the memory 1520 and called for execution by the processor 1510.
The input/output interface 1513 is used for connecting an input/output module to input and output information. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, mouse, touch screen, microphone, various sensors, etc., and the output devices may include a display, speaker, vibrator, indicator light, etc.
The network interface 1514 is used to connect a communication module (not shown) to enable communication between the present device and other devices. The communication module can communicate in a wired mode (such as USB or network cable) or in a wireless mode (such as mobile network, Wi-Fi, or Bluetooth).
The bus 1530 includes a path to transfer information between the various components of the device, such as the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520.
In addition, the computer system 1500 may also obtain information of specific pickup conditions from the virtual resource object pickup condition information database 1541 for performing condition judgment, and the like.
It should be noted that although the above devices only show the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, the memory 1520, the bus 1530, etc., in a specific implementation, the device may also include other components necessary for normal operation. In addition, it will be understood by those skilled in the art that the above-described apparatus may also include only the components necessary to implement the embodiments of the present application, and need not include all of the components shown in the figures.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for enabling a computer device (which may be a personal computer, a cloud server, or a network device) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application.
The embodiments in the present specification are described in a progressive manner: the same and similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for related points, refer to the corresponding descriptions of the method embodiments. The apparatus and system embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A method for identifying repeated association calculations, the method comprising:
acquiring a first SQL statement and a second SQL statement to be identified;
parsing the first SQL statement, and identifying a first association query included in the first SQL statement, wherein an association query includes the association calculations between data tables that are required to execute the SQL statement;
parsing the second SQL statement, and identifying a second association query included in the second SQL statement;
when a repeated association calculation exists between the first association query and the second association query, determining that a repeated association calculation exists between the first SQL statement and the second SQL statement;
wherein the parsing the second SQL statement and identifying the second association query included in the second SQL statement comprises:
parsing the second SQL statement, and identifying a second association relation keyword included in the sub-query, and a fourth data table and a fifth data table corresponding to the second association relation keyword;
determining a second association calculation included in the second association query according to the second association relation keyword and the fourth data table and the fifth data table corresponding to the second association relation keyword;
identifying a third association relation keyword included in the second SQL statement, and a third data table and the sub-query corresponding to the third association relation keyword;
determining a third association calculation included in the second association query according to the third association relation keyword, the third data table and the fourth data table;
and determining a fourth association calculation included in the second association query according to the third association relation keyword, the third data table and the fifth data table.
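By way of non-limiting illustration, the sub-query handling recited in claim 1 can be sketched as follows. The `Join` class, the function names, the table names and the tuple representation `(keyword, left table, right table)` are all assumptions made for this example and are not taken from the patent; the recursion over nested sub-queries is likewise an illustrative generalization.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Join:
    """Hypothetical parsed form of an association query: keyword plus two operands.

    An operand is either a table name (str) or a nested Join standing for a sub-query.
    """
    keyword: str
    left: Union[str, "Join"]
    right: Union[str, "Join"]


def base_tables(node: Union[str, Join]) -> List[str]:
    """Return the data tables that an operand ultimately reads from."""
    if isinstance(node, str):
        return [node]
    return base_tables(node.left) + base_tables(node.right)


def association_calculations(node: Union[str, Join]) -> List[Tuple[str, str, str]]:
    """Expand a (possibly nested) association query into table-to-table calculations."""
    if isinstance(node, str):
        return []
    # Calculations inside the sub-queries come first (the "second" calculation).
    calcs = association_calculations(node.left) + association_calculations(node.right)
    # The outer keyword is then paired with every table of each operand
    # (the "third" and "fourth" calculations when one operand is a sub-query).
    for left_table in base_tables(node.left):
        for right_table in base_tables(node.right):
            calcs.append((node.keyword, left_table, right_table))
    return calcs


# Assumed example: third_table LEFT JOIN (fourth_table INNER JOIN fifth_table)
second_statement = Join(
    "LEFT JOIN", "third_table", Join("INNER JOIN", "fourth_table", "fifth_table")
)
calcs = association_calculations(second_statement)
```

In this sketch the sub-query contributes one calculation between the fourth and fifth data tables, and the outer keyword contributes one calculation between the third data table and each of the two tables of the sub-query.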
2. The method of claim 1, wherein the association calculation includes corresponding data tables and an association relation keyword, the association relation keyword being used to describe the association calculation to be performed between the data tables, and wherein the parsing the first SQL statement and identifying a first association query included in the first SQL statement comprises:
parsing the first SQL statement, and identifying a first association relation keyword included in the first SQL statement, and a first data table and a second data table corresponding to the first association relation keyword;
and determining a first association calculation included in the first association query according to the first data table, the second data table and the first association relation keyword.
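A minimal sketch of the identification step of claim 2 is given below, under strong simplifying assumptions: the statement contains a single JOIN, uses no table aliases, and is split on whitespace rather than parsed by a real SQL parser. The function name, the modifier set and the sample statement are hypothetical.

```python
from typing import Tuple

# Words that may precede JOIN and together with it form the association relation keyword.
JOIN_MODIFIERS = {"LEFT", "RIGHT", "INNER", "FULL", "OUTER", "CROSS"}


def first_association(sql: str) -> Tuple[str, str, str]:
    """Return (keyword, first table, second table) for a simple single-JOIN statement.

    Assumes no aliases and no sub-queries; raises ValueError if there is no JOIN.
    """
    tokens = sql.split()
    upper = [t.upper() for t in tokens]
    join_index = upper.index("JOIN")
    # Walk backwards over modifiers such as LEFT or OUTER to find where the keyword starts.
    start = join_index
    while start > 0 and upper[start - 1] in JOIN_MODIFIERS:
        start -= 1
    keyword = " ".join(upper[start:join_index + 1])
    first_table = tokens[start - 1]        # table named just before the keyword
    second_table = tokens[join_index + 1]  # table named just after the keyword
    return (keyword, first_table, second_table)


calc = first_association(
    "SELECT orders.id FROM orders LEFT JOIN users ON orders.uid = users.id"
)
```

The resulting tuple plays the role of the first association calculation: the keyword together with its two data tables.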
3. The method of claim 1, wherein the parsing the first SQL statement and identifying a first association query included in the first SQL statement comprises:
parsing the first SQL statement to generate JSON data corresponding to the first SQL statement;
and identifying the first association query included in the first SQL statement according to the JSON data.
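The JSON-based identification of claim 3 might look like the sketch below. The claims do not specify a JSON schema, so the layout used here (`from`, `joins`, `type`, `table`, `on`) is purely an assumption for illustration, as are the function and table names.

```python
import json
from typing import List, Tuple

# Hypothetical JSON produced by parsing the first SQL statement; the schema is assumed.
PARSED_FIRST_STATEMENT = """
{
  "select": ["orders.id", "users.name"],
  "from": {"table": "orders"},
  "joins": [
    {"type": "LEFT JOIN", "table": "users", "on": "orders.uid = users.id"}
  ]
}
"""


def associations_from_json(text: str) -> List[Tuple[str, str, str]]:
    """Read association calculations (keyword, left table, right table) from JSON data."""
    doc = json.loads(text)
    left_table = doc["from"]["table"]
    # Each entry of "joins" pairs the FROM table with one joined table.
    return [(join["type"], left_table, join["table"]) for join in doc.get("joins", [])]


first_calcs = associations_from_json(PARSED_FIRST_STATEMENT)
```

Working from structured JSON data rather than raw SQL text means the identification step only has to read fields, whatever parser produced them.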
4. The method according to any of claims 1-3, wherein the first SQL statement and the second SQL statement each include corresponding data tables to be processed, and the method further comprises:
when a data table to be processed is a temporary table, replacing the data table to be processed with the corresponding entity table.
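The replacement of temporary tables in claim 4 can be illustrated as follows. How the correspondence between a temporary table and its entity table is obtained is not stated in the claim; the dictionary `temp_to_entity` and all table names below are assumptions for the example.

```python
from typing import Dict, List, Tuple

Calc = Tuple[str, str, str]  # (association relation keyword, left table, right table)


def resolve_temporary_tables(calcs: List[Calc], temp_to_entity: Dict[str, str]) -> List[Calc]:
    """Replace every temporary table with its entity table; other tables are kept as-is."""
    return [
        (keyword, temp_to_entity.get(left, left), temp_to_entity.get(right, right))
        for keyword, left, right in calcs
    ]


# Assumed mapping: tmp_orders is a temporary table derived from the entity table orders.
resolved = resolve_temporary_tables(
    [("LEFT JOIN", "tmp_orders", "users")],
    {"tmp_orders": "orders"},
)
```

After this normalization, two statements that reach the same entity table through differently named temporary tables yield comparable association calculations.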
5. The method of claim 2, wherein the determining that a repeated association calculation exists between the first SQL statement and the second SQL statement when a repeated association calculation exists between the first association query and the second association query comprises:
grouping the association calculations according to their association relation keywords;
and when any group contains association calculations whose corresponding data tables are the same, determining that a repeated association calculation exists between the first SQL statement and the second SQL statement.
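The grouping-based comparison of claim 5 can be sketched as follows, assuming each association calculation is represented as a tuple `(keyword, left table, right table)`. The representation, the function name and the sample data are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Calc = Tuple[str, str, str]  # (association relation keyword, left table, right table)


def repeated_associations(first_calcs: List[Calc], second_calcs: List[Calc]) -> List[Calc]:
    """Group calculations by keyword and return those present in both statements."""
    groups: Dict[str, Dict[str, Set[Tuple[str, str]]]] = defaultdict(
        lambda: {"first": set(), "second": set()}
    )
    for keyword, left, right in first_calcs:
        groups[keyword]["first"].add((left, right))
    for keyword, left, right in second_calcs:
        groups[keyword]["second"].add((left, right))
    repeated = []
    for keyword, group in groups.items():
        # Within one keyword group, identical table pairs indicate a repeated calculation.
        for left, right in sorted(group["first"] & group["second"]):
            repeated.append((keyword, left, right))
    return repeated


first = [("LEFT JOIN", "orders", "users")]
second = [("INNER JOIN", "users", "items"), ("LEFT JOIN", "orders", "users")]
found = repeated_associations(first, second)
```

A non-empty result corresponds to determining that the two SQL statements have a repeated association calculation; the same table pair under a different keyword falls into a different group and is not reported.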
6. An apparatus for identifying repeated association calculations, the apparatus comprising:
an acquisition module, configured to acquire a first SQL statement and a second SQL statement to be identified;
a parsing module, configured to parse the first SQL statement and identify a first association query included in the first SQL statement, wherein an association query includes the association calculations between data tables that are required to execute the SQL statement; and to parse the second SQL statement and identify a second association query included in the second SQL statement;
a processing module, configured to determine that a repeated association calculation exists between the first SQL statement and the second SQL statement when a repeated association calculation exists between the first association query and the second association query;
wherein the second SQL statement comprises a sub-query and an association query between the sub-query and a third data table, and the parsing module is further configured to: parse the second SQL statement and identify a second association relation keyword included in the sub-query, and a fourth data table and a fifth data table corresponding to the second association relation keyword; determine a second association calculation included in the second association query according to the second association relation keyword and the fourth data table and the fifth data table corresponding to the second association relation keyword; identify a third association relation keyword included in the parsed second SQL statement, wherein the third association relation keyword is used to describe the association query between the sub-query and the third data table; determine a third association calculation included in the second association query according to the third association relation keyword, the third data table and the fourth data table; and determine a fourth association calculation included in the second association query according to the third association relation keyword, the third data table and the fifth data table.
7. The apparatus according to claim 6, wherein the parsing module is further configured to: parse the first SQL statement, and identify a first association relation keyword included in the first SQL statement, and a first data table and a second data table corresponding to the first association relation keyword; and determine a first association calculation included in the first association query according to the first data table, the second data table and the first association relation keyword.
8. A computer system, the system comprising:
one or more processors;
and memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform operations comprising:
acquiring a first SQL statement and a second SQL statement to be identified;
parsing the first SQL statement, and identifying a first association query included in the first SQL statement, wherein an association query includes the association calculations between data tables that are required to execute the SQL statement;
parsing the second SQL statement, and identifying a second association query included in the second SQL statement;
when a repeated association calculation exists between the first association query and the second association query, determining that a repeated association calculation exists between the first SQL statement and the second SQL statement;
wherein the parsing the second SQL statement and identifying the second association query included in the second SQL statement comprises:
parsing the second SQL statement, and identifying a second association relation keyword included in the sub-query, and a fourth data table and a fifth data table corresponding to the second association relation keyword;
determining a second association calculation included in the second association query according to the second association relation keyword and the fourth data table and the fifth data table corresponding to the second association relation keyword;
identifying a third association relation keyword included in the second SQL statement, and a third data table and the sub-query corresponding to the third association relation keyword;
determining a third association calculation included in the second association query according to the third association relation keyword, the third data table and the fourth data table;
and determining a fourth association calculation included in the second association query according to the third association relation keyword, the third data table and the fifth data table.
CN202010973509.XA 2020-09-16 2020-09-16 Identification method and device for repeated correlation calculation and computer system Active CN112307050B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010973509.XA CN112307050B (en) 2020-09-16 2020-09-16 Identification method and device for repeated correlation calculation and computer system
CA3130988A CA3130988A1 (en) 2020-09-16 2021-09-16 Method and device for identifying repetitive association calculation and computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010973509.XA CN112307050B (en) 2020-09-16 2020-09-16 Identification method and device for repeated correlation calculation and computer system

Publications (2)

Publication Number Publication Date
CN112307050A CN112307050A (en) 2021-02-02
CN112307050B true CN112307050B (en) 2022-11-15

Family

ID=74483971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010973509.XA Active CN112307050B (en) 2020-09-16 2020-09-16 Identification method and device for repeated correlation calculation and computer system

Country Status (2)

Country Link
CN (1) CN112307050B (en)
CA (1) CA3130988A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038135A (en) * 2017-11-21 2018-05-15 平安科技(深圳)有限公司 Electronic device, the method for multilist correlation inquiry and storage medium
CN109656946A (en) * 2018-09-29 2019-04-19 阿里巴巴集团控股有限公司 A kind of multilist relation query method, device and equipment
CN110909016A (en) * 2019-10-12 2020-03-24 中国平安财产保险股份有限公司 Database-based repeated association detection method, device, equipment and storage medium

Also Published As

Publication number Publication date
CA3130988A1 (en) 2022-03-16
CN112307050A (en) 2021-02-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant