CN114428789A - Data processing method and device - Google Patents

Data processing method and device Download PDF

Info

Publication number
CN114428789A
CN114428789A CN202210352919.1A CN202210352919A CN114428789A CN 114428789 A CN114428789 A CN 114428789A CN 202210352919 A CN202210352919 A CN 202210352919A CN 114428789 A CN114428789 A CN 114428789A
Authority
CN
China
Prior art keywords
data table
sql statement
statistical function
white list
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210352919.1A
Other languages
Chinese (zh)
Other versions
CN114428789B (en
Inventor
黄萌
林丹
阳兵
阳万里
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210352919.1A priority Critical patent/CN114428789B/en
Publication of CN114428789A publication Critical patent/CN114428789A/en
Application granted granted Critical
Publication of CN114428789B publication Critical patent/CN114428789B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Abstract

The present application relates to the field of big data technologies, and in particular, to a data processing method and apparatus. According to the processing method, a Structured Query Language (SQL) statement is obtained, if a statistical function exists in the SQL statement and a data table related to the statistical function is not a preset white list data table, optimization processing is carried out on the SQL statement, and the optimization processing is used for reducing the influence of execution of the SQL statement on the response speed of a database; and if the SQL statement has the statistical function and the data table related to the statistical function is a preset white list data table, executing the SQL statement. The data processing method provided by the application can simultaneously give consideration to the response speed requirement of the database and the requirement of outputting the quantity of non-repeated results in the designated data table, so that the database resources can be reasonably utilized.

Description

Data processing method and device
Technical Field
The present application relates to the field of big data technologies, and in particular, to a data processing method and apparatus.
Background
With the continuous development of the field of big data technology, Structured Query Language (SQL) is widely used in data processing. SQL is a database query and programming language for accessing data and querying, updating, and managing relational database systems; and is also an extension of the database script file.
In the process of data query through SQL statement, the number of non-repeated results satisfying the query condition in the specified data table needs to be counted sometimes. One commonly used method for requesting statistics on the number of non-duplicate results satisfying the query condition in a given data table is as follows: this is achieved by a SQL statement that contains a function for requesting the return of a statistical quantity and uses a key for requesting the return of a uniquely different value. For example, a "count (distinguct y from a)" clause is included in the SQL statement to request to return the number of data whose y columns are not repeated in table a, where a represents the data table to be queried, y represents the column to be queried, "distinguct" represents a key for requesting to return a unique different value, and "count ()" represents a function for requesting to return a statistical number.
However, when the database counts the number of non-repetitive results satisfying the query condition in the specified data table based on the SQL statement, a large amount of database resources are usually consumed, the response speed of the database is affected, and the application program using the database is very unsmooth.
In order to solve the above problems, the following methods are proposed in the art: that is, whether a function for requesting to return the statistical quantity exists in the SQL statement is judged, if not, the SQL statement is executed, otherwise, optimization processing is performed on the SQL statement, wherein the optimization processing refers to an operation for reducing the influence of the SQL statement on the response speed of the database. As an example, the optimization process includes not executing the function for requesting a return statistical quantity in the SQL statement.
For example, it may be determined whether the SQL statement includes the statistical function "count ()", if not, the SQL statement is normally executed, otherwise, the SQL statement is optimized, for example, the statistical function "count ()" in the SQL statement is not executed.
However, the above method causes the following problems: resulting in some scenarios where it is necessary to output a number of non-duplicate results in a given data table that satisfy the query criteria, the database cannot satisfy this requirement.
Therefore, how to simultaneously satisfy the requirement of the response speed of the database and the requirement of the quantity of non-repeated results in the output specified data table becomes a technical problem to be solved urgently.
Disclosure of Invention
The data processing method and device provided by the application are used for simultaneously considering the response speed requirement of the database and the requirement of outputting the quantity of non-repeated results in the designated data table.
In a first aspect, the present application provides a data processing method, applied to a server, including: acquiring a Structured Query Language (SQL) statement; if a statistical function exists in the SQL statement and a data table related to the statistical function is not a preset white list data table, performing optimization processing on the SQL statement, wherein the optimization processing is used for reducing the influence of the execution of the SQL statement on the response speed of a database; and if the SQL statement has a statistical function and the data table related to the statistical function is a preset white list data table, executing the SQL statement.
In the method, the SQL sentences which have statistical functions and the data tables related to the statistical functions are not preset white list data tables are optimized, so that the consumption of database resources can be reduced, the influence of the execution of the SQL sentences on the response speed of the database is reduced, and the response speed of the database is further improved; and the optimization processing is used for reducing the influence of the execution of the SQL statement on the response speed of the database, and compared with the prior art, the method can avoid the problem that the database cannot meet the number of non-repeated results meeting the query condition in a specified data table, and further achieve the effect of simultaneously considering the response speed requirement of the database and the number requirement of the non-repeated results in the specified data table.
With reference to the first aspect, in a possible implementation manner, the performing optimization processing on the SQL statement includes: determining whether a duplicate removal clause exists in the SQL sentence or not, wherein the duplicate removal clause comprises a column name to be inquired; and if the duplication removing clause exists in the SQL sentence, replacing the duplication removing clause with a grouping statistical data clause, wherein the grouping statistical data clause is used for grouping fields corresponding to the column names to be inquired so as to return non-repeated fields corresponding to the column names to be inquired.
In the implementation mode, when the duplication removing clause exists in the SQL statement, the duplication removing clause in the SQL statement is replaced by the grouping statistical data clause for optimization, so that the influence of the execution of the SQL statement on the response speed of the database is reduced.
With reference to the first aspect, in a possible implementation manner, the performing optimization processing on the SQL statement includes: the SQL statement is not executed.
In the implementation manner, the SQL statement may not be executed to perform the optimization processing on the SQL statement, so as to reduce the consumption of database resources and reduce the influence of the execution of the SQL statement on the response speed of the database.
With reference to the first aspect, in a possible implementation manner, the preset white list data table is used to indicate a data table in which a data amount in the data table is smaller than or equal to a preset data amount.
In the implementation mode, the data volume of the data table in the preset white list data table is smaller than or equal to the preset data volume, so that the influence on the response speed of the database is small, the direct execution can be realized, and the database resource is reasonably utilized.
With reference to the first aspect, in a possible implementation manner, whether the data table related to the statistical function is a preset white list data table includes: determining whether the data table related to the statistical function is located in the preset white list data table or not according to the table name of the data table related to the statistical function and the preset white list data table, wherein the preset white list data table comprises the table name of the data table of which the data amount is smaller than or equal to the preset data amount; and if the table name of the data table related to the statistical function is in the table names of the preset white list data table, the data table related to the statistical function is positioned in the preset white list data table.
In the implementation mode, whether the data table related to the statistical function is located in the preset white list data table or not is determined according to the table name of the data table related to the statistical function and the table name in the preset white list data table, the determined result is accurate and reliable, next optimization processing or direct operation execution is facilitated, and the response speed of the database is further improved.
With reference to the first aspect, in a possible implementation manner, whether the data table related to the statistical function is a preset white list data table includes: determining whether the data table related to the statistical function is located in the preset white list data table or not according to the table name of the data table related to the statistical function and a preset black list data table, wherein the preset black list data table comprises the table name of the data table of which the data amount in the data table is larger than or equal to the preset data amount; and if the table name of the data table related to the statistical function is not in the table names of the preset blacklist data table, the data table related to the statistical function is positioned in the preset white list data table.
In the implementation mode, whether the data table related to the statistical function is located in the preset white list data table or not is determined according to the table name of the data table related to the statistical function and the table name in the preset black list data table, the determined result is accurate and reliable, next optimization processing or direct operation execution is facilitated, and the response speed of the database is further improved.
In a second aspect, the present application provides an apparatus for processing data. The processing means may comprise functional modules for implementing the method of the first aspect or any one of its possible implementations. The technical effects of the processing apparatus can refer to the technical effects brought by the corresponding method, which are not described herein again.
The processing device may include: the acquisition module is used for acquiring a Structured Query Language (SQL) statement; the processing module is used for optimizing the SQL statement if a statistical function exists in the SQL statement and a data table related to the statistical function is not a preset white list data table, and the optimization is used for reducing the influence of the execution of the SQL statement on the response speed of a database; the processing module is further configured to execute the SQL statement if a statistical function exists in the SQL statement and a data table related to the statistical function is a preset white list data table.
With reference to the second aspect, in a possible implementation manner, the processing module is specifically configured to: determining whether a duplicate removal clause exists in the SQL statement, wherein the duplicate removal clause comprises a column name to be inquired; and if the duplication removing clause exists in the SQL sentence, replacing the duplication removing clause with a grouping statistical data clause, wherein the grouping statistical data clause is used for grouping fields corresponding to the column names to be inquired so as to return non-repeated fields corresponding to the column names to be inquired.
With reference to the second aspect, in a possible implementation manner, the processing module is specifically configured to: the SQL statement is not executed.
With reference to the second aspect, in a possible implementation manner, the preset white list data table is used to indicate a data table in which the data amount in the data table is smaller than or equal to a preset data amount.
With reference to the second aspect, in a possible implementation manner, the processing module is further configured to determine whether the data table related to the statistical function is located in the preset white list data table according to the table name of the data table related to the statistical function and the preset white list data table, where the preset white list data table includes the table name of the data table whose data amount in the data table is smaller than or equal to a preset data amount; and if the table name of the data table related to the statistical function is in the table names of the preset white list data table, the data table related to the statistical function is positioned in the preset white list data table.
With reference to the second aspect, in a possible implementation manner, the processing module is further configured to determine whether the data table related to the statistical function is located in the preset white list data table according to a table name of the data table related to the statistical function and a preset black list data table, where the preset black list data table includes a table name of the data table whose data amount in the data table is greater than or equal to a preset data amount; and if the table name of the data table related to the statistical function is not in the table names of the preset blacklist data table, the data table related to the statistical function is positioned in the preset whitelist data table.
In a third aspect, the present application provides a server, comprising: a memory and a processor; the memory is to store program instructions; the processor is configured to invoke program instructions in the memory to perform a method according to the first aspect or any one of its possible implementations.
In a fourth aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions for implementing the method according to the first aspect or any one of the possible implementation manners thereof when executed by a processor.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, is configured to implement the method according to the first aspect or any one of the possible implementations thereof.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
FIG. 2 is a flow chart of a method for processing data according to an embodiment of the present application;
FIG. 3 is a block diagram of a data processing system according to an embodiment of the present application;
FIG. 4 is a flowchart of a processing method of an analysis and judgment module in the data processing system according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for processing an optimization module in a data processing system, according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic view of an apparatus according to another embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. The drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the concepts of the application by those skilled in the art with reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
A specific application scenario of the present application is shown in fig. 1, and the scenario includes a client 101 and a server 102. The connection between the client 101 and the server 102 may be direct or indirect through wired or wireless communication, and the application is not limited herein.
Optionally, in the field of database technology, the SQL statement is input by the client 101, and the server 102 receives the SQL statement input by the client 101 and executes the statement, so as to implement functions such as data query, data addition, data deletion, and data update.
In the process of performing data query through SQL statements, sometimes it is related to the number of non-repetitive results satisfying query conditions in a specified data table that needs to be counted, and a currently common method for requesting to count the number of non-repetitive results satisfying query conditions in a specified data table is as follows: this is achieved by a SQL statement that contains a function for requesting the return of a statistical quantity and uses a key for requesting the return of a uniquely different value. For example, a "count (distinguct y from a)" clause is included in the SQL statement to request to return the number of data whose y columns are not repeated in table a, where a represents the data table to be queried, y represents the column to be queried, "distinguct" represents a key for requesting to return a unique different value, and "count ()" represents a function for requesting to return a statistical number.
Although the number of non-repeated results satisfying the query condition in the specified data table can be finally counted by using a statistical function such as "count ()", the function scans the data of the full data table during counting, and needs to perform a sorting operation on the basis of scanning the full data table, and a large amount of data completely unrelated to the results are involved in the process. Therefore, when the database counts the number of non-repetitive results satisfying the query condition in the specified data table based on the SQL statement, a large amount of database resources are consumed, the response speed of the database is affected, and the application program using the database is very unsmooth.
In order to solve the above problems, the following methods are proposed in the art: that is, whether a function for requesting to return the statistical quantity exists in the SQL statement is judged, if not, the SQL statement is executed, otherwise, optimization processing is performed on the SQL statement, wherein the optimization processing refers to an operation for reducing the influence of the SQL statement on the response speed of the database. As an example, the optimization process includes not executing the function for requesting a return statistical quantity in the SQL statement.
For example, it may be determined whether the SQL statement includes a statistical function "count ()", and if not, the SQL statement is normally executed; otherwise, the SQL statement is optimized, for example, the statistical function "count ()" in the SQL statement is not executed.
However, the above method causes the following problems: resulting in some scenarios where it is necessary to output a number of non-duplicate results in a given data table that satisfy the query criteria, the database cannot satisfy this requirement.
Therefore, how to simultaneously satisfy the requirement of the response speed of the database and the requirement of the quantity of non-repeated results in the output specified data table becomes a technical problem to be solved urgently.
In view of the above, the present application provides a data processing method, which aims to solve the above technical problems in the prior art.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
With reference to the application scenario shown in fig. 1, fig. 2 is a flowchart of a data processing method provided in an embodiment of the present application, and is applied to the server 102. As shown in fig. 2, the method provided in the embodiment of the present application includes S201, S202, S203, and S204. The individual steps in the method shown in fig. 2 are explained in detail below.
S201, obtaining the SQL statement.
In this step, SQL is a language with a very powerful query function, and as long as there is data in the database, it can be found from the database by an appropriate method.
For example, a user inputs an SQL statement through the client 101 in fig. 1, and the server 102 obtains the SQL statement input by the user and performs lexical and syntactic analysis on the SQL statement, for example, the SQL statement may be segmented and analyzed segment by segment.
S202, if the SQL statement has the statistical function and the data table related to the statistical function is not a preset white list data table, optimizing the SQL statement, wherein the optimizing is used for reducing the influence of the execution of the SQL statement on the response speed of the database.
Alternatively, the technician may set the white list data table in advance based on the requirement as the preset white list data table.
As an example, the preset white list data table may include a data table in which the data amount is less than or equal to the preset data amount. As another example, the preset white list data table may include a data table in which the number of non-duplicated data needs to be output even though the data amount is greater than or equal to the preset data amount.
In this step, in some possible implementation manners, the obtained SQL statement may be analyzed first to obtain an abstract syntax tree of a database node (sqlnode), then all nodes in the abstract syntax tree are traversed, whether a query node (e.g., select node) exists in the SQL statement is searched, if the query node exists, a table name of a data table in the query node is obtained, otherwise, the SQL statement is executed directly.
Optionally, the obtained SQL statements are parsed by a syntax parser supporting syntax extension to support parsing requirements of different syntaxes of multiple data sources. For example, the open-source syntax parsers "antlr" and "javacc" can realize multi-data-source syntax support by expanding syntax and rule files; the open source data analysis and storage system (Druid) has built-in programmatic parser.
Meanwhile, whether a statistical function (such as "count ()") exists in the SQL statement is searched, if so, whether a data table related to the statistical function is located in a preset white list data table is further judged, and if not, the SQL statement is optimized so as to reduce the influence of the execution of the SQL statement on the response speed of the database.
Illustratively, looking for the existence of a statistical function (e.g., "count ()") in an SQL statement, all "function nodes" in all query nodes ("select nodes") can be recursively checked, because the "count ()" function follows the "select nodes" and is a "function node").
One way to determine whether the data table related to the statistical function is located in the preset white list data table can be implemented as follows: and determining whether the data table related to the statistical function is located in the preset white list data table or not according to the table name of the data table related to the statistical function and the preset white list data table, and if the table name of the data table related to the statistical function is in the table names of the preset white list data table, locating the data table related to the statistical function in the preset white list data table.
Another implementation manner for judging whether the data table related to the statistical function is located in the preset white list data table is as follows: determining whether the data table related to the statistical function is located in a preset white list data table or not according to the table name of the data table related to the statistical function and a preset black list data table, wherein the preset black list data table comprises the table name of the data table of which the data amount in the data table is larger than or equal to the preset data amount; and if the table name of the data table related to the statistical function is not in the table names of the preset blacklist data table, the data table related to the statistical function is located in the preset white list data table.
In the case that a statistical function exists in the SQL statement and the data table related to the statistical function is not a preset white list data table, an implementation manner of performing optimization processing on the SQL statement is as follows: and if the SQL statement has the duplication eliminating clause, replacing the duplication eliminating clause with a grouping statistical data clause, and returning to execute so as to reduce the influence of the execution of the SQL statement on the response speed of the database.
And the grouping statistical data clause is used for grouping the fields corresponding to the column names to be queried so as to return the fields corresponding to the non-repeated column names to be queried.
Illustratively, whether a form of "count (diagnosis XXXX) from table" exists in the SQL statement is judged, if yes, the "count (diagnosis XXXX) from table" is modified into a format of "count (1) from (select XXXXXX from table group by XXXXXX)" by using an optimization template, and then the SQL statement is returned to be executed again. Wherein "XXXX" is the name of the corresponding column in the data table, "distint" is the deduplication key, "1" represents the 1 st column of the data table, and "group by" is the grouping key.
Alternatively, "distint" supports single-column, multi-column deduplication. The single-column duplicate removal mode is simple and easy to understand, namely only 1 identical value is reserved; the multiple columns are deduplicated according to the specified deduplicated column information, that is, only if all the specified column information is the same, the information is regarded as the duplicate information.
Another implementation way of performing optimization processing on the SQL statement is as follows: the SQL statement is not executed, so that the resource consumption of the database can be reduced, and the response speed of the database is improved.
S203, if the SQL statement has the statistical function and the data table related to the statistical function is a preset white list data table, executing the SQL statement.
If the SQL statement has the statistical function and the data table related to the statistical function is the preset white list data table, the data amount in the data table related to the statistical function is smaller than or equal to the preset data amount, and the influence on the response speed of the database is not large, so that the method can be directly executed, and the database resource can be reasonably utilized.
In the embodiment, by acquiring the SQL statement, if a statistical function exists in the SQL statement and a data table related to the statistical function is not a preset white list data table, optimization processing is performed on the SQL statement, and the optimization processing is used for reducing the influence of execution of the SQL statement on the response speed of the database; if the SQL statement has the statistical function and the data table related to the statistical function is the preset white list data table, the SQL statement is executed, the response speed requirement of the database and the requirement of outputting the quantity of non-repeated results in the specified data table can be considered at the same time, and therefore the database resources can be reasonably utilized.
On the basis of the foregoing embodiment, fig. 3 is a schematic structural diagram of a data processing system provided in an embodiment of the present application, and the system is applied to the server 102 in the foregoing embodiment, and the data processing system includes a parsing and determining module 301 and an optimizing module 302.
And the analysis and judgment module 301 is configured to analyze the obtained SQL statements, take the obtained SQL statements to an "sqlnode" syntax tree, traverse all nodes, and screen out the SQL statements that a statistical function exists in the SQL statements and a data table related to the statistical function is not a preset white list data table.
The optimizing module 302 is configured to determine whether the SQL statement of the problem screened by the analyzing and determining module 301 can be optimized, if yes, apply the optimizing template to optimize the SQL statement, otherwise, not execute the SQL statement.
As an example, on the basis of the foregoing embodiments, fig. 4 is a flowchart of a processing method of the parsing and determining module in the data processing system according to an embodiment of the present application. As shown in fig. 4, the processing method includes S401, S402, S403, S404, S405, S406, S407, S408, S409, and S410. The individual steps in the method shown in fig. 4 are explained in detail below.
S401, obtaining the SQL statement.
In this step, the server obtains the SQL statement input by the user, and the SQL statement may be used to perform a data query operation.
S402, analyzing the SQL statement, if the analysis is successful, executing S404, otherwise, executing S403.
In this step, the SQL statement is parsed by a syntax parser to obtain an abstract syntax tree.
And S403, directly executing the SQL statement.
S404, searching all query nodes, if the query nodes exist, executing S405, otherwise, executing S403.
In this step, a recursive method may be used to traverse all nodes in the abstract syntax tree to find query nodes (e.g., "select nodes").
S405, obtaining the table name in the query node.
S406, judging whether the data table corresponding to the table name is a preset white list data table or not, if so, executing S407, otherwise, executing S408.
And the data volume in the preset white list data table does not reach the data magnitude which influences the performance of the database.
Optionally, whether the data table corresponding to the table name is the preset white list data table is determined by judging whether the table name is the table name in the preset white list data table.
S407, checking the next inquiry node.
S408, searching whether a statistical function exists in the SQL statement, if so, executing S409, otherwise, executing S407.
Illustratively, looking for the existence of a statistical function (e.g., "count ()") in an SQL statement, all "function nodes" in all query nodes ("select nodes") can be recursively checked, because the "count ()" function follows the "select nodes" and is a "function node").
Optionally, the sequence of the two steps of searching whether the SQL statement has the statistical function and judging whether the data table corresponding to the table name is the preset white list data table may be exchanged, and the sequence of the two steps is not limited in the embodiment of the present application.
And S409, judging whether the optimization template is met, if so, executing S410, otherwise, executing S407.
In this step, it is determined whether a deduplication clause (e.g., "distint XXXX") exists in the SQL statement, and if so, the SQL statement conforms to the optimization template.
S410, adding a specified list.
FIG. 5 is a flowchart of a method for processing an optimization module in a data processing system, according to an embodiment of the present application. As shown in fig. 5, the processing method includes S501, S502, S503, S504, S505, and S506. The individual steps in the method shown in fig. 5 are explained in detail below.
S501, obtaining SQL sentences in the specified list.
S502, the statistical function is recursively checked.
In this step, all the "function nodes" are recursively checked, and the statistical function statements are screened out.
S503, judging whether the duplicate removal clause exists, if so, executing S505, otherwise, executing S504.
Illustratively, whether a "distinct XXXX" exists in the SQL statement or not is judged, and if the "distinct XXXX" exists, a deduplication clause exists.
S504, the next statistical function is checked if the optimization is impossible.
And S505, replacing the duplicate removal clause with a grouping statistical data clause and outputting the grouping statistical data clause.
Illustratively, in this step, the deduplication clause "count (distinguint XXXX) from table" is modified into the packet statistics clause "count (1) from (select XXXX) from table group by XXXX)" format for output, and then returned for re-execution. Wherein "XXXX" is the name of the corresponding column in the data table, "distint" is the deduplication key, "1" represents the 1 st column of the data table, and "group by" is the grouping key.
S506, the SQL statement is returned to be executed again.
In the embodiment, the influence of the execution of the SQL statement on the response speed of the database can be accurately reduced through the processing methods of the analysis and judgment module and the optimization module, so that the response speed of the database is improved, and the database resources can be reasonably utilized.
On the basis of the above embodiments, fig. 6 shows a device 600 for processing data provided by the embodiments of the present application, where the device 600 includes: an acquisition module 601 and a processing module 602.
The obtaining module 601 is configured to obtain a structured query language SQL statement; the processing module 602 is configured to perform optimization processing on the SQL statement if a statistical function exists in the SQL statement and a data table related to the statistical function is not a preset white list data table, where the optimization processing is used to reduce an influence of execution of the SQL statement on a response speed of the database; the processing module 602 is further configured to execute the SQL statement if a statistical function exists in the SQL statement and a data table related to the statistical function is a preset white list data table.
As an example, the apparatus 600 may be configured to perform the method shown in fig. 2, for example, the obtaining module 601 is configured to perform S201, and the processing module 602 is configured to perform S202 and S203.
In a possible implementation manner, the processing module 602 is specifically configured to: determining whether a duplicate removal clause exists in the SQL sentence, wherein the duplicate removal clause comprises a column name to be inquired; and if the duplication eliminating clause exists in the SQL sentence, replacing the duplication eliminating clause with a grouping statistical data clause, wherein the grouping statistical data clause is used for grouping the fields corresponding to the column names to be inquired so as to return the fields corresponding to the non-repeated column names to be inquired.
In a possible implementation manner, the processing module 602 is specifically configured to: the SQL statement is not executed.
In a possible implementation manner, the preset white list data table is used for indicating a data table in which the data amount in the data table is smaller than or equal to the preset data amount.
In a possible implementation manner, the processing module 602 is further configured to determine whether the data table related to the statistical function is located in a preset white list data table according to a table name of the data table related to the statistical function and the preset white list data table, where the preset white list data table includes a table name of a data table whose data amount in the data table is less than or equal to a preset data amount; and if the table name of the data table related to the statistical function is in the table names of the preset white list data table, the data table related to the statistical function is located in the preset white list data table.
In a possible implementation manner, the processing module 602 is further configured to determine whether the data table related to the statistical function is located in a preset white list data table according to a table name of the data table related to the statistical function and a preset black list data table, where the preset black list data table includes a table name of a data table whose data amount in the data table is greater than or equal to a preset data amount; and if the table name of the data table related to the statistical function is not in the table names of the preset blacklist data table, the data table related to the statistical function is located in the preset white list data table.
Fig. 7 is a schematic view of an apparatus according to another embodiment of the present application. The apparatus shown in fig. 7 may be used to perform the method of any of the previous embodiments.
As shown in fig. 7, the apparatus 700 of the present embodiment includes: memory 701, processor 702, communication interface 703, and bus 704. The memory 701, the processor 702, and the communication interface 703 are communicatively connected to each other via a bus 704.
The memory 701 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 701 may store a program, and the processor 702 is configured to perform the steps of the method shown in the above-described embodiments when the program stored in the memory 701 is executed by the processor 702.
The processor 702 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the methods shown in the embodiments of the present application.
The processor 702 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the method of the embodiment of the present application may be implemented by integrated logic circuits of hardware in the processor 702 or instructions in the form of software.
The processor 702 may also be a general purpose processor, a Digital Signal Processor (DSP), an ASIC, an FPGA (field programmable gate array) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 701, and the processor 702 reads the information in the memory 701, and completes the functions required to be performed by the units included in the apparatus in the present application in combination with the hardware thereof.
The communication interface 703 may enable communication between the apparatus 700 and other devices or communication networks using, but not limited to, transceiver devices.
Bus 704 may include a pathway to transfer information between various components of apparatus 700 (e.g., memory 701, processor 702, communication interface 703).
It should be understood that the apparatus 700 shown in the embodiment of the present application may be an electronic device, or may also be a chip configured in the electronic device.
The data processing method and apparatus in the present application may be used in the field of big data technology, and may also be used in any field other than the field of big data technology. The application of the data processing method and the data processing device is not limited in the application field.
It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone, wherein A and B can be singular or plural. In addition, the "/" in this document generally indicates that the former and latter associated objects are in an "or" relationship, but may also indicate an "and/or" relationship, which may be understood with particular reference to the former and latter text.
In the present application, "at least one" means one or more, "a plurality" means two or more. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, a-b, a-c, b-c or a-b-c, wherein a, b and c can be single or multiple.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: u disk, removable hard disk, read only memory, random access memory, magnetic or optical disk, etc. for storing program codes.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A data processing method is applied to a server and comprises the following steps:
acquiring a Structured Query Language (SQL) statement;
if a statistical function exists in the SQL statement and a data table related to the statistical function is not a preset white list data table, performing optimization processing on the SQL statement, wherein the optimization processing is used for reducing the influence of the execution of the SQL statement on the response speed of a database;
and if the SQL statement has a statistical function and the data table related to the statistical function is a preset white list data table, executing the SQL statement.
2. The method according to claim 1, wherein the optimizing the SQL statement comprises:
determining whether a duplicate removal clause exists in the SQL statement, wherein the duplicate removal clause comprises a column name to be inquired;
and if the duplication removing clause exists in the SQL sentence, replacing the duplication removing clause with a grouping statistical data clause, wherein the grouping statistical data clause is used for grouping fields corresponding to the column names to be inquired so as to return non-repeated fields corresponding to the column names to be inquired.
3. The method according to claim 1, wherein the optimizing the SQL statement comprises: the SQL statement is not executed.
4. The method according to any one of claims 1 to 3, wherein the preset white list data table is used for indicating a data table in which the data amount is smaller than or equal to the preset data amount.
5. The method according to any one of claims 1 to 3, wherein whether the data table to which the statistical function relates is a preset white list data table comprises:
determining whether the data table related to the statistical function is located in the preset white list data table or not according to the table name of the data table related to the statistical function and the preset white list data table, wherein the preset white list data table comprises the table name of the data table of which the data amount is smaller than or equal to the preset data amount;
and if the table name of the data table related to the statistical function is in the table names of the preset white list data table, the data table related to the statistical function is positioned in the preset white list data table.
6. The method according to any one of claims 1 to 3, wherein whether the data table to which the statistical function relates is a preset white list data table comprises:
determining whether the data table related to the statistical function is located in the preset white list data table or not according to the table name of the data table related to the statistical function and a preset black list data table, wherein the preset black list data table comprises the table name of the data table of which the data amount is larger than or equal to the preset data amount;
and if the table name of the data table related to the statistical function is not in the table names of the preset blacklist data table, the data table related to the statistical function is positioned in the preset white list data table.
7. An apparatus for processing data, comprising:
the acquisition module is used for acquiring a Structured Query Language (SQL) statement;
the processing module is used for optimizing the SQL statement if a statistical function exists in the SQL statement and a data table related to the statistical function is not a preset white list data table, and the optimization is used for reducing the influence of the execution of the SQL statement on the response speed of a database;
the processing module is further configured to execute the SQL statement if a statistical function exists in the SQL statement and a data table related to the statistical function is a preset white list data table.
8. A server, comprising: a memory and a processor;
the memory is to store program instructions;
the processor is configured to invoke program instructions in the memory to perform the method of any of claims 1 to 6.
9. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, are configured to implement the method of any one of claims 1 to 6.
CN202210352919.1A 2022-04-06 2022-04-06 Data processing method and device Active CN114428789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210352919.1A CN114428789B (en) 2022-04-06 2022-04-06 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210352919.1A CN114428789B (en) 2022-04-06 2022-04-06 Data processing method and device

Publications (2)

Publication Number Publication Date
CN114428789A true CN114428789A (en) 2022-05-03
CN114428789B CN114428789B (en) 2022-07-08

Family

ID=81314471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210352919.1A Active CN114428789B (en) 2022-04-06 2022-04-06 Data processing method and device

Country Status (1)

Country Link
CN (1) CN114428789B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6356888B1 (en) * 1999-06-18 2002-03-12 International Business Machines Corporation Utilize encoded vector indexes for distinct processing
US20050138001A1 (en) * 2003-12-23 2005-06-23 Siebel Systems, Inc. Optimization for aggregate navigation for distinct count metrics
CN103514166A (en) * 2012-06-15 2014-01-15 阿里巴巴集团控股有限公司 Method and device for SQL inspection
CN107180053A (en) * 2016-03-11 2017-09-19 中国移动通信集团河北有限公司 A kind of data warehouse optimization method and device
CN108628975A (en) * 2018-04-25 2018-10-09 上海达梦数据库有限公司 A kind of database SQL optimization and device
CN111666424A (en) * 2020-06-10 2020-09-15 深圳开思时代科技有限公司 Steam part inquiry and quotation optimization method based on knowledge representation learning
CN111708805A (en) * 2020-06-18 2020-09-25 腾讯科技(深圳)有限公司 Data query method and device, electronic equipment and storage medium
CN113282574A (en) * 2021-07-26 2021-08-20 云和恩墨(北京)信息技术有限公司 SQL optimization-based database operation control method, system and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6356888B1 (en) * 1999-06-18 2002-03-12 International Business Machines Corporation Utilize encoded vector indexes for distinct processing
US20050138001A1 (en) * 2003-12-23 2005-06-23 Siebel Systems, Inc. Optimization for aggregate navigation for distinct count metrics
CN103514166A (en) * 2012-06-15 2014-01-15 阿里巴巴集团控股有限公司 Method and device for SQL inspection
CN107180053A (en) * 2016-03-11 2017-09-19 中国移动通信集团河北有限公司 A kind of data warehouse optimization method and device
CN108628975A (en) * 2018-04-25 2018-10-09 上海达梦数据库有限公司 A kind of database SQL optimization and device
CN111666424A (en) * 2020-06-10 2020-09-15 深圳开思时代科技有限公司 Steam part inquiry and quotation optimization method based on knowledge representation learning
CN111708805A (en) * 2020-06-18 2020-09-25 腾讯科技(深圳)有限公司 Data query method and device, electronic equipment and storage medium
CN113282574A (en) * 2021-07-26 2021-08-20 云和恩墨(北京)信息技术有限公司 SQL optimization-based database operation control method, system and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
FAST DETECTION OF ACCESS CONTROL VULNERABILITIES IN PHP APPLICAT: "Fast Detection of Access Control Vulnerabilities in PHP Applications", 《2012 19TH WORKING CONFERENCE ON REVERSE ENGINEERING》 *
FAST DETECTION OF ACCESS CONTROL VULNERABILITIES IN PHP APPLICAT: "Fast Detection of Access Control Vulnerabilities in PHP Applications", 《2012 19TH WORKING CONFERENCE ON REVERSE ENGINEERING》, 24 October 2012 (2012-10-24), pages 247 - 248 *
WANGYUN727710: "《https://jingyan.baidu.com/article/a65957f495bfab24e67f9bce.html》", 8 May 2014, pages: 1 - 2 *
刘井莲: "一种基于SQL的属性约简算法", 《科学技术与工程》 *
刘井莲: "一种基于SQL的属性约简算法", 《科学技术与工程》, no. 25, 8 September 2010 (2010-09-08), pages 6292 *

Also Published As

Publication number Publication date
CN114428789B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
CN107451149B (en) Monitoring method and device for flow data query task
US8924373B2 (en) Query plans with parameter markers in place of object identifiers
US11687546B2 (en) Executing conditions with negation operators in analytical databases
US10296497B2 (en) Storing a key value to a deleted row based on key range density
CN111061758B (en) Data storage method, device and storage medium
US11308060B2 (en) Method, apparatus, device and storage medium for managing index
CN112579610A (en) Multi-data source structure analysis method, system, terminal device and storage medium
US20150039625A1 (en) Hierarchical Temporal Event Management
CN111198898B (en) Big data query method and big data query device
CN111258819A (en) Data acquisition method, device and system for MySQL database backup file
CN111078276B (en) Application redundant resource processing method, device, equipment and storage medium
CN110704472A (en) Data query statistical method and device
CN114428789B (en) Data processing method and device
CN114185867A (en) Method and device for confirming data consistency and electronic equipment
CN107633094B (en) Method and device for data retrieval in cluster environment
CN113010539A (en) Data processing method and device
CN111143398B (en) Extra-large set query method and device based on extended SQL function
CN113448965A (en) Method, device and equipment for determining full-table-scanning structured query statement
CN113934430A (en) Data retrieval analysis method and device, electronic equipment and storage medium
CN114281842A (en) Method and device for sub-table query of database
CN114528313A (en) Data processing method and device and electronic equipment
CN112506953A (en) Query method, device and storage medium based on Structured Query Language (SQL)
CN111125147B (en) Extra-large set analysis method and device based on extended pre-calculation model and SQL function
WO2024041221A1 (en) Selection rate estimation method and apparatus
CN111125264B (en) Extra-large set analysis method and device based on extended OLAP model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant