CN114428789A

CN114428789A - Data processing method and device

Info

Publication number: CN114428789A
Application number: CN202210352919.1A
Authority: CN
Inventors: 黄萌; 林丹; 阳兵; 阳万里
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2022-04-06
Filing date: 2022-04-06
Publication date: 2022-05-03
Anticipated expiration: 2042-04-06
Also published as: CN114428789B

Abstract

The present application relates to the field of big data technologies, and in particular, to a data processing method and apparatus. According to the processing method, a Structured Query Language (SQL) statement is obtained, if a statistical function exists in the SQL statement and a data table related to the statistical function is not a preset white list data table, optimization processing is carried out on the SQL statement, and the optimization processing is used for reducing the influence of execution of the SQL statement on the response speed of a database; and if the SQL statement has the statistical function and the data table related to the statistical function is a preset white list data table, executing the SQL statement. The data processing method provided by the application can simultaneously give consideration to the response speed requirement of the database and the requirement of outputting the quantity of non-repeated results in the designated data table, so that the database resources can be reasonably utilized.

Description

Data processing method and device

Technical Field

The present application relates to the field of big data technologies, and in particular, to a data processing method and apparatus.

Background

With the continuous development of the field of big data technology, Structured Query Language (SQL) is widely used in data processing. SQL is a database query and programming language for accessing data and querying, updating, and managing relational database systems; and is also an extension of the database script file.

In the process of data query through SQL statement, the number of non-repeated results satisfying the query condition in the specified data table needs to be counted sometimes. One commonly used method for requesting statistics on the number of non-duplicate results satisfying the query condition in a given data table is as follows: this is achieved by a SQL statement that contains a function for requesting the return of a statistical quantity and uses a key for requesting the return of a uniquely different value. For example, a "count (distinguct y from a)" clause is included in the SQL statement to request to return the number of data whose y columns are not repeated in table a, where a represents the data table to be queried, y represents the column to be queried, "distinguct" represents a key for requesting to return a unique different value, and "count ()" represents a function for requesting to return a statistical number.

However, when the database counts the number of non-repetitive results satisfying the query condition in the specified data table based on the SQL statement, a large amount of database resources are usually consumed, the response speed of the database is affected, and the application program using the database is very unsmooth.

In order to solve the above problems, the following methods are proposed in the art: that is, whether a function for requesting to return the statistical quantity exists in the SQL statement is judged, if not, the SQL statement is executed, otherwise, optimization processing is performed on the SQL statement, wherein the optimization processing refers to an operation for reducing the influence of the SQL statement on the response speed of the database. As an example, the optimization process includes not executing the function for requesting a return statistical quantity in the SQL statement.

For example, it may be determined whether the SQL statement includes the statistical function "count ()", if not, the SQL statement is normally executed, otherwise, the SQL statement is optimized, for example, the statistical function "count ()" in the SQL statement is not executed.

However, the above method causes the following problems: resulting in some scenarios where it is necessary to output a number of non-duplicate results in a given data table that satisfy the query criteria, the database cannot satisfy this requirement.

Therefore, how to simultaneously satisfy the requirement of the response speed of the database and the requirement of the quantity of non-repeated results in the output specified data table becomes a technical problem to be solved urgently.

Disclosure of Invention

The data processing method and device provided by the application are used for simultaneously considering the response speed requirement of the database and the requirement of outputting the quantity of non-repeated results in the designated data table.

In a first aspect, the present application provides a data processing method, applied to a server, including: acquiring a Structured Query Language (SQL) statement; if a statistical function exists in the SQL statement and a data table related to the statistical function is not a preset white list data table, performing optimization processing on the SQL statement, wherein the optimization processing is used for reducing the influence of the execution of the SQL statement on the response speed of a database; and if the SQL statement has a statistical function and the data table related to the statistical function is a preset white list data table, executing the SQL statement.

In the method, the SQL sentences which have statistical functions and the data tables related to the statistical functions are not preset white list data tables are optimized, so that the consumption of database resources can be reduced, the influence of the execution of the SQL sentences on the response speed of the database is reduced, and the response speed of the database is further improved; and the optimization processing is used for reducing the influence of the execution of the SQL statement on the response speed of the database, and compared with the prior art, the method can avoid the problem that the database cannot meet the number of non-repeated results meeting the query condition in a specified data table, and further achieve the effect of simultaneously considering the response speed requirement of the database and the number requirement of the non-repeated results in the specified data table.

With reference to the first aspect, in a possible implementation manner, the performing optimization processing on the SQL statement includes: determining whether a duplicate removal clause exists in the SQL sentence or not, wherein the duplicate removal clause comprises a column name to be inquired; and if the duplication removing clause exists in the SQL sentence, replacing the duplication removing clause with a grouping statistical data clause, wherein the grouping statistical data clause is used for grouping fields corresponding to the column names to be inquired so as to return non-repeated fields corresponding to the column names to be inquired.

In the implementation mode, when the duplication removing clause exists in the SQL statement, the duplication removing clause in the SQL statement is replaced by the grouping statistical data clause for optimization, so that the influence of the execution of the SQL statement on the response speed of the database is reduced.

With reference to the first aspect, in a possible implementation manner, the performing optimization processing on the SQL statement includes: the SQL statement is not executed.

In the implementation manner, the SQL statement may not be executed to perform the optimization processing on the SQL statement, so as to reduce the consumption of database resources and reduce the influence of the execution of the SQL statement on the response speed of the database.

With reference to the first aspect, in a possible implementation manner, the preset white list data table is used to indicate a data table in which a data amount in the data table is smaller than or equal to a preset data amount.

In the implementation mode, the data volume of the data table in the preset white list data table is smaller than or equal to the preset data volume, so that the influence on the response speed of the database is small, the direct execution can be realized, and the database resource is reasonably utilized.

With reference to the first aspect, in a possible implementation manner, whether the data table related to the statistical function is a preset white list data table includes: determining whether the data table related to the statistical function is located in the preset white list data table or not according to the table name of the data table related to the statistical function and the preset white list data table, wherein the preset white list data table comprises the table name of the data table of which the data amount is smaller than or equal to the preset data amount; and if the table name of the data table related to the statistical function is in the table names of the preset white list data table, the data table related to the statistical function is positioned in the preset white list data table.

In the implementation mode, whether the data table related to the statistical function is located in the preset white list data table or not is determined according to the table name of the data table related to the statistical function and the table name in the preset white list data table, the determined result is accurate and reliable, next optimization processing or direct operation execution is facilitated, and the response speed of the database is further improved.

With reference to the first aspect, in a possible implementation manner, whether the data table related to the statistical function is a preset white list data table includes: determining whether the data table related to the statistical function is located in the preset white list data table or not according to the table name of the data table related to the statistical function and a preset black list data table, wherein the preset black list data table comprises the table name of the data table of which the data amount in the data table is larger than or equal to the preset data amount; and if the table name of the data table related to the statistical function is not in the table names of the preset blacklist data table, the data table related to the statistical function is positioned in the preset white list data table.

In the implementation mode, whether the data table related to the statistical function is located in the preset white list data table or not is determined according to the table name of the data table related to the statistical function and the table name in the preset black list data table, the determined result is accurate and reliable, next optimization processing or direct operation execution is facilitated, and the response speed of the database is further improved.

In a second aspect, the present application provides an apparatus for processing data. The processing means may comprise functional modules for implementing the method of the first aspect or any one of its possible implementations. The technical effects of the processing apparatus can refer to the technical effects brought by the corresponding method, which are not described herein again.

The processing device may include: the acquisition module is used for acquiring a Structured Query Language (SQL) statement; the processing module is used for optimizing the SQL statement if a statistical function exists in the SQL statement and a data table related to the statistical function is not a preset white list data table, and the optimization is used for reducing the influence of the execution of the SQL statement on the response speed of a database; the processing module is further configured to execute the SQL statement if a statistical function exists in the SQL statement and a data table related to the statistical function is a preset white list data table.

With reference to the second aspect, in a possible implementation manner, the processing module is specifically configured to: determining whether a duplicate removal clause exists in the SQL statement, wherein the duplicate removal clause comprises a column name to be inquired; and if the duplication removing clause exists in the SQL sentence, replacing the duplication removing clause with a grouping statistical data clause, wherein the grouping statistical data clause is used for grouping fields corresponding to the column names to be inquired so as to return non-repeated fields corresponding to the column names to be inquired.

With reference to the second aspect, in a possible implementation manner, the processing module is specifically configured to: the SQL statement is not executed.

With reference to the second aspect, in a possible implementation manner, the preset white list data table is used to indicate a data table in which the data amount in the data table is smaller than or equal to a preset data amount.

With reference to the second aspect, in a possible implementation manner, the processing module is further configured to determine whether the data table related to the statistical function is located in the preset white list data table according to the table name of the data table related to the statistical function and the preset white list data table, where the preset white list data table includes the table name of the data table whose data amount in the data table is smaller than or equal to a preset data amount; and if the table name of the data table related to the statistical function is in the table names of the preset white list data table, the data table related to the statistical function is positioned in the preset white list data table.

With reference to the second aspect, in a possible implementation manner, the processing module is further configured to determine whether the data table related to the statistical function is located in the preset white list data table according to a table name of the data table related to the statistical function and a preset black list data table, where the preset black list data table includes a table name of the data table whose data amount in the data table is greater than or equal to a preset data amount; and if the table name of the data table related to the statistical function is not in the table names of the preset blacklist data table, the data table related to the statistical function is positioned in the preset whitelist data table.

In a third aspect, the present application provides a server, comprising: a memory and a processor; the memory is to store program instructions; the processor is configured to invoke program instructions in the memory to perform a method according to the first aspect or any one of its possible implementations.

In a fourth aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions for implementing the method according to the first aspect or any one of the possible implementation manners thereof when executed by a processor.

In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, is configured to implement the method according to the first aspect or any one of the possible implementations thereof.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;

FIG. 2 is a flow chart of a method for processing data according to an embodiment of the present application;

FIG. 3 is a block diagram of a data processing system according to an embodiment of the present application;

FIG. 4 is a flowchart of a processing method of an analysis and judgment module in the data processing system according to an embodiment of the present application;

FIG. 5 is a flowchart of a method for processing an optimization module in a data processing system, according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;

fig. 7 is a schematic view of an apparatus according to another embodiment of the present application.

With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. The drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the concepts of the application by those skilled in the art with reference to specific embodiments.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

A specific application scenario of the present application is shown in fig. 1, and the scenario includes a client 101 and a server 102. The connection between the client 101 and the server 102 may be direct or indirect through wired or wireless communication, and the application is not limited herein.

Optionally, in the field of database technology, the SQL statement is input by the client 101, and the server 102 receives the SQL statement input by the client 101 and executes the statement, so as to implement functions such as data query, data addition, data deletion, and data update.

In the process of performing data query through SQL statements, sometimes it is related to the number of non-repetitive results satisfying query conditions in a specified data table that needs to be counted, and a currently common method for requesting to count the number of non-repetitive results satisfying query conditions in a specified data table is as follows: this is achieved by a SQL statement that contains a function for requesting the return of a statistical quantity and uses a key for requesting the return of a uniquely different value. For example, a "count (distinguct y from a)" clause is included in the SQL statement to request to return the number of data whose y columns are not repeated in table a, where a represents the data table to be queried, y represents the column to be queried, "distinguct" represents a key for requesting to return a unique different value, and "count ()" represents a function for requesting to return a statistical number.

Although the number of non-repeated results satisfying the query condition in the specified data table can be finally counted by using a statistical function such as "count ()", the function scans the data of the full data table during counting, and needs to perform a sorting operation on the basis of scanning the full data table, and a large amount of data completely unrelated to the results are involved in the process. Therefore, when the database counts the number of non-repetitive results satisfying the query condition in the specified data table based on the SQL statement, a large amount of database resources are consumed, the response speed of the database is affected, and the application program using the database is very unsmooth.

For example, it may be determined whether the SQL statement includes a statistical function "count ()", and if not, the SQL statement is normally executed; otherwise, the SQL statement is optimized, for example, the statistical function "count ()" in the SQL statement is not executed.

In view of the above, the present application provides a data processing method, which aims to solve the above technical problems in the prior art.

The following describes the technical solutions of the present application and how to solve the above technical problems with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

With reference to the application scenario shown in fig. 1, fig. 2 is a flowchart of a data processing method provided in an embodiment of the present application, and is applied to the server 102. As shown in fig. 2, the method provided in the embodiment of the present application includes S201, S202, S203, and S204. The individual steps in the method shown in fig. 2 are explained in detail below.

S201, obtaining the SQL statement.

In this step, SQL is a language with a very powerful query function, and as long as there is data in the database, it can be found from the database by an appropriate method.

For example, a user inputs an SQL statement through the client 101 in fig. 1, and the server 102 obtains the SQL statement input by the user and performs lexical and syntactic analysis on the SQL statement, for example, the SQL statement may be segmented and analyzed segment by segment.

S202, if the SQL statement has the statistical function and the data table related to the statistical function is not a preset white list data table, optimizing the SQL statement, wherein the optimizing is used for reducing the influence of the execution of the SQL statement on the response speed of the database.

Alternatively, the technician may set the white list data table in advance based on the requirement as the preset white list data table.

As an example, the preset white list data table may include a data table in which the data amount is less than or equal to the preset data amount. As another example, the preset white list data table may include a data table in which the number of non-duplicated data needs to be output even though the data amount is greater than or equal to the preset data amount.

In this step, in some possible implementation manners, the obtained SQL statement may be analyzed first to obtain an abstract syntax tree of a database node (sqlnode), then all nodes in the abstract syntax tree are traversed, whether a query node (e.g., select node) exists in the SQL statement is searched, if the query node exists, a table name of a data table in the query node is obtained, otherwise, the SQL statement is executed directly.

Optionally, the obtained SQL statements are parsed by a syntax parser supporting syntax extension to support parsing requirements of different syntaxes of multiple data sources. For example, the open-source syntax parsers "antlr" and "javacc" can realize multi-data-source syntax support by expanding syntax and rule files; the open source data analysis and storage system (Druid) has built-in programmatic parser.

Meanwhile, whether a statistical function (such as "count ()") exists in the SQL statement is searched, if so, whether a data table related to the statistical function is located in a preset white list data table is further judged, and if not, the SQL statement is optimized so as to reduce the influence of the execution of the SQL statement on the response speed of the database.

Illustratively, looking for the existence of a statistical function (e.g., "count ()") in an SQL statement, all "function nodes" in all query nodes ("select nodes") can be recursively checked, because the "count ()" function follows the "select nodes" and is a "function node").

One way to determine whether the data table related to the statistical function is located in the preset white list data table can be implemented as follows: and determining whether the data table related to the statistical function is located in the preset white list data table or not according to the table name of the data table related to the statistical function and the preset white list data table, and if the table name of the data table related to the statistical function is in the table names of the preset white list data table, locating the data table related to the statistical function in the preset white list data table.

Another implementation manner for judging whether the data table related to the statistical function is located in the preset white list data table is as follows: determining whether the data table related to the statistical function is located in a preset white list data table or not according to the table name of the data table related to the statistical function and a preset black list data table, wherein the preset black list data table comprises the table name of the data table of which the data amount in the data table is larger than or equal to the preset data amount; and if the table name of the data table related to the statistical function is not in the table names of the preset blacklist data table, the data table related to the statistical function is located in the preset white list data table.

In the case that a statistical function exists in the SQL statement and the data table related to the statistical function is not a preset white list data table, an implementation manner of performing optimization processing on the SQL statement is as follows: and if the SQL statement has the duplication eliminating clause, replacing the duplication eliminating clause with a grouping statistical data clause, and returning to execute so as to reduce the influence of the execution of the SQL statement on the response speed of the database.

And the grouping statistical data clause is used for grouping the fields corresponding to the column names to be queried so as to return the fields corresponding to the non-repeated column names to be queried.

Illustratively, whether a form of "count (diagnosis XXXX) from table" exists in the SQL statement is judged, if yes, the "count (diagnosis XXXX) from table" is modified into a format of "count (1) from (select XXXXXX from table group by XXXXXX)" by using an optimization template, and then the SQL statement is returned to be executed again. Wherein "XXXX" is the name of the corresponding column in the data table, "distint" is the deduplication key, "1" represents the 1 st column of the data table, and "group by" is the grouping key.

Alternatively, "distint" supports single-column, multi-column deduplication. The single-column duplicate removal mode is simple and easy to understand, namely only 1 identical value is reserved; the multiple columns are deduplicated according to the specified deduplicated column information, that is, only if all the specified column information is the same, the information is regarded as the duplicate information.

Another implementation way of performing optimization processing on the SQL statement is as follows: the SQL statement is not executed, so that the resource consumption of the database can be reduced, and the response speed of the database is improved.

S203, if the SQL statement has the statistical function and the data table related to the statistical function is a preset white list data table, executing the SQL statement.

If the SQL statement has the statistical function and the data table related to the statistical function is the preset white list data table, the data amount in the data table related to the statistical function is smaller than or equal to the preset data amount, and the influence on the response speed of the database is not large, so that the method can be directly executed, and the database resource can be reasonably utilized.

In the embodiment, by acquiring the SQL statement, if a statistical function exists in the SQL statement and a data table related to the statistical function is not a preset white list data table, optimization processing is performed on the SQL statement, and the optimization processing is used for reducing the influence of execution of the SQL statement on the response speed of the database; if the SQL statement has the statistical function and the data table related to the statistical function is the preset white list data table, the SQL statement is executed, the response speed requirement of the database and the requirement of outputting the quantity of non-repeated results in the specified data table can be considered at the same time, and therefore the database resources can be reasonably utilized.

On the basis of the foregoing embodiment, fig. 3 is a schematic structural diagram of a data processing system provided in an embodiment of the present application, and the system is applied to the server 102 in the foregoing embodiment, and the data processing system includes a parsing and determining module 301 and an optimizing module 302.

And the analysis and judgment module 301 is configured to analyze the obtained SQL statements, take the obtained SQL statements to an "sqlnode" syntax tree, traverse all nodes, and screen out the SQL statements that a statistical function exists in the SQL statements and a data table related to the statistical function is not a preset white list data table.

The optimizing module 302 is configured to determine whether the SQL statement of the problem screened by the analyzing and determining module 301 can be optimized, if yes, apply the optimizing template to optimize the SQL statement, otherwise, not execute the SQL statement.

As an example, on the basis of the foregoing embodiments, fig. 4 is a flowchart of a processing method of the parsing and determining module in the data processing system according to an embodiment of the present application. As shown in fig. 4, the processing method includes S401, S402, S403, S404, S405, S406, S407, S408, S409, and S410. The individual steps in the method shown in fig. 4 are explained in detail below.

S401, obtaining the SQL statement.

In this step, the server obtains the SQL statement input by the user, and the SQL statement may be used to perform a data query operation.

S402, analyzing the SQL statement, if the analysis is successful, executing S404, otherwise, executing S403.

In this step, the SQL statement is parsed by a syntax parser to obtain an abstract syntax tree.

And S403, directly executing the SQL statement.

S404, searching all query nodes, if the query nodes exist, executing S405, otherwise, executing S403.

In this step, a recursive method may be used to traverse all nodes in the abstract syntax tree to find query nodes (e.g., "select nodes").

S405, obtaining the table name in the query node.

S406, judging whether the data table corresponding to the table name is a preset white list data table or not, if so, executing S407, otherwise, executing S408.

And the data volume in the preset white list data table does not reach the data magnitude which influences the performance of the database.

Optionally, whether the data table corresponding to the table name is the preset white list data table is determined by judging whether the table name is the table name in the preset white list data table.

S407, checking the next inquiry node.

S408, searching whether a statistical function exists in the SQL statement, if so, executing S409, otherwise, executing S407.

Optionally, the sequence of the two steps of searching whether the SQL statement has the statistical function and judging whether the data table corresponding to the table name is the preset white list data table may be exchanged, and the sequence of the two steps is not limited in the embodiment of the present application.

And S409, judging whether the optimization template is met, if so, executing S410, otherwise, executing S407.

In this step, it is determined whether a deduplication clause (e.g., "distint XXXX") exists in the SQL statement, and if so, the SQL statement conforms to the optimization template.

S410, adding a specified list.

FIG. 5 is a flowchart of a method for processing an optimization module in a data processing system, according to an embodiment of the present application. As shown in fig. 5, the processing method includes S501, S502, S503, S504, S505, and S506. The individual steps in the method shown in fig. 5 are explained in detail below.

S501, obtaining SQL sentences in the specified list.

S502, the statistical function is recursively checked.

In this step, all the "function nodes" are recursively checked, and the statistical function statements are screened out.

S503, judging whether the duplicate removal clause exists, if so, executing S505, otherwise, executing S504.

Illustratively, whether a "distinct XXXX" exists in the SQL statement or not is judged, and if the "distinct XXXX" exists, a deduplication clause exists.

S504, the next statistical function is checked if the optimization is impossible.

And S505, replacing the duplicate removal clause with a grouping statistical data clause and outputting the grouping statistical data clause.

Illustratively, in this step, the deduplication clause "count (distinguint XXXX) from table" is modified into the packet statistics clause "count (1) from (select XXXX) from table group by XXXX)" format for output, and then returned for re-execution. Wherein "XXXX" is the name of the corresponding column in the data table, "distint" is the deduplication key, "1" represents the 1 st column of the data table, and "group by" is the grouping key.

S506, the SQL statement is returned to be executed again.

In the embodiment, the influence of the execution of the SQL statement on the response speed of the database can be accurately reduced through the processing methods of the analysis and judgment module and the optimization module, so that the response speed of the database is improved, and the database resources can be reasonably utilized.

On the basis of the above embodiments, fig. 6 shows a device 600 for processing data provided by the embodiments of the present application, where the device 600 includes: an acquisition module 601 and a processing module 602.

The obtaining module 601 is configured to obtain a structured query language SQL statement; the processing module 602 is configured to perform optimization processing on the SQL statement if a statistical function exists in the SQL statement and a data table related to the statistical function is not a preset white list data table, where the optimization processing is used to reduce an influence of execution of the SQL statement on a response speed of the database; the processing module 602 is further configured to execute the SQL statement if a statistical function exists in the SQL statement and a data table related to the statistical function is a preset white list data table.

As an example, the apparatus 600 may be configured to perform the method shown in fig. 2, for example, the obtaining module 601 is configured to perform S201, and the processing module 602 is configured to perform S202 and S203.

In a possible implementation manner, the processing module 602 is specifically configured to: determining whether a duplicate removal clause exists in the SQL sentence, wherein the duplicate removal clause comprises a column name to be inquired; and if the duplication eliminating clause exists in the SQL sentence, replacing the duplication eliminating clause with a grouping statistical data clause, wherein the grouping statistical data clause is used for grouping the fields corresponding to the column names to be inquired so as to return the fields corresponding to the non-repeated column names to be inquired.

In a possible implementation manner, the processing module 602 is specifically configured to: the SQL statement is not executed.

In a possible implementation manner, the preset white list data table is used for indicating a data table in which the data amount in the data table is smaller than or equal to the preset data amount.

In a possible implementation manner, the processing module 602 is further configured to determine whether the data table related to the statistical function is located in a preset white list data table according to a table name of the data table related to the statistical function and the preset white list data table, where the preset white list data table includes a table name of a data table whose data amount in the data table is less than or equal to a preset data amount; and if the table name of the data table related to the statistical function is in the table names of the preset white list data table, the data table related to the statistical function is located in the preset white list data table.

In a possible implementation manner, the processing module 602 is further configured to determine whether the data table related to the statistical function is located in a preset white list data table according to a table name of the data table related to the statistical function and a preset black list data table, where the preset black list data table includes a table name of a data table whose data amount in the data table is greater than or equal to a preset data amount; and if the table name of the data table related to the statistical function is not in the table names of the preset blacklist data table, the data table related to the statistical function is located in the preset white list data table.

Fig. 7 is a schematic view of an apparatus according to another embodiment of the present application. The apparatus shown in fig. 7 may be used to perform the method of any of the previous embodiments.

As shown in fig. 7, the apparatus 700 of the present embodiment includes: memory 701, processor 702, communication interface 703, and bus 704. The memory 701, the processor 702, and the communication interface 703 are communicatively connected to each other via a bus 704.

The memory 701 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 701 may store a program, and the processor 702 is configured to perform the steps of the method shown in the above-described embodiments when the program stored in the memory 701 is executed by the processor 702.

The processor 702 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the methods shown in the embodiments of the present application.

The processor 702 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the method of the embodiment of the present application may be implemented by integrated logic circuits of hardware in the processor 702 or instructions in the form of software.

The processor 702 may also be a general purpose processor, a Digital Signal Processor (DSP), an ASIC, an FPGA (field programmable gate array) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 701, and the processor 702 reads the information in the memory 701, and completes the functions required to be performed by the units included in the apparatus in the present application in combination with the hardware thereof.

The communication interface 703 may enable communication between the apparatus 700 and other devices or communication networks using, but not limited to, transceiver devices.

Bus 704 may include a pathway to transfer information between various components of apparatus 700 (e.g., memory 701, processor 702, communication interface 703).

It should be understood that the apparatus 700 shown in the embodiment of the present application may be an electronic device, or may also be a chip configured in the electronic device.

The data processing method and apparatus in the present application may be used in the field of big data technology, and may also be used in any field other than the field of big data technology. The application of the data processing method and the data processing device is not limited in the application field.

It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone, wherein A and B can be singular or plural. In addition, the "/" in this document generally indicates that the former and latter associated objects are in an "or" relationship, but may also indicate an "and/or" relationship, which may be understood with particular reference to the former and latter text.

In the present application, "at least one" means one or more, "a plurality" means two or more. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, a-b, a-c, b-c or a-b-c, wherein a, b and c can be single or multiple.

It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: u disk, removable hard disk, read only memory, random access memory, magnetic or optical disk, etc. for storing program codes.

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A data processing method is applied to a server and comprises the following steps:

acquiring a Structured Query Language (SQL) statement;

if a statistical function exists in the SQL statement and a data table related to the statistical function is not a preset white list data table, performing optimization processing on the SQL statement, wherein the optimization processing is used for reducing the influence of the execution of the SQL statement on the response speed of a database;

and if the SQL statement has a statistical function and the data table related to the statistical function is a preset white list data table, executing the SQL statement.

2. The method according to claim 1, wherein the optimizing the SQL statement comprises:

determining whether a duplicate removal clause exists in the SQL statement, wherein the duplicate removal clause comprises a column name to be inquired;

and if the duplication removing clause exists in the SQL sentence, replacing the duplication removing clause with a grouping statistical data clause, wherein the grouping statistical data clause is used for grouping fields corresponding to the column names to be inquired so as to return non-repeated fields corresponding to the column names to be inquired.

3. The method according to claim 1, wherein the optimizing the SQL statement comprises: the SQL statement is not executed.

4. The method according to any one of claims 1 to 3, wherein the preset white list data table is used for indicating a data table in which the data amount is smaller than or equal to the preset data amount.

5. The method according to any one of claims 1 to 3, wherein whether the data table to which the statistical function relates is a preset white list data table comprises:

determining whether the data table related to the statistical function is located in the preset white list data table or not according to the table name of the data table related to the statistical function and the preset white list data table, wherein the preset white list data table comprises the table name of the data table of which the data amount is smaller than or equal to the preset data amount;

and if the table name of the data table related to the statistical function is in the table names of the preset white list data table, the data table related to the statistical function is positioned in the preset white list data table.

6. The method according to any one of claims 1 to 3, wherein whether the data table to which the statistical function relates is a preset white list data table comprises:

determining whether the data table related to the statistical function is located in the preset white list data table or not according to the table name of the data table related to the statistical function and a preset black list data table, wherein the preset black list data table comprises the table name of the data table of which the data amount is larger than or equal to the preset data amount;

and if the table name of the data table related to the statistical function is not in the table names of the preset blacklist data table, the data table related to the statistical function is positioned in the preset white list data table.

7. An apparatus for processing data, comprising:

the acquisition module is used for acquiring a Structured Query Language (SQL) statement;

the processing module is used for optimizing the SQL statement if a statistical function exists in the SQL statement and a data table related to the statistical function is not a preset white list data table, and the optimization is used for reducing the influence of the execution of the SQL statement on the response speed of a database;

the processing module is further configured to execute the SQL statement if a statistical function exists in the SQL statement and a data table related to the statistical function is a preset white list data table.

8. A server, comprising: a memory and a processor;

the memory is to store program instructions;

the processor is configured to invoke program instructions in the memory to perform the method of any of claims 1 to 6.

9. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, are configured to implement the method of any one of claims 1 to 6.