CN114153867A - Data grouping method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN114153867A
Authority
CN
China
Legal status
Pending
Application number
CN202111433522.7A
Other languages
Chinese (zh)
Inventor
扈天阳
朱仲颖
韩朱忠
Current Assignee
Shanghai Dameng Database Co Ltd
Original Assignee
Shanghai Dameng Database Co Ltd
Application filed by Shanghai Dameng Database Co Ltd
Priority to CN202111433522.7A
Publication of CN114153867A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention disclose a data grouping method, a data grouping apparatus, an electronic device and a storage medium. The method comprises: acquiring at least one grouping column of a grouping clause in a structured query statement; removing, according to the feature (characteristic) columns among the grouping columns, the grouping columns that have the same grouping effect; and performing the data grouping corresponding to the structured query statement using the grouping columns that remain after removal. By grouping only on the remaining grouping columns, embodiments of the invention reduce the number of grouping columns and thus the amount of data to be calculated and compared, which speeds up grouping calculation and improves the execution efficiency of grouping statements.

Description

Data grouping method and device, electronic equipment and storage medium
Technical Field
Embodiments of the invention relate to the technical field of databases, and in particular to a data grouping method, a data grouping apparatus, an electronic device and a storage medium.
Background
Structured Query Language (SQL) is the most important and most widely used language for operating structured databases, and in SQL the GROUP BY clause is used to group data. Currently, to execute an SQL query statement containing a GROUP BY clause, the data in the queried tables must be fetched and the grouping result is computed from every one of the grouping items specified in the GROUP BY clause. In some cases, however, computing only one or a few of the specified grouping items is enough to complete the grouping and yields the same grouping result. The execution efficiency of grouping statements in existing databases therefore leaves room for improvement.
Disclosure of Invention
Embodiments of the present invention provide a data grouping method, an apparatus, an electronic device and a storage medium, so as to improve the execution efficiency of grouping statements, increase data processing speed and reduce the performance overhead of the database.
In a first aspect, an embodiment of the present invention provides a data grouping method, where the method includes:
acquiring at least one grouping column of a grouping clause in a structured query statement;
removing, according to the feature columns among the grouping columns, the grouping columns that have the same grouping effect;
and performing the data grouping corresponding to the structured query statement according to the grouping columns remaining after removal.
In a second aspect, an embodiment of the present invention further provides a data grouping apparatus, where the apparatus includes:
a grouping acquisition module, configured to acquire at least one grouping column of a grouping clause in a structured query statement;
a grouping removal module, configured to remove, according to the feature columns among the grouping columns, the grouping columns that have the same grouping effect;
and a grouping execution module, configured to perform the data grouping corresponding to the structured query statement according to the grouping columns remaining after removal.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a storage device, configured to store one or more programs,
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data grouping method according to any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data grouping method according to any embodiment of the present invention.
In the technical solution of the embodiments of the invention, at least one grouping column of the grouping clause in the structured query statement is acquired, the grouping columns that have the same grouping effect are removed according to the feature columns among the grouping columns, and the data grouping corresponding to the structured query statement is performed according to the remaining grouping columns. Grouping on the remaining columns reduces the number of grouping columns, and therefore the amount of data to calculate and compare, which increases grouping speed and improves the execution efficiency of grouping statements.
Drawings
Fig. 1 is a flowchart of a data grouping method according to the first embodiment of the present invention;
Fig. 2 is a flowchart of another data grouping method according to the second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data grouping apparatus according to the third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to the fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only a part of the structures related to the present invention, not all of the structures, are shown in the drawings, and furthermore, embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
Example one
Fig. 1 is a flowchart of a data grouping method according to the first embodiment of the present invention. This embodiment is applicable to data grouping scenarios, for example when an SQL statement uses a GROUP BY clause containing grouping columns to group data. The method may be executed by a data grouping apparatus, which may be implemented in hardware and/or software. Referring to Fig. 1, the method specifically includes the following steps:
S110: Acquire at least one grouping column of the grouping clause in the structured query statement.
A structured query statement is a statement in a language for operating on a database; for example, it may be an SQL statement used to access and process the database, including inserting, querying, updating and deleting data. The grouping clause in the structured query statement is used to group data. For example, the grouping clause may be a GROUP BY clause containing one or more grouping columns, by which the data are grouped. When combined with an SQL aggregate function, the GROUP BY clause summarizes information for each group; aggregate functions compute statistics such as counts, sums and minimum values. For example, a grouping clause used together with an aggregate function can count the records in each group. The grouping columns may be field names of data tables in the database, and combining the grouping clause with these columns groups the data by them.
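As a minimal illustration of a grouping clause combined with an aggregate function, the sketch below uses Python's built-in `sqlite3` module with a hypothetical two-column `orders` table (the table, its columns and the helper name are assumptions made for this example only) and counts the records in each group.

```python
import sqlite3

def count_orders_per_customer(rows):
    """Count orders per customer with GROUP BY + COUNT(*).

    `rows` is a list of (orderid, customerid) tuples for a hypothetical table.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid INTEGER)")
        con.executemany("INSERT INTO orders (orderid, customerid) VALUES (?, ?)", rows)
        # One output row per distinct customerid, with the number of records in that group.
        return con.execute(
            "SELECT customerid, COUNT(*) FROM orders GROUP BY customerid ORDER BY customerid"
        ).fetchall()
    finally:
        con.close()

print(count_orders_per_customer([(1, 10), (2, 10), (3, 20)]))  # [(10, 2), (20, 1)]
```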
It should be noted that, to group data, one or more fields of the target data table must be specified as grouping columns in the grouping clause of the structured query statement, so that the statement conforms to the syntax of the structured query language and the program can run normally. Of course, there may be one or more target data tables, depending on the actual situation.
In embodiments of the invention, when the structured query statement includes a grouping clause, the one or more grouping columns in the grouping clause may be traversed one by one, and each grouping column acquired during the traversal may be stored in an array or a linked list to facilitate subsequent operations on the stored grouping columns.
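For a simple statement, acquiring the grouping columns and storing them in an array can be sketched as follows. This is only an illustration under stated assumptions: a database would read the columns from its parse tree, whereas the hypothetical helper below uses a regular expression and assumes plain column names without expressions or subqueries.

```python
import re

def extract_group_columns(sql: str) -> list[str]:
    """Return the GROUP BY columns of a simple SQL statement, in order.

    Naive sketch: assumes plain column names and that the clause ends at
    HAVING, ORDER BY, LIMIT, ';' or the end of the text.
    """
    match = re.search(
        r"\bGROUP\s+BY\s+(.*?)(?=\bHAVING\b|\bORDER\s+BY\b|\bLIMIT\b|;|$)",
        sql,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        return []  # the statement has no grouping clause
    # Store each grouping column in a list (the "array" mentioned above).
    return [col.strip() for col in match.group(1).split(",") if col.strip()]

print(extract_group_columns("SELECT ID, COUNT(*) FROM T GROUP BY ID, NAME, AGE;"))
# ['ID', 'NAME', 'AGE']
```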
S120: Remove the grouping columns that have the same grouping effect according to the feature columns among the grouping columns.
A feature column is a grouping column that can represent, or identify, other grouping columns. Because the feature column identifies the other grouping columns, the rows of the target data table, together with the values of those identified columns, can be located through the feature column alone. Grouping by the feature column alone therefore gives the same result as grouping by the feature column together with the identified columns; this is what is meant by the same grouping effect, and the identified columns are regarded as grouping columns having the same grouping effect. Removal may be implemented by setting an ignore flag (or a covering flag) on the grouping columns that have the same grouping effect; during the actual grouping calculation, grouping columns carrying such a flag take no part in calculation and comparison.
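The notion of the same grouping effect can be checked directly: two lists of grouping columns have the same effect when they split the rows into the same groups. The sketch below assumes hypothetical in-memory rows in which ID identifies one customer, so grouping by ID alone yields the same partition as grouping by ID, NAME and AGE together, while NAME alone does not.

```python
def partition(rows, columns):
    """Return how `rows` are split by `columns`, as a set of frozensets of row indices."""
    groups: dict[tuple, list[int]] = {}
    for i, row in enumerate(rows):
        key = tuple(row[c] for c in columns)  # grouping key built from the listed columns
        groups.setdefault(key, []).append(i)
    return {frozenset(members) for members in groups.values()}

# Hypothetical joined rows: customer 1 appears twice (e.g. one row per order).
rows = [
    {"ID": 1, "NAME": "Li", "AGE": 30},
    {"ID": 2, "NAME": "Li", "AGE": 30},
    {"ID": 1, "NAME": "Li", "AGE": 30},
]

print(partition(rows, ["ID"]) == partition(rows, ["ID", "NAME", "AGE"]))  # True
print(partition(rows, ["NAME"]) == partition(rows, ["ID"]))               # False
```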
In embodiments of the invention, when a feature column that can identify other grouping columns is found while traversing the grouping columns of the grouping clause, the feature column is retained in memory, while the grouping columns it identifies can be removed and need not be retained. This reduces the data that must be calculated and compared in the subsequent grouping calculation and improves the execution efficiency of grouping statements.
S130: Perform the data grouping corresponding to the structured query statement according to the grouping columns remaining after removal.
In embodiments of the invention, after the structured query statement has queried the target data table, the one or more query results are grouped. During grouping, the calculation uses only the grouping columns retained in memory, that is, the grouping columns remaining after removal, which reduces the amount of grouping calculation and improves grouping efficiency.
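A minimal sketch of S130, assuming each grouping column carries a boolean ignore flag (the `GroupColumn` class and the `group_count` helper are hypothetical names): the grouping key is built only from columns without the flag, and COUNT(*) is computed per group.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class GroupColumn:
    name: str
    ignored: bool = False  # hypothetical ignore flag set during removal (S120)

def group_count(rows, group_columns):
    """COUNT(*) per group, comparing only the grouping columns without the ignore flag."""
    active = [c.name for c in group_columns if not c.ignored]
    return Counter(tuple(row[name] for name in active) for row in rows)

rows = [
    {"ID": 1, "NAME": "Li", "AGE": 30},
    {"ID": 1, "NAME": "Li", "AGE": 30},
    {"ID": 2, "NAME": "Wang", "AGE": 25},
]
cols = [GroupColumn("ID"), GroupColumn("NAME", ignored=True), GroupColumn("AGE", ignored=True)]
print(group_count(rows, cols))  # Counter({(1,): 2, (2,): 1})
```

Because a flagged column has a single value within each group, its value for the SELECT list can be taken from any row of that group.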
In the technical solution of this embodiment, at least one grouping column of the grouping clause in the structured query statement is acquired, the grouping columns that have the same grouping effect are removed according to the feature columns among the grouping columns, and the data grouping corresponding to the structured query statement is performed according to the remaining grouping columns. Because only the remaining grouping columns are used, fewer columns take part in the calculation, which reduces the amount of computation during data processing, increases processing speed and thus improves the execution efficiency of grouping statements.
Example two
Fig. 2 is a flowchart of another data grouping method provided in the second embodiment of the present invention, which refines the foregoing embodiment. Referring to Fig. 2, the method specifically includes the following steps:
S210: Process the structured query statement to obtain a statement memory structure.
The statement memory structure is the data structure in which the structured query statement is held in memory, for example a hash table, a tree or a linked list. Processing the structured query statement may include lexical analysis, syntactic analysis and semantic analysis.
In embodiments of the invention, the structured query statement entered by the user is processed to check that it conforms to the specification of the structured query language, so that the program runs normally. Processing the statement generates its statement memory structure, in which the corresponding structured query statement is stored.
Further, on the basis of the above-described embodiment of the present invention, the processing includes lexical analysis and syntactic analysis.
In embodiments of the invention, the processing parses the lexical and grammatical structure of the structured query statement. Lexical analysis scans the raw character stream, which carries no meaning by itself, and converts it into discrete tokens, such as identifiers, keywords, symbols and operators, for use by syntactic analysis. Syntactic analysis organizes the received tokens and checks that they form a sequence permitted by the grammar of the structured query language. Together, lexical and syntactic analysis ensure that the structured query statement conforms to the specification of the structured query language, and thus that subsequent operations are correct.
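The lexical analysis step can be illustrated with a small tokenizer. This is a sketch only: the token classes and the tiny keyword set are assumptions for the example, and a real SQL lexer also handles quoted identifiers, comments and the full keyword list.

```python
import re

# Hypothetical, much-simplified keyword set for illustration only.
KEYWORDS = {"SELECT", "FROM", "WHERE", "GROUP", "BY", "AND", "COUNT", "DATE"}
TOKEN_RE = re.compile(
    r"\s*(?:(?P<string>'[^']*')|(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<word>[A-Za-z_][A-Za-z0-9_]*)|(?P<symbol><=|>=|<>|!=|[=<>*(),;.+-]))"
)

def tokenize(sql: str) -> list[tuple[str, str]]:
    """Split SQL text into (kind, text) tokens: keyword, identifier, number, string or symbol."""
    sql = sql.rstrip()
    tokens, pos = [], 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise ValueError(f"unexpected character near position {pos}")
        kind = m.lastgroup  # name of the single alternative that matched
        text = m.group(kind)
        if kind == "word":
            kind = "keyword" if text.upper() in KEYWORDS else "identifier"
        tokens.append((kind, text))
        pos = m.end()
    return tokens

print(tokenize("DATE >= '2021-05-01'"))
# [('keyword', 'DATE'), ('symbol', '>='), ('string', "'2021-05-01'")]
```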
S220: When a grouping clause exists in the statement memory structure, extract the grouping columns in the grouping clause.
In embodiments of the invention, it is determined whether the statement memory structure includes a grouping clause; if it does, the grouping columns in the grouping clause are extracted. For example, if the grouping clause contains more than one grouping column, the columns can be traversed one by one and extracted into memory during the traversal; if it contains exactly one grouping column, that column can be extracted into memory directly.
S230: Determine, among the grouping columns, the feature columns that satisfy the feature column condition, where the feature column condition is that the column is non-null and has a UNIQUE constraint, or that the column is the primary key.
A uniqueness constraint (UNIQUE) ensures that every value in the specified column, or every combination of values in the specified columns, is unique, but still allows the column to be null. A primary key (PRIMARY KEY) uniquely identifies each record in the data table: the corresponding row can be located precisely by the primary key alone, the primary key column contains no duplicate values anywhere in the table, and it may not be null. Therefore, a grouping column that is non-null and has a UNIQUE constraint, or that is the primary key, holds a distinct value in every row and satisfies the feature column condition.
In embodiments of the invention, while the grouping columns of the grouping clause are being extracted, each grouping column is checked against the feature column condition: a grouping column that is non-null and has a UNIQUE constraint, or that is the primary key, is regarded as a feature column.
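The feature column condition can be written as a one-line predicate. The sketch below assumes hypothetical catalog metadata (`ColumnMeta` and its fields are invented names) and treats the UNIQUE and primary key flags as single-column constraints. The non-null requirement matters because, in most databases, a UNIQUE column may hold several NULLs, and GROUP BY places all NULLs in one group, so a nullable unique column could merge rows that the other grouping columns would keep apart.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnMeta:
    """Hypothetical catalog metadata for one column; field names are illustrative."""
    name: str
    table: str
    not_null: bool = False
    unique: bool = False       # assumed: a single-column UNIQUE constraint
    primary_key: bool = False  # assumed: the column alone is the primary key

def is_feature_column(col: ColumnMeta) -> bool:
    """Feature column condition: (NOT NULL and UNIQUE) or PRIMARY KEY."""
    return col.primary_key or (col.not_null and col.unique)

# EMAIL is an invented column used only to show the UNIQUE cases.
print(is_feature_column(ColumnMeta("ID", "CUSTOMER", primary_key=True)))                 # True
print(is_feature_column(ColumnMeta("EMAIL", "CUSTOMER", unique=True)))                   # False
print(is_feature_column(ColumnMeta("EMAIL", "CUSTOMER", not_null=True, unique=True)))    # True
```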
S240: Remove the grouping columns that have the same grouping effect, according to whether each grouping column belongs to the same data table as a feature column.
In embodiments of the invention, after a feature column is determined, the data table to which it belongs is recorded, and the data tables of the other grouping columns are compared with it one by one. If another grouping column belongs to the same data table as the feature column, grouping by the feature column alone has the same effect as grouping by the feature column together with that column, so that column can be removed and its data take no part in comparison and calculation during grouping.
S250: Acquire the data to be grouped corresponding to the structured query statement.
In embodiments of the invention, the data to be grouped can be obtained by executing the structured query statement, for example according to the one or more columns specified in the SELECT clause of the statement.
S260: Group the data to be grouped according to the grouping columns remaining after removal.
In embodiments of the invention, after the data to be grouped is acquired, it is grouped according to the one or more grouping columns remaining after removal. Since only these columns take part in the actual calculation, the execution efficiency of the grouping statement is improved; and since grouping by the remaining columns has the same effect as grouping by all the columns before removal, grouping accuracy is preserved as well.
Further, on the basis of the above embodiment, removing the grouping columns that have the same grouping effect according to whether each grouping column belongs to the same data table as the feature column includes:
A1: Acquire the first data table to which the feature column belongs, and store it in a linked list.
It should be noted that, when there is more than one grouping column, a linked list is created whose nodes store grouping columns together with their data tables. Because linked-list nodes can be created dynamically during subsequent operations, memory is used efficiently. In embodiments of the invention, after a feature column is determined, its first data table is looked up in the statement memory structure and the linked list is traversed to check whether the first data table is already recorded in a node. If it is, nothing needs to be done; if it is not, a new node is added to the linked list and the feature column and the first data table are stored in it as a pair.
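The linked list described in A1 can be sketched as a singly linked list whose nodes hold a (feature column, table) pair and which accepts a new node only when its table is not yet recorded. The class and method names are hypothetical, and EMAIL is an invented second unique column used only to show the duplicate-table case.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    column: str                    # feature (identification) column
    table: str                     # data table the column belongs to
    next: Optional["Node"] = None  # following node, None at the tail

class FeatureList:
    """Singly linked list keeping at most one (feature column, table) node per table."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def find_table(self, table: str) -> Optional[Node]:
        """Walk the list and return the node recorded for `table`, if any."""
        node = self.head
        while node is not None:
            if node.table == table:
                return node
            node = node.next
        return None

    def add_if_absent(self, column: str, table: str) -> bool:
        """Append (column, table) unless the table is already recorded; True if added."""
        if self.find_table(table) is not None:
            return False  # the table already has a retained feature column
        new_node = Node(column, table)
        if self.head is None:
            self.head = new_node
        else:
            tail = self.head
            while tail.next is not None:
                tail = tail.next
            tail.next = new_node
        return True

    def items(self) -> list[tuple[str, str]]:
        """Return the recorded (column, table) pairs in insertion order."""
        pairs, node = [], self.head
        while node is not None:
            pairs.append((node.column, node.table))
            node = node.next
        return pairs

lst = FeatureList()
lst.add_if_absent("ID", "CUSTOMER")
lst.add_if_absent("ORDERID", "ORDERS")
lst.add_if_absent("EMAIL", "CUSTOMER")  # hypothetical second unique column: not added
print(lst.items())  # [('ID', 'CUSTOMER'), ('ORDERID', 'ORDERS')]
```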
B1: Extract the second data table to which a grouping column that is not a feature column belongs.
It should be noted that before the second data table is extracted, the grouping columns are traversed again to find those that belong to a table already recorded in the linked list and can therefore be identified. Specifically, it is determined whether each grouping column is present in the linked list; a grouping column that is not present in the linked list is regarded as a grouping column that is not a feature column.
In embodiments of the invention, while checking whether each grouping column is present in the linked list, the grouping columns found to be absent are recorded, and their corresponding second data tables are extracted from the statement memory structure.
C1: Determine whether the second data table is stored in the linked list; if so, remove the grouping column; if not, leave the grouping column unchanged.
In embodiments of the invention, the linked list is traversed again to check whether it contains the second data table. If it does, the second data table and the first data table are the same table, so the grouping column belonging to the second data table can be identified by the feature column; it is an identified column and is removed from the actual calculation. If it does not, the second data table differs from every recorded first data table, meaning the table to which the grouping column belongs has no feature column; the grouping column cannot be identified by any identification column, so it is skipped without any processing.
Further, on the basis of the above embodiment, the removal includes: setting an ignore flag on the grouping column.
In embodiments of the invention, when the second data table is found in the linked list, an ignore flag is set on the corresponding grouping column, so that the column is skipped during actual calculation and comparison and only grouping columns without the ignore flag need to be considered. This effectively reduces the number of grouping columns that are calculated and compared and improves the execution efficiency of the grouping statement.
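Steps A1 to C1 together with the ignore flag can be sketched as two traversals over the grouping columns. In this hypothetical sketch a plain Python list of (column, table) pairs stands in for the linked list, `is_feature` is assumed to come from the feature column check of S230, and columns are matched by (name, table) so that equally named columns in different tables are kept apart.

```python
from dataclasses import dataclass

@dataclass
class GroupColumn:
    name: str
    table: str
    is_feature: bool = False  # assumed outcome of the feature column check (S230)
    ignored: bool = False     # ignore flag set in C1

def mark_redundant(columns):
    """Set ignore flags on grouping columns identified by a feature column of the same table.

    Returns the names of the grouping columns that still take part in grouping.
    """
    if len(columns) <= 1:
        return [c.name for c in columns]  # nothing to remove
    lst = []  # stands in for the linked list of (feature column, table) nodes
    # First traversal (A1): record at most one feature column per table.
    for col in columns:
        if col.is_feature and all(table != col.table for _, table in lst):
            lst.append((col.name, col.table))
    recorded = set(lst)
    recorded_tables = {table for _, table in lst}
    # Second traversal (B1, C1): a column absent from the list whose table is recorded is identified.
    for col in columns:
        if (col.name, col.table) in recorded:
            continue  # identification column: must be kept
        if col.table in recorded_tables:
            col.ignored = True  # identified by the table's feature column
        # otherwise the table has no feature column, so the column is left unchanged
    return [c.name for c in columns if not c.ignored]

columns = [
    GroupColumn("ID", "CUSTOMER", is_feature=True),
    GroupColumn("NAME", "CUSTOMER"),
    GroupColumn("AGE", "CUSTOMER"),
    GroupColumn("DATE", "ORDERS"),
    GroupColumn("ORDERID", "ORDERS", is_feature=True),
    GroupColumn("SHOPNAME", "SHOP"),
]
print(mark_redundant(columns))  # ['ID', 'ORDERID', 'SHOPNAME']
```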
For example, this embodiment takes querying the order information of different customers and their numbers of orders as an example. Three data tables are predefined: the customer information table CUSTOMER, the order information table ORDERS and the shop information table SHOP. The customer information table is defined as: CREATE TABLE CUSTOMER (ID INT PRIMARY KEY, NAME VARCHAR(20), AGE INT); where ID is the customer's identity ID and is the primary key (PRIMARY KEY), NAME is the customer's name and AGE is the customer's age. The order information table is defined as: CREATE TABLE ORDERS (ORDERID INT PRIMARY KEY, CUSTOMERID INT, DATE DATE, SHOPID INT); where ORDERID is the order number and the primary key, CUSTOMERID is the customer ID and DATE is the order date. The shop information table is defined as: CREATE TABLE SHOP (SHOPID INT, SHOPNAME VARCHAR(20)); where SHOPID is the ID of the shop and SHOPNAME is the name of the shop.
Specifically, for example, to query the orders placed by different customers and the number of those orders from May 1, 2021 to August 1, 2021, the SQL statement is: SELECT ID, NAME, AGE, DATE, ORDERID, SHOPNAME, COUNT(*) FROM CUSTOMER, ORDERS, SHOP WHERE ID = CUSTOMERID AND ORDERS.SHOPID = SHOP.SHOPID AND DATE >= DATE '2021-05-01' AND DATE < DATE '2021-08-01' GROUP BY ID, NAME, AGE, DATE, ORDERID, SHOPNAME; The statement contains a SELECT clause, the aggregate function COUNT and a GROUP BY grouping clause with 6 grouping columns in total: ID, NAME and AGE belong to the customer information table CUSTOMER, DATE and ORDERID belong to the order information table ORDERS, and SHOPNAME belongs to the shop information table SHOP. The data grouping may include the following steps:
Step 1: Perform lexical and syntactic analysis on the SQL statement to generate its statement memory structure.
Step 2: Determine whether the SQL statement contains a GROUP BY clause; if not, go to Step 8; if so, go to Step 3.
Step 3: If the number of grouping columns is less than or equal to 1, go to Step 8; otherwise, create a linked list LST and go to Step 4.
Step 4: Traverse the grouping columns of the SQL statement one by one and determine whether each is a feature column of the table to which it belongs. If it is, go to Step 5; if not, skip the current column, take the next grouping column and repeat Step 4. When the traversal is complete, go to Step 6.
It should be noted that the grouping columns traversed here are ID, NAME, AGE, DATE, ORDERID and SHOPNAME.
And 5: traversing the link table LST, judging whether the table T to which the current grouping column COL belongs already exists in the link table LST, if not, adding a node (the grouping column COL and the table T) on the link table LST, recording the grouping column COL and the table T to which the grouping column COL belongs, and continuing to perform the step 4, wherein the grouping column is called an identification column; if the table T exists, no processing is required, the current packet column is ignored, and the process proceeds to step 4.
It should be noted that, because the ID is the feature column of the CUSTOMER information table of the CUSTOMER, a node needs to be added to store the ID packet column and the CUSTOMER information table of the CUSTOMER, before adding the node, it needs to first determine whether the CUSTOMER information table of the CUSTOMER exists in the linked list LST, and this determination ensures that only one feature column needs to be reserved for one table. Similarly, ORDERID is a feature column of the ORDERS order information table, and also needs to judge and newly add nodes.
Step 6: Traverse the grouping columns of the SQL statement again to mark the non-identification columns that belong to the tables of the identification columns. An identification column (a grouping column recorded in LST) uniquely identifies the other grouping columns of its data table and must be retained. For each grouping column COL, determine whether COL exists in LST: if it does, skip it and continue the traversal; otherwise, go to Step 7. When the traversal is complete, go to Step 8.
It should be noted that the grouping columns ID, NAME, AGE, DATE, ORDERID and SHOPNAME are traversed a second time to determine whether each can be identified by a feature column in LST. For example, for the column ID, it is first determined whether ID exists in LST; since it does, the column cannot be ignored. Likewise, the grouping column ORDERID cannot be ignored. The grouping columns NAME, AGE, DATE and SHOPNAME are not in LST, so each of them proceeds to Step 7.
Step 7: Determine whether the table T to which the grouping column COL belongs exists in LST. If it does, COL can be identified by an identification column (another grouping column with the feature column attribute): set an ignore flag on this non-identification column so that COL is ignored when the grouping columns are actually calculated and compared, then return to Step 6 to traverse the next grouping column. If it does not, table T has no feature column, so COL cannot be identified by any identification column and must be retained: skip the grouping column and continue the traversal in Step 6.
It should be noted that the table of the grouping columns NAME and AGE, the customer information table CUSTOMER, is looked up in LST and found in the node (ID, CUSTOMER). The grouping columns NAME and AGE can therefore be identified by the grouping column ID, which has the feature column attribute, and need not take part in the actual grouping calculation and comparison. Similarly, the table of DATE, the order information table ORDERS, is also in LST, so DATE is also given an ignore flag. The table of SHOPNAME, the shop information table SHOP, is not in LST, so SHOPNAME is not flagged. In the final grouping calculation and comparison, only the grouping columns ID, ORDERID and SHOPNAME need to be considered; all other grouping columns carry ignore flags.
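The outcome of this example can be checked on sample data. The sketch below assumes SQLite through Python's `sqlite3` module and uses made-up rows; instead of the `DATE '...'` literal form it compares ISO date strings, which SQLite compares as text here. It runs the example query once with all six grouping columns and once with only ID, ORDERID and SHOPNAME, and both runs produce the same groups.

```python
import sqlite3

SCHEMA = """
CREATE TABLE CUSTOMER (ID INT PRIMARY KEY, NAME VARCHAR(20), AGE INT);
CREATE TABLE ORDERS (ORDERID INT PRIMARY KEY, CUSTOMERID INT, DATE DATE, SHOPID INT);
CREATE TABLE SHOP (SHOPID INT, SHOPNAME VARCHAR(20));
INSERT INTO CUSTOMER VALUES (1, 'Li', 30), (2, 'Wang', 25);
INSERT INTO SHOP VALUES (1, 'North'), (2, 'South');
INSERT INTO ORDERS VALUES
  (100, 1, '2021-05-03', 1),
  (101, 1, '2021-06-10', 2),
  (102, 2, '2021-07-01', 1),
  (103, 2, '2021-09-01', 2);
"""

def grouped(con, group_by):
    """Run the example query with the given GROUP BY list (a fixed, trusted string)."""
    sql = (
        "SELECT ID, ORDERID, SHOPNAME, COUNT(*) FROM CUSTOMER, ORDERS, SHOP "
        "WHERE ID = CUSTOMERID AND ORDERS.SHOPID = SHOP.SHOPID "
        "AND DATE >= '2021-05-01' AND DATE < '2021-08-01' "
        f"GROUP BY {group_by} ORDER BY ID, ORDERID, SHOPNAME"
    )
    return con.execute(sql).fetchall()

con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
full = grouped(con, "ID, NAME, AGE, DATE, ORDERID, SHOPNAME")
reduced = grouped(con, "ID, ORDERID, SHOPNAME")
print(full == reduced)  # True
print(reduced)  # [(1, 100, 'North', 1), (1, 101, 'South', 1), (2, 102, 'North', 1)]
```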
Step 8: End the optimization.
It should be noted that the SELECT list may reference grouping columns, so during optimization an identifiable grouping column cannot simply be deleted from the statement; doing so would violate the syntax requirements. Instead, in the second traversal (steps 6 and 7) an ignore flag is placed on each identifiable grouping column so that it is skipped when groups are actually computed and compared. The grouping columns of the SQL statement are therefore traversed twice: the first traversal (steps 4 and 5) finds the feature columns that can identify other grouping columns, and the second traversal (step 6) finds the grouping columns whose table is identified by such a feature column.
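Why a flag is used rather than deletion can be illustrated with the traditional GROUP BY rule that every non-aggregated column in the SELECT list must appear among the grouping columns. This sketch assumes the engine enforces that rule strictly (some engines relax it when a primary key is grouped); `select_list_valid` and the column lists are hypothetical.

```python
from __future__ import annotations


def select_list_valid(select_cols: list[str], group_cols: list[str]) -> bool:
    """Traditional rule: non-aggregated SELECT columns must all be grouping columns."""
    return set(select_cols) <= set(group_cols)


# e.g. SELECT ID, NAME, COUNT(*) FROM CUSTOMER GROUP BY ID, NAME, AGE
group_cols = ["ID", "NAME", "AGE"]
select_cols = ["ID", "NAME"]  # COUNT(*) is an aggregate, so it is not checked
ignored = {"NAME", "AGE"}     # flags produced by the optimization

# Deleting the identified columns from the statement would break the rule ...
after_deletion = [c for c in group_cols if c not in ignored]
print(select_list_valid(select_cols, after_deletion))  # False

# ... whereas flagging leaves the statement unchanged and only shrinks the
# key that the executor actually computes and compares.
print(select_list_valid(select_cols, group_cols))  # True
compare_key = [c for c in group_cols if c not in ignored]
print(compare_key)  # ['ID']
```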
Example Three
Fig. 3 is a schematic structural diagram of a data grouping apparatus according to a third embodiment of the present invention. The apparatus can execute the data grouping method provided by any embodiment of the present invention and has the functional modules and beneficial effects corresponding to that method. The apparatus may be implemented in software and/or hardware and specifically comprises a grouping acquisition module 301, a grouping culling module 302 and a grouping execution module 303.
a grouping acquisition module 301, configured to acquire at least one grouping column of a grouping clause in a structured query statement;
a grouping culling module 302, configured to cull the grouping columns that have the same grouping effect according to the feature columns among the grouping columns; and
a grouping execution module 303, configured to perform the data grouping corresponding to the structured query statement according to the grouping columns remaining after culling.
In the technical solution of this embodiment, the grouping acquisition module acquires at least one grouping column of the grouping clause in a structured query statement, the grouping culling module culls the grouping columns that have the same grouping effect according to the feature columns among the grouping columns, and the grouping execution module performs the data grouping for the structured query statement according to the grouping columns that remain after culling. Because grouping is performed only on the remaining grouping columns, fewer columns participate in the computation, which reduces the amount of data processing during actual execution, speeds it up, and thereby improves the execution efficiency of grouping statements.
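The division into modules 301 to 303 can be pictured as a small pipeline. In this hypothetical sketch the class and method names are illustrative only, parsing is assumed to have already produced the list of grouping columns, the feature columns and column-to-table ownership are supplied by the caller, and for brevity module 302 returns the surviving columns instead of setting ignore flags.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


class DataGroupingApparatus:
    """Hypothetical sketch of modules 301-303; names are illustrative only."""

    def __init__(self, feature_cols: set[str], table_of: dict[str, str]) -> None:
        self.feature_cols = feature_cols  # grouping columns meeting the feature column condition
        self.table_of = table_of          # grouping column -> table it belongs to

    def acquire(self, group_by_clause: list[str]) -> list[str]:
        """Module 301: obtain the grouping columns (parsing assumed done upstream)."""
        return list(group_by_clause)

    def cull(self, cols: list[str]) -> list[str]:
        """Module 302: keep feature columns and columns whose table has no feature column."""
        feature_tables = {self.table_of[c] for c in cols if c in self.feature_cols}
        return [c for c in cols
                if c in self.feature_cols or self.table_of[c] not in feature_tables]

    def execute(self, rows: list[dict[str, Any]], cols: list[str]) -> dict[tuple, list[dict[str, Any]]]:
        """Module 303: hash each row on the surviving grouping columns only."""
        groups: dict[tuple, list[dict[str, Any]]] = defaultdict(list)
        for row in rows:
            groups[tuple(row[c] for c in cols)].append(row)
        return dict(groups)

    def run(self, group_by_clause: list[str], rows: list[dict[str, Any]]) -> dict[tuple, list[dict[str, Any]]]:
        return self.execute(rows, self.cull(self.acquire(group_by_clause)))


app = DataGroupingApparatus(feature_cols={"ID"},
                            table_of={"ID": "CUSTOMER", "NAME": "CUSTOMER"})
rows = [{"ID": 1, "NAME": "Li"}, {"ID": 2, "NAME": "Li"}, {"ID": 1, "NAME": "Li"}]
groups = app.run(["ID", "NAME"], rows)
print({key: len(members) for key, members in groups.items()})  # {(1,): 2, (2,): 1}
```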
Further, on the basis of the above embodiment of the present invention, the grouping acquisition module 301 in the apparatus comprises:
a structure acquisition unit, configured to process the structured query statement to obtain an in-memory statement structure; and
a grouping extraction unit, configured to extract the grouping columns of the grouping clause when a grouping clause exists in the in-memory statement structure.
Further, on the basis of the above embodiment of the present invention, the processing includes lexical analysis and syntactic analysis.
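As a rough illustration of the structure acquisition and grouping extraction units, the toy tokenizer and scanner below pull the grouping columns out of a query string. It is an assumption-laden stand-in for a real lexer and parser: it recognizes only plain or table-qualified column names (no expressions, nested subqueries or quoted identifiers), and the function names and stop-keyword set are hypothetical.

```python
from __future__ import annotations

import re

# Keywords assumed to end a GROUP BY list in this toy grammar.
STOP = {"HAVING", "ORDER", "LIMIT", "UNION", "WINDOW"}


def tokenize(sql: str) -> list[str]:
    """Lexical step: identifiers (optionally qualified), numbers, strings, single symbols."""
    return re.findall(r"[A-Za-z_][\w.]*|\d+|'[^']*'|\S", sql)


def group_by_columns(sql: str) -> list[str]:
    """Syntactic step (toy): return the column list after GROUP BY, or [] if absent."""
    toks = tokenize(sql)
    upper = [t.upper() for t in toks]
    for i in range(len(toks) - 1):
        if upper[i] == "GROUP" and upper[i + 1] == "BY":
            cols, j = [], i + 2
            while j < len(toks) and upper[j] not in STOP and toks[j] not in (";", ")"):
                if toks[j] != ",":
                    cols.append(toks[j])
                j += 1
            return cols
    return []  # no grouping clause, so there is nothing to optimize


sql = "SELECT c.ID, c.NAME, COUNT(*) FROM CUSTOMER c GROUP BY c.ID, c.NAME, AGE ORDER BY c.ID;"
print(group_by_columns(sql))  # ['c.ID', 'c.NAME', 'AGE']
```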
Further, on the basis of the above embodiment of the present invention, the grouping culling module 302 in the apparatus comprises:
a feature column determination unit, configured to determine, among the grouping columns, the feature columns that satisfy a feature column condition, the feature column condition being that the column is NOT NULL and has a UNIQUE constraint, or that the column is a primary key; and
a grouping culling unit, configured to cull the grouping columns that have the same grouping effect based on whether each grouping column belongs to the same data table as a feature column.
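The feature column condition can be written as a predicate over catalog metadata. The `ColumnMeta` record, its field names and the `EMAIL` column are hypothetical. The sketch also shows why NOT NULL is required alongside UNIQUE: in most SQL engines a UNIQUE column may hold several NULLs, and GROUP BY places all NULLs in a single group, so such a column would not distinguish every row.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMeta:
    name: str
    table: str
    not_null: bool
    unique: bool
    primary_key: bool


def is_feature_column(col: ColumnMeta) -> bool:
    """Feature column condition: (NOT NULL and UNIQUE) or primary key."""
    return (col.not_null and col.unique) or col.primary_key


def group_count(values: list) -> int:
    """Number of groups GROUP BY would form; None models NULL, and all NULLs share one group."""
    return len(set(values))


customer_id = ColumnMeta("ID", "CUSTOMER", not_null=True, unique=True, primary_key=True)
email = ColumnMeta("EMAIL", "CUSTOMER", not_null=False, unique=True, primary_key=False)
print(is_feature_column(customer_id), is_feature_column(email))  # True False

# Three rows whose nullable UNIQUE column holds NULL, NULL, 'a@x': only two groups.
print(group_count([None, None, "a@x"]))  # 2
```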
Further, on the basis of the above embodiment of the present invention, the grouping culling unit is specifically configured to:
acquire the first data table to which each feature column belongs, and store the first data table in a linked list;
extract the second data table to which each grouping column that is not a feature column belongs; and
determine whether the second data table is stored in the linked list; if so, cull the grouping column, and if not, leave the grouping column unchanged.
Further, on the basis of the above embodiment of the present invention, culling comprises setting an ignore flag on the grouping column.
Further, on the basis of the above embodiment of the present invention, the grouping execution module 303 in the apparatus comprises:
a data acquisition unit, configured to acquire the data to be grouped corresponding to the structured query statement; and
a grouping execution unit, configured to group the data to be grouped according to the grouping columns remaining after culling.
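The grouping execution unit can be illustrated by checking that hashing rows on the remaining columns yields exactly the same partition as hashing on all of the original grouping columns. The rows below are invented sample data for a join of the customer and shop information, and `partition` is a hypothetical helper; the equivalence relies on ID determining NAME and AGE, which holds because ID is a feature column of CUSTOMER.

```python
from __future__ import annotations

from collections import defaultdict


def partition(rows: list[dict], cols: list[str]) -> list[list[int]]:
    """Hash-group rows on cols; return the groups as sorted lists of row indices."""
    groups: dict[tuple, list[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[c] for c in cols)].append(i)
    return sorted(groups.values())


rows = [
    {"ID": 1, "NAME": "Li", "AGE": 30, "SHOPNAME": "A"},
    {"ID": 1, "NAME": "Li", "AGE": 30, "SHOPNAME": "B"},
    {"ID": 2, "NAME": "Li", "AGE": 30, "SHOPNAME": "A"},  # a different customer with the same name
    {"ID": 1, "NAME": "Li", "AGE": 30, "SHOPNAME": "A"},
]
full = partition(rows, ["ID", "NAME", "AGE", "SHOPNAME"])
culled = partition(rows, ["ID", "SHOPNAME"])  # NAME and AGE carry ignore flags
print(full == culled, culled)  # True [[0, 3], [1], [2]]
```

Each row's hash key shrinks from four values to two, which is where the saving in hashing and key comparison comes from.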
Example Four
Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. Fig. 4 shows a block diagram of an exemplary electronic device 412 suitable for implementing embodiments of the present invention. The electronic device 412 shown in Fig. 4 is only an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present invention. The device 412 is typically an electronic device that implements the data grouping method.
As shown in fig. 4, the electronic device 412 is in the form of a general purpose computing device. The components of the electronic device 412 may include, but are not limited to: one or more processors 416, a storage device 428, and a bus 418 that couples the various system components including the storage device 428 and the processors 416.
Bus 418 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Electronic device 412 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 412 and includes both volatile and nonvolatile media, removable and non-removable media.
Storage 428 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 440 and/or cache memory 442. The electronic device 412 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 434 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 418 by one or more data media interfaces. Storage 428 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
Program 436 having a set (at least one) of program modules 426 may be stored, for example, in storage 428, such program modules 426 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination may comprise an implementation of a network environment. Program modules 426 generally perform the functions and/or methodologies of embodiments of the invention as described herein.
The electronic device 412 may also communicate with one or more external devices 414 (e.g., keyboard, pointing device, camera, display 424, etc.), with one or more devices that enable a user to interact with the electronic device 412, and/or with any devices (e.g., network card, modem, etc.) that enable the electronic device 412 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 422. The electronic device 412 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 420. As shown, network adapter 420 communicates with the other modules of electronic device 412 over bus 418. It should be appreciated that although not shown in FIG. 4, other hardware and/or software modules may be used in conjunction with the electronic device 412, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID (Redundant Array of Independent Disks) systems, tape drives and data backup storage systems.
The processor 416 runs the programs stored in the storage 428 to perform various functional applications and data processing, for example to implement the data grouping method provided by the above embodiments of the present invention.
Example Five
Embodiments of the present invention provide a computer-readable storage medium having stored thereon a computer program that, when executed by a processing apparatus, implements a data grouping method as in embodiments of the present invention. The computer readable medium of the present invention described above may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the client and server may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire at least one grouping column of a grouping clause in a structured query statement;
cull the grouping columns that have the same grouping effect according to the feature columns among the grouping columns; and
perform the data grouping corresponding to the structured query statement according to the grouping columns remaining after culling.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method of data grouping, the method comprising:
acquiring at least one grouping column of a grouping clause in a structured query statement;
culling the grouping columns that have the same grouping effect according to the feature columns among the grouping columns; and
performing the data grouping corresponding to the structured query statement according to the grouping columns remaining after culling.
2. The method of claim 1, wherein acquiring at least one grouping column of the grouping clause in the structured query statement comprises:
processing the structured query statement to obtain an in-memory statement structure; and
when a grouping clause exists in the in-memory statement structure, extracting the grouping columns in the grouping clause.
3. The method of claim 2, wherein the processing comprises lexical analysis and syntactic analysis.
4. The method of claim 1, wherein culling the grouping columns that have the same grouping effect according to the feature columns among the grouping columns comprises:
determining, among the grouping columns, the feature columns that satisfy a feature column condition, the feature column condition being that the column is NOT NULL and has a UNIQUE constraint, or that the column is a primary key; and
culling the grouping columns that have the same grouping effect based on whether each grouping column belongs to the same data table as a feature column.
5. The method of claim 4, wherein culling the grouping columns that have the same grouping effect based on whether each grouping column belongs to the same data table as a feature column comprises:
acquiring the first data table to which each feature column belongs, and storing the first data table in a linked list;
extracting the second data table to which each grouping column that is not a feature column belongs; and
determining whether the second data table is stored in the linked list; if so, culling the grouping column, and if not, leaving the grouping column unchanged.
6. The method of any one of claims 1, 4 or 5, wherein culling comprises setting an ignore flag on the grouping column.
7. The method of claim 1, wherein performing the data grouping corresponding to the structured query statement according to the grouping columns remaining after culling comprises:
acquiring the data to be grouped corresponding to the structured query statement; and
grouping the data to be grouped according to the grouping columns remaining after culling.
8. A data grouping apparatus, the apparatus comprising:
a grouping acquisition module, configured to acquire at least one grouping column of a grouping clause in a structured query statement;
a grouping culling module, configured to cull the grouping columns that have the same grouping effect according to the feature columns among the grouping columns; and
a grouping execution module, configured to perform the data grouping corresponding to the structured query statement according to the grouping columns remaining after culling.
9. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data grouping method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the data grouping method according to any one of claims 1 to 7.
CN202111433522.7A 2021-11-29 2021-11-29 Data grouping method and device, electronic equipment and storage medium Pending CN114153867A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111433522.7A CN114153867A (en) 2021-11-29 2021-11-29 Data grouping method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111433522.7A CN114153867A (en) 2021-11-29 2021-11-29 Data grouping method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114153867A true CN114153867A (en) 2022-03-08

Family

ID=80454366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111433522.7A Pending CN114153867A (en) 2021-11-29 2021-11-29 Data grouping method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114153867A (en)

Similar Documents

Publication Publication Date Title
KR101231560B1 (en) Method and system for discovery and modification of data clusters and synonyms
US9600507B2 (en) Index structure for a relational database table
US9703830B2 (en) Translation of a SPARQL query to a SQL query
US8606788B2 (en) Dictionary for hierarchical attributes from catalog items
CN113032362B (en) Data blood edge analysis method, device, electronic equipment and storage medium
US20150278268A1 (en) Data encoding and corresponding data structure
CN111008020B (en) Method for analyzing logic expression into general query statement
US10296497B2 (en) Storing a key value to a deleted row based on key range density
US10552394B2 (en) Data storage with improved efficiency
US20080195610A1 (en) Adaptive query expression builder for an on-demand data service
US8756246B2 (en) Method and system for caching lexical mappings for RDF data
CN110008448B (en) Method and device for automatically converting SQL code into Java code
CN107291938A (en) Order Query System and method
CN110147396B (en) Mapping relation generation method and device
CN114153867A (en) Data grouping method and device, electronic equipment and storage medium
CN116489251A (en) Universal code stream analysis method, device, computer readable medium and terminal equipment
CN106682107B (en) Method and device for determining incidence relation of database table
KR102153674B1 (en) A method for classifying sql query, a method for detecting abnormal occurrence, and a computing device
CN110569243B (en) Data query method, data query plug-in and data query server
CN115114297A (en) Data lightweight storage and search method and device, electronic equipment and storage medium
KR102215263B1 (en) A method for classifying sql query, a method for detecting abnormal occurrence, and a computing device
US8136064B2 (en) Bijectively mapping character string to integer values in integrated circuit design data
US11416496B2 (en) Computer implemented method for continuous processing of data-in-motion streams residing in distributed data sources
CN111858587A (en) Database data counting method, device, equipment and storage medium
CN116821135A (en) Full text retrieval processing method and system for database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination