CN116644103A

CN116644103A - Data sorting method and device, computer equipment and storage medium

Info

Publication number: CN116644103A
Application number: CN202310553264.9A
Authority: CN
Inventors: 杨浩
Original assignee: Primitive Data Beijing Information Technology Co ltd
Current assignee: Primitive Data Beijing Information Technology Co ltd
Priority date: 2023-05-17
Filing date: 2023-05-17
Publication date: 2023-08-25
Anticipated expiration: 2043-05-17
Also published as: CN116644103B

Abstract

The embodiments of the application provide a data sorting method, a data sorting device, computer equipment and a storage medium, and belong to the technical field of databases. The method comprises the following steps: determining a target data table and sort query data according to a sort query statement, wherein the sort query data comprises a return column field, a first target column field, sort type data and a returned data amount; when the returned data amount is smaller than a sort data amount threshold, performing tuple extraction on the target data table according to the first target column field, the sort type data and the returned data amount to obtain a first tuple; determining at least one second target column field from the return column field and the first target column field; obtaining second column data from the target data table according to the first number data of the first tuple and the second target column field; constructing a first target tuple from the first tuple and the second column data; and performing tuple combination on the first target tuple to obtain a target sorted data table. The embodiments of the application can improve the data sorting efficiency of the database.

Description

Data sorting method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of database technologies, and in particular, to a data sorting method, a data sorting device, a computer device, and a storage medium.

Background

Currently, in database sort query scenarios, because a data table holds a large amount of data, sorting on one or more fields is generally performed with a sort statement that uses the LIMIT keyword. Related-art data sorting methods generally sort a large amount of original tuple data by external sorting and merge the sorted results into new tuple data, which then replaces the original tuple data. However, to preserve the correspondence between the columns of the same row in the data table, the related art must copy and sort all of the data during the sorting stage. This occupies a large amount of memory and spends considerable system resources on memory copies of data in non-sorted column fields, which significantly increases the performance overhead of the computer and reduces sorting efficiency. In addition, when external sorting is applied to a large amount of original tuple data, the limited memory of the computer leads to a large number of input/output (IO) operations, which further reduces the data sorting efficiency of the database. Therefore, how to improve the data sorting efficiency of the database has become a technical problem to be solved.

Disclosure of Invention

The embodiment of the application mainly aims to provide a data sorting method, a data sorting device, computer equipment and a storage medium, which can improve the data sorting efficiency of a database.

To achieve the above object, a first aspect of an embodiment of the present application provides a data sorting method, including:

determining a target data table and sequencing query data of the target data table according to a sequencing query statement, wherein the sequencing query data comprises a return column field, a first target column field, sequencing type data and a return data amount, and the first target column field is used for representing a data column to be sequenced;

when the returned data quantity is smaller than a preset sorting data quantity threshold value, performing tuple extraction on the target data table according to the first target column field, the sorting type data and the returned data quantity to obtain a first tuple, wherein the first tuple comprises first column data and first number data of the first column data;

determining at least one second target column field according to the return column field and the first target column field, wherein the second target column field is used for representing a data column which is not ordered;

performing data extraction on the target data table according to the first number data and all the second target column fields to obtain second column data;

constructing a first target tuple according to the first numbering data, the first column data and the second column data;

and performing tuple combination on the first target tuple to obtain a target sorting data table.

In some embodiments, performing tuple extraction on the target data table according to the first target column field, the ordering type data and the returned data amount to obtain a first tuple, including:

performing tuple construction on the target data table according to the first target column field to obtain a second tuple;

performing tuple sequencing on the second tuple according to the sequencing type data to obtain a third tuple;

and performing tuple selection on the third tuple according to the returned data quantity to obtain the first tuple.

In some embodiments, the performing a tuple construction on the target data table according to the first target column field to obtain a second tuple includes:

acquiring a data number column of the target data table;

extracting field data from the target data table according to the first target column field to obtain second column data;

determining second number data according to the second column data and the data number column;

and performing tuple construction according to the second column data and the second number data to obtain the second tuple.

In some embodiments, the performing tuple ordering on the second tuple according to the ordering type data to obtain a third tuple, including:

arranging the second column data according to the ordering type data to obtain a column data sequence;

taking the second column data in the column data sequence as third column data, and taking the second number data corresponding to the third column data as third number data;

and performing tuple construction according to the third column data and the third number data to obtain the third tuple.

In some embodiments, the performing tuple selection on the third tuple according to the returned data amount to obtain the first tuple includes:

selecting data from the third column data according to the returned data amount to obtain the first column data, and taking the third number data corresponding to the first column data as the first number data;

and performing tuple construction according to the first column data and the first number data to obtain the first tuple.

In some embodiments, the performing a tuple construction on the target data table according to the first target column field to obtain a second tuple, further includes:

constructing an execution plan tree according to the target data table and the first target column field, wherein the execution plan tree comprises target attribute data which are used for representing node data of the first target column field in the execution plan tree;

and performing tuple construction on the target data table according to the target attribute data and the first target column field to obtain a second tuple.

In some embodiments, before the tuple extraction that is performed when the returned data amount is smaller than the preset sorting data amount threshold, the method further comprises:

when the returned data quantity is greater than or equal to the sorting data quantity threshold value, extracting the tuple from the target data table according to the returned column field to obtain an initial tuple;

performing tuple arrangement on the initial tuple according to the first target column field and the ordering type data to obtain a target tuple table;

performing tuple selection on the target tuple table according to the returned data quantity to obtain a second target tuple;

and performing tuple combination on the second target tuple to obtain the target sorting data table.

To achieve the above object, a second aspect of an embodiment of the present application provides a data sorting apparatus, including:

the data determining module is used for determining a target data table and sequencing query data of the target data table according to a sequencing query statement, wherein the sequencing query data comprises a return column field, a first target column field, sequencing type data and a return data volume, and the first target column field is used for representing a data column to be sequenced;

the tuple extraction module is used for performing tuple extraction on the target data table according to the first target column field, the sorting type data and the returned data quantity when the returned data quantity is smaller than a preset sorting data quantity threshold value to obtain a first tuple, wherein the first tuple comprises first column data and first number data of the first column data;

a field determining module, configured to determine at least one second target column field according to the returned column field and the first target column field, where the second target column field is used to represent a data column that is not ordered;

the data extraction module is used for performing data extraction on the target data table according to the first number data and all the second target column fields to obtain second column data;

a tuple construction module, configured to construct a first target tuple according to the first number data, the first column data, and the second column data;

and the tuple combination module is used for performing tuple combination on the first target tuple to obtain a target ordering data table.

To achieve the above object, a third aspect of the embodiments of the present application proposes a computer device, including:

at least one memory;

at least one processor;

at least one computer program;

the at least one computer program is stored in the at least one memory, and the at least one processor executes the at least one computer program to implement the data ordering method of the first aspect described above.

To achieve the above object, a fourth aspect of the embodiments of the present application proposes a computer-readable storage medium storing a computer program for causing a computer to execute the data sorting method according to the above first aspect.

The data sorting method, the data sorting device, the computer equipment and the storage medium firstly determine a target data table and sorting query data of the target data table according to a sorting query statement, wherein the sorting query data comprises a return column field, a first target column field, sorting type data and a return data amount, and the first target column field is used for representing a data column to be sorted. And then, when the returned data quantity is smaller than a preset sorting data quantity threshold value, performing tuple extraction on the target data table according to the first target column field, the sorting type data and the returned data quantity to obtain a first tuple, wherein the first tuple comprises first column data and first number data of the first column data. At least one second target column field is then determined from the return column field and the first target column field, the second target column field representing a data column that is not ordered. And carrying out data extraction on the target data table according to the first number data and all the second target column fields to obtain second column data, and constructing a first target tuple according to the first number data, the first column data and the second column data. And finally, performing tuple combination on the first target tuple to obtain a target ordering data table. The embodiment of the application can improve the data ordering efficiency of the database.

Drawings

FIG. 1 is a first flowchart of a data sorting method according to an embodiment of the present application;

FIG. 2 is a flowchart of step S120 in FIG. 1;

FIG. 3 is a flowchart of step S210 in FIG. 2;

FIG. 4 is another flowchart of step S210 in FIG. 2;

FIG. 5 is a flowchart of step S220 in FIG. 2;

FIG. 6 is a flowchart of step S230 in FIG. 2;

FIG. 7 is a schematic diagram of a structure for acquiring data of a first tuple according to an embodiment of the present application;

FIG. 8 is a second flowchart of a data sorting method provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of a data sorting method according to an embodiment of the present application;

FIG. 10 is a schematic diagram of a data sorting apparatus according to an embodiment of the present application;

FIG. 11 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.

First, several terms used in the present application are explained:

Materialization: for data, materialization means transforming the data from some intermediate form into its original, real form.

Delayed materialization: materialization is postponed to the end of the query computation life cycle. The intermediate form is used as an optimization and is retained for as much of the life cycle as possible.

Tuple: a basic concept in relational databases. When a relation is a table, each row in the table (i.e., each record in the database) is a tuple and each column is an attribute. In a two-dimensional table, a tuple, also called a row, is made up of a series of elements arranged in a particular order.

Currently, in database sort query scenarios, because a data table holds a large amount of data, sorting on one or more fields is generally performed with a sort statement that uses the LIMIT keyword. Related-art data sorting methods generally sort a large amount of original tuple data by external sorting and merge the sorted results into new tuple data, which then replaces the original tuple data. However, to preserve the correspondence between the columns of the same row in the data table, the related art must copy and sort all of the data during the sorting stage. This occupies a large amount of memory and spends considerable system resources on memory copies of the data of non-sorted column fields, which significantly increases the performance overhead of the computer and reduces sorting efficiency. In addition, when external sorting is applied to a large amount of original tuple data, the limited memory of the computer leads to a large number of input/output (IO) operations, which further reduces the data sorting efficiency of the database. Therefore, how to improve the data sorting efficiency of the database has become a technical problem to be solved.

Based on the above, the embodiment of the application provides a data sorting method, a data sorting device, computer equipment and a storage medium, which can improve the data sorting efficiency of a database.

The data sorting method provided by the embodiment of the application can be applied to a terminal, a server, or software running in the terminal or the server. In some embodiments, the terminal may be a smart phone, tablet, notebook, desktop, etc.; the server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms; the software may be an application or the like that implements the data sorting method, but is not limited to the above forms.

The application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network personal computers (Personal Computer, PCs), minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

Referring to fig. 1, fig. 1 is an optional flowchart of a data sorting method according to an embodiment of the present application, where the method in fig. 1 may specifically include, but is not limited to, steps S110 to S160, and these six steps are described in detail below in conjunction with fig. 1.

Step S110, determining a target data table and sequencing query data of the target data table according to a sequencing query statement, wherein the sequencing query data comprises a return column field, a first target column field, sequencing type data and a return data amount, and the first target column field is used for representing a data column to be sequenced;

step S120, when the returned data quantity is smaller than a preset sorting data quantity threshold, performing tuple extraction on the target data table according to the first target column field, the sorting type data and the returned data quantity to obtain a first tuple, wherein the first tuple comprises first column data and first number data of the first column data;

step S130, determining at least one second target column field according to the returned column field and the first target column field, wherein the second target column field is used for representing a data column which is not ordered;

step S140, performing data extraction on the target data table according to the first number data and all second target column fields to obtain second column data;

step S150, constructing a first target tuple according to the first number data, the first column data and the second column data;

step S160, performing tuple combination on the first target tuple to obtain a target sorting data table.

It may be appreciated that, in steps S110 to S160 of some embodiments, in a sort query scenario of the database, the target data table to be sorted and queried and its sort query data are first determined according to the sort query statement, wherein the sort query data includes a return column field, a first target column field, sort type data and a returned data amount, and the first target column field represents the data column to be sorted. Then, when the returned data amount is smaller than a preset sort data amount threshold, tuple extraction is performed on the target data table according to the first target column field, the sort type data and the returned data amount to obtain a first tuple, wherein the first tuple includes first column data and first number data of the first column data. At least one second target column field is then determined from the return column field and the first target column field, the second target column field representing a data column that is not sorted. Data extraction is performed on the target data table according to the first number data and all the second target column fields to obtain second column data, and a first target tuple is constructed from the first number data, the first column data and the second column data. Finally, tuple combination is performed on the first target tuple to obtain a target sorted data table. Because only the data column to be sorted and its number data are carried through the sort, the embodiment of the application can improve the data sorting efficiency of the database.
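By way of illustration only, steps S110 to S160 may be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of the embodiment: the target data table is modelled as an in-memory dictionary of columns, the row index serves as the number data (RowID), and the function name `top_n_late_materialization` and its parameters are hypothetical.

```python
from typing import Any, Dict, List

# Assumption: a table is a dict mapping column name -> list of values,
# and the list index plays the role of the number data (RowID).
Table = Dict[str, List[Any]]


def top_n_late_materialization(
    table: Table,
    return_cols: List[str],
    sort_col: str,
    descending: bool,
    limit: int,
) -> List[Dict[str, Any]]:
    # S120: read only the sort column and pair each value with its RowID.
    pairs = list(enumerate(table[sort_col]))
    # Sort on the value only; the RowID travels along as a small integer.
    pairs.sort(key=lambda p: p[1], reverse=descending)
    first_tuples = pairs[:limit]

    # S130: the non-sorted columns are the returned columns minus the sort column.
    other_cols = [c for c in return_cols if c != sort_col]

    # S140-S150: fetch the non-sorted columns for the surviving RowIDs only.
    target_tuples = []
    for row_id, value in first_tuples:
        row = {sort_col: value}
        for col in other_cols:
            row[col] = table[col][row_id]
        target_tuples.append(row)

    # S160: project to the requested column order to form the result table.
    return [{c: row[c] for c in return_cols} for row in target_tuples]
```

In this sketch the non-sorted columns are touched only for the rows that survive the LIMIT, which is the point of delaying materialization.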

It should be noted that, the data sorting method provided by the embodiment of the present application may be applied to a client or a server, which is not limited herein.

In step S110 of some embodiments, when sorting on one or more table fields, the user often does not need all of the sorted results recorded in the data table and only cares about the top few rows or a limited range, so specific data is typically returned using ORDER BY together with LIMIT. In this application, the sort query statement is a statement that the user inputs to a client or a server according to actual needs. A plurality of original data tables are stored in a database of the client or the server, the database is stored in a storage engine of the client or the server, and the storage engine is a disk or a memory.

It should be noted that, the form of the input sorting query statement may be "SELECT [ return column field ] FROM [ target data table ] ORDER BY [ first target column field ] LIMIT [ return data amount ] [ sorting type data ]", and the sorting query data therein may be flexibly defined and input according to actual needs. Thus, the target data table and the ordered query data of the target data table may be determined from the ordered query statement.

It should be noted that the target data table is used to represent the data table in the database that needs to be ordered and queried. One sort query statement may include at least one target data table, and the data sorting method of the present application is executed according to the sort query data corresponding to each target data table, so as to return the final target sort data table to the client or the server.

It should be noted that one item of sort query data may include at least one return column field and/or at least one first target column field. The return column field indicates a column field shown in the result finally returned to the client or server, and the first target column field indicates a column field that needs to be sorted. Therefore, the return column field and the first target column field may be flexibly adjusted according to actual needs, which is not specifically limited herein.

It should be noted that, the sort type data is used to indicate the sort manner of the sort query statement on the data, including ascending sort (asc) or descending sort (desc), and when the specific sort type data is not written in the sort query statement, the default sort type is desc. Therefore, the ordering type data can be flexibly adjusted according to actual needs, and is not particularly limited herein.

The returned data amount represents the total number of rows of the data table finally returned to the client or server. The returned data amount is a positive integer, for example 100 or 1000, and may be flexibly adjusted according to actual needs, which is not specifically limited herein.

Illustratively, the sort query statement input to the database is "SELECT * FROM t ORDER BY C1 LIMIT 100". From this statement, it can be determined that the target data table is table t; assume that table t has column fields C1, C2 and C3. The "*" indicates that the return column fields are all column fields of the target data table, i.e., the final target sorted data table includes the three columns C1, C2 and C3. The first target column field of the sort query statement is C1, the sort type data is the default descending order, and the returned data amount, i.e., the total number of rows of the resulting target sorted data table, is 100. Thus, the meaning of the sort query statement is: sort table t in descending order of C1 and extract the data of column fields C1, C2 and C3 for the first 100 rows to obtain the target sorted data table.

For example, the input sort query statement may also be "SELECT C1, C2 FROM t ORDER BY C1 LIMIT 100", which means: sort table t in descending order of C1 and extract the data of column fields C1 and C2 for the first 100 rows to obtain the target sorted data table.
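As an illustration of how the sort query data might be determined from such a statement, the following sketch extracts the target data table, return column fields, first target column field, sort type data and returned data amount with a regular expression. It is a simplified sketch and not a real SQL parser; the class `SortQueryData`, the function `parse_sort_query` and the parameter `table_columns` are hypothetical. The sketch assumes a single sort column, and it assumes that an optional ASC or DESC keyword follows the sort column. When no sort type is written, it uses descending order, following the default stated above.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SortQueryData:
    table: str              # target data table
    return_cols: List[str]  # return column fields
    sort_col: str           # first target column field
    sort_type: str          # sort type data: "asc" or "desc"
    limit: int              # returned data amount


# Assumption: one table, one sort column, optional ASC/DESC after the column.
_PATTERN = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"\s+ORDER\s+BY\s+(?P<sort_col>\w+)(?:\s+(?P<sort_type>ASC|DESC))?"
    r"\s+LIMIT\s+(?P<limit>\d+)\s*;?\s*$",
    re.IGNORECASE,
)


def parse_sort_query(sql: str, table_columns: List[str]) -> SortQueryData:
    m = _PATTERN.match(sql)
    if m is None:
        raise ValueError("unsupported sort query statement: " + sql)
    cols = m.group("cols").strip()
    # "*" expands to every column field of the target data table.
    if cols == "*":
        return_cols = list(table_columns)
    else:
        return_cols = [c.strip() for c in cols.split(",")]
    # Default taken from the description above: descending when unspecified.
    sort_type = (m.group("sort_type") or "desc").lower()
    return SortQueryData(
        table=m.group("table"),
        return_cols=return_cols,
        sort_col=m.group("sort_col"),
        sort_type=sort_type,
        limit=int(m.group("limit")),
    )
```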

In step S120 of some embodiments, consider a sort query scenario in which there are many output fields but few sort fields, i.e., the number of return column fields is greater than the number of first target column fields. To preserve the correspondence between the columns of the same row, the related art usually materializes all fields of the data table in the sorting stage, that is, all column fields are copied and swapped out. This occupies a large amount of memory and spends considerable system resources on memory copies of the data of non-sorted column fields, which reduces sorting efficiency. Based on this, the embodiment of the application uses delayed materialization to reduce the memory usage of the sort operator and the performance loss caused by memory copies of non-sorted columns: columns that do not need to be sorted are not materialized while the sort operator (SORT operator) is sorting, and the non-sorted columns are read from the storage engine after sorting is finished. Specifically, to select a suitable data sorting method more flexibly, the application first compares the returned data amount with a preset sort data amount threshold, and selects delayed-materialization sorting when the returned data amount is smaller than the threshold.

The sort data amount threshold indicates the value of the returned data amount (LIMIT) in the sort query statement below which delayed-materialization sorting is selected. The sort data amount threshold may be set to 1000, 2000, etc., and may be flexibly set according to actual needs, which is not specifically limited herein.

It should be noted that, when the returned data amount is greater than or equal to the preset sorting data amount threshold, another sorting mode is selected, so that the overall sorting efficiency of the database can be effectively improved, and the performance consumption is reduced.
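The comparison against the threshold can be sketched as follows. The names `SortStrategy` and `choose_sort_strategy` are hypothetical, the label "full materialization" for the other sorting mode is an assumption of this sketch, and the default of 1000 is simply one of the example values mentioned above.

```python
from enum import Enum


class SortStrategy(Enum):
    DELAYED_MATERIALIZATION = "delayed_materialization"
    FULL_MATERIALIZATION = "full_materialization"  # assumed name for the other mode


# Example value only; the text allows 1000, 2000, or any other setting.
SORT_DATA_AMOUNT_THRESHOLD = 1000


def choose_sort_strategy(
    limit: int, threshold: int = SORT_DATA_AMOUNT_THRESHOLD
) -> SortStrategy:
    if limit <= 0:
        raise ValueError("the returned data amount must be a positive integer")
    # Strictly smaller than the threshold selects delayed materialization;
    # greater than or equal to the threshold selects the other mode.
    if limit < threshold:
        return SortStrategy.DELAYED_MATERIALIZATION
    return SortStrategy.FULL_MATERIALIZATION
```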

Referring to fig. 2, fig. 2 is an optional flowchart of step S120, and in some embodiments of the present application, step S120 includes, but is not limited to, steps S210 to S230, which are described in detail below in conjunction with fig. 2.

Step S210, performing tuple construction on the target data table according to the first target column field to obtain a second tuple;

step S220, performing tuple sequencing on the second tuple according to the sequencing type data to obtain a third tuple;

and step S230, performing tuple selection on the third tuple according to the returned data quantity to obtain a first tuple.

In step S210 of some embodiments, the client or the server of the embodiment of the application includes an optimizer and an executor for sorting. When the optimizer determines that the returned data amount is smaller than the preset sort data amount threshold, the embodiment selects delayed materialization to sort the data. Specifically, the scan operator of the executor scans only the data corresponding to the first target column field from the storage engine that stores the target data table, obtains at least one second tuple, and passes the second tuples to the SORT operator.

Referring to fig. 3, fig. 3 is an optional flowchart of step S210 provided in an embodiment of the present application, and in some embodiments, step S210 may include, but is not limited to, steps S310 to S340, and these four steps are described in detail below in connection with fig. 3.

Step S310, obtaining a data number column of a target data table;

step S320, extracting field data from the target data table according to the first target column field to obtain second column data;

step S330, determining second numbered data according to the second column data and the data numbered columns;

step S340, performing tuple construction according to the second column data and the second number data to obtain a second tuple.

In steps S310 to S340 of some embodiments, so that the data of both the sorted and the non-sorted column fields can be returned accurately after delayed-materialization sorting, the application reads data according to the data of the first target column field to be sorted and a preset data number column. The data number column stores at least one item of initial number data marked on the target data table in advance, and each item of initial number data corresponds to the column data of each column field in the same row of the target data table. The initial number data may be the row number (denoted RowID) of the target data table, ranging from 0 to the total number of rows of the target data table; it may also be other number data preset in the target data table, which is not specifically limited herein. Specifically, field data is extracted from the target data table according to the first target column field to obtain second column data, which represents the tuple data to be sorted. The initial number data in the data number column that corresponds to the second column data is then taken as second number data. Next, a second tuple is obtained for each row of the extracted data from the second column data and the second number data, each second tuple having the form (second number data, second column data). In this way, columns that do not need to be sorted are no longer materialized when the SORT operator sorts; only the number data is carried along during sorting, and after sorting is finished the non-sorted columns are read from the storage engine by number data. This effectively reduces the memory usage of the SORT operator and the performance loss that memory copies of non-sorted columns impose on the system.
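Steps S310 to S340 can be illustrated with the short sketch below. It assumes that the target data table is a dictionary of in-memory columns and that the data number column is a 0-based row number (RowID); the function name `build_second_tuples` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def build_second_tuples(
    table: Dict[str, List[Any]], sort_col: str
) -> List[Tuple[int, Any]]:
    # S320: extract only the column that has to be sorted.
    column = table[sort_col]
    # S310: the data number column; here simply the 0-based row number.
    row_ids = range(len(column))
    # S330-S340: pair each value with its number data,
    # giving tuples of the form (second number data, second column data).
    return list(zip(row_ids, column))
```

For a table with columns C1 = [30, 10, 20] and C2 = ["x", "y", "z"], sorting on C1 yields the tuples (0, 30), (1, 10), (2, 20); column C2 is not read at this stage.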

Referring to fig. 4, fig. 4 is another optional flowchart of step S210 provided in an embodiment of the present application. In some embodiments, step S210 may specifically further include, but is not limited to, steps S410 to S420, which are described in detail below in conjunction with fig. 4.

Step S410, an execution plan tree is constructed according to the target data table and the first target column field, wherein the execution plan tree comprises target attribute data, and the target attribute data is used for representing node data of the first target column field in the execution plan tree;

step S420, performing tuple construction on the target data table according to the target attribute data and the first target column field to obtain a second tuple.

In steps S410 to S420 of some embodiments, the executor of the client or the server needs to tell the scan operator exactly which data to scan, that is, to mark the first target column field in the target data table and the sorting method to be adopted. Specifically, an execution plan tree is constructed from the target data table and the first target column field, the execution plan tree being used to drive the scan operator to read data from the storage engine. The execution plan tree includes target attribute data, which indicates the node data of the first target column field in the execution plan tree, and display attribute data, which indicates the node data of the return column field in the execution plan tree. Then, the scan operator of the executor performs tuple construction on the target data table according to the target attribute data and the first target column field to obtain a second tuple.

The target attribute data and the display attribute data are two different structures constructed on the basis of nodes (Node) of the execution plan tree.
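One way to picture the two kinds of attribute data is the following Python sketch. The class names `TargetAttr`, `DisplayAttr`, and `ScanPlan`, and their fields, are hypothetical; the application does not disclose the layout of its plan-tree structures, so this is only an assumed shape.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TargetAttr:
    """Hypothetical node: marks a sort column and its sort direction."""
    column: str
    descending: bool = False


@dataclass
class DisplayAttr:
    """Hypothetical node: marks a column that must be returned to the client."""
    column: str


@dataclass
class ScanPlan:
    """Hypothetical plan-tree node that tells the scan operator what to read."""
    table: str
    targets: List[TargetAttr] = field(default_factory=list)
    displays: List[DisplayAttr] = field(default_factory=list)

    def columns_to_scan(self) -> List[str]:
        # Only the sort columns are read during the first scan.
        return [t.column for t in self.targets]


plan = ScanPlan(
    table="t",
    targets=[TargetAttr("C1")],
    displays=[DisplayAttr("C1"), DisplayAttr("C3"), DisplayAttr("C4")],
)
print(plan.columns_to_scan())  # ['C1']
```

Keeping the two attribute kinds separate lets the first scan read the sort columns alone, while the display list is consulted only when the result rows are assembled.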

In step S220 of some embodiments, the SORT operator sorts the second tuples according to whether the sorting type data indicates ascending or descending order, to obtain a tuple sorting sequence, where the tuple sorting sequence includes at least one third tuple.

Referring to fig. 5, fig. 5 is an optional flowchart of step S220 according to an embodiment of the present application. In some embodiments, step S220 may include, but is not limited to, steps S510 to S530, which are described in detail below in conjunction with fig. 5.

Step S510, performing data arrangement on the second column data according to the ordering type data to obtain a column data sequence;

step S520, taking the second column data in the column data sequence as third column data, and taking the second number data corresponding to the third column data as third number data;

step S530, performing tuple construction according to the third column data and the third number data to obtain a third tuple.

In steps S510 to S530 of some embodiments, the SORT operator of the executor arranges the second column data according to the ordering type data, takes each item of second column data in the resulting column data sequence as third column data, and takes the second number data corresponding to the third column data as third number data. In this way, the plurality of second tuples are arranged according to the second column data to obtain a plurality of third tuples in a third tuple table; each third tuple represents one row of the currently extracted third tuple table and has the form (third number data, third column data). The number of third tuples is the same as the number of second tuples.
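The arrangement in steps S510 to S530 can be sketched in Python as follows. The function name `sort_second_tuples` and the boolean `descending` flag standing in for the ordering type data are assumptions made for illustration.

```python
from typing import Any, List, Tuple

NumberedValue = Tuple[int, Any]  # (number data, column data)


def sort_second_tuples(second_tuples: List[NumberedValue],
                       descending: bool = False) -> List[NumberedValue]:
    """Order tuples by column data; the number data travels with each value."""
    return sorted(second_tuples, key=lambda t: t[1], reverse=descending)


third_tuples = sort_second_tuples([(0, 30), (1, 10), (2, 20)])
print(third_tuples)  # [(1, 10), (2, 20), (0, 30)]
```

Because each tuple keeps its number data, the original row of every sorted value can still be located after the sort.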

In step S230 of some embodiments, after the sorting by the SORT operator is completed, tuple selection is performed starting from the first row of tuple data in the third tuple table according to the returned data amount, to obtain as many first tuples as the returned data amount.

Referring to fig. 6, fig. 6 is an optional flowchart of step S230 according to an embodiment of the present application. In some embodiments, step S230 may include, but is not limited to, steps S610 to S620, which are described in detail below in conjunction with fig. 6.

Step S610, performing data selection on the third column data according to the returned data amount to obtain first column data, and taking the third number data corresponding to the first column data as first number data;

step S620, performing tuple construction according to the first column data and the first number data to obtain a first tuple.

In steps S610 and S620 of some embodiments, the present application performs data selection on the third column data of the plurality of third tuples according to the returned data amount, and the resulting first tuple table includes LIMIT first tuples; each first tuple represents one row of the extracted and sorted first tuple table and has the form (first number data, first column data).
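As a minimal Python illustration of steps S610 and S620, the selection reduces to taking a prefix of the sorted tuple table. The name `select_first_tuples` and the rejection of negative limits are assumptions of this sketch.

```python
from typing import Any, List, Tuple


def select_first_tuples(third_tuples: List[Tuple[int, Any]],
                        limit: int) -> List[Tuple[int, Any]]:
    """Keep the first `limit` tuples of an already sorted tuple table."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return third_tuples[:limit]


first_tuples = select_first_tuples([(1, 10), (2, 20), (0, 30)], limit=2)
print(first_tuples)  # [(1, 10), (2, 20)]
```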

For example, referring to fig. 7, fig. 7 is a schematic structural diagram of acquiring the data of a first tuple according to an embodiment of the application. It is assumed that a table t is stored in a storage engine of the database and that the field information of the table t is (C1 int, C2 text, C3 double, C4 float); that is, the table t stores four column fields of different data types, C1, C2, C3, and C4, together with a data number column. Here, int indicates that the C1 column field stores integer data, text indicates that the C2 column field stores character string data, double indicates that the C3 column field stores double-precision floating point data, and float indicates that the C4 column field stores single-precision floating point data. Assume that the sort query statement entered by the client is "SELECT C1, C3, C4 FROM t ORDER BY C1 LIMIT 2" and that the data number column is numbered according to the row numbers of the data table. Tuple construction is then performed on the target data table 710 according to the first target column field C1 to obtain the second tuple table 720, which includes the second column data 721 corresponding to the column field C1 and the second number data 722. Then, the second column data 721 is arranged in descending order to obtain a third tuple table 730, which includes at least one third tuple, each third tuple including third column data 731 and third number data 732. Then, the first two third tuples in the sorted order are selected from the third tuple table 730 as the first tuples, each first tuple including first column data 741 and first number data 742, and the first tuple table 740 is obtained.

Referring to fig. 8, fig. 8 is another alternative flowchart of a data sorting method according to an embodiment of the present application. In some embodiments, before step S120, the data sorting method of the present application may specifically further include, but is not limited to, steps S810 to S840, and these four steps are described in detail below in conjunction with fig. 8.

Step S810, when the returned data quantity is larger than or equal to the sorting data quantity threshold value, extracting the tuple from the target data table according to the returned column field to obtain an initial tuple;

step S820, performing tuple arrangement on the initial tuple according to the first target column field and the ordering type data to obtain a target tuple table;

step S830, performing tuple selection on the target tuple table according to the returned data quantity to obtain a second target tuple;

step S840, performing tuple combination on the second target tuple to obtain the target sorting data table.

In steps S810 to S840 of some embodiments, the delayed materialization method of the present application improves efficiency mainly when the amount of returned data is small. In order to select a suitable data sorting method more flexibly, when the returned data amount is greater than or equal to the sorting data amount threshold, another sorting method can be used to sort the data. Specifically, tuples are first extracted from the target data table according to the return column fields to obtain initial tuples, where each initial tuple represents the data set of all return column fields in the same row of the target data table. Then, the initial tuples are moved as a whole during sorting; that is, the initial tuples are arranged according to the first target column field and the sorting type data to obtain a target tuple table. Then, tuple selection is performed on the target tuple table according to the returned data amount to obtain as many second target tuples as the returned data amount. Finally, tuple combination is performed on the second target tuples to obtain the target sorting data table.
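The fallback path of steps S810 to S840 and the threshold check can be sketched in Python as follows. The names `eager_sort` and `use_late_materialization` are hypothetical. The sketch also assumes that the sort key is carried next to each initial tuple, which the application does not specify.

```python
from typing import Any, Dict, List, Tuple

Table = Dict[str, List[Any]]  # hypothetical column-oriented table


def eager_sort(table: Table, return_columns: List[str], sort_column: str,
               descending: bool, limit: int) -> List[Tuple[Any, ...]]:
    """Materialize every returned column first, then sort whole rows."""
    n_rows = len(table[sort_column])
    # S810: initial tuples hold all returned columns of a row, plus the sort key.
    initial = [(table[sort_column][i], tuple(table[c][i] for c in return_columns))
               for i in range(n_rows)]
    # S820: arrange the whole tuples by the sort key.
    initial.sort(key=lambda t: t[0], reverse=descending)
    # S830 and S840: keep the first `limit` rows and combine them into the result.
    return [row for _, row in initial[:limit]]


def use_late_materialization(limit: int, threshold: int) -> bool:
    """Choose delayed materialization only below the sorting data amount threshold."""
    return limit < threshold


table_t: Table = {"C1": [30, 10, 20], "C3": [0.3, 0.1, 0.2]}
print(eager_sort(table_t, ["C1", "C3"], "C1", descending=False, limit=2))
# [(10, 0.1), (20, 0.2)]
```

In this path every returned column is copied before sorting, which is the cost that the delayed materialization path avoids.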

In step S130 of some embodiments, in order to return the data corresponding to all the return column fields, after the first tuple is selected, at least one second target column field is determined according to the return column fields and the first target column field. For example, if four column fields of different data types, C1, C2, C3, and C4, are stored in the table t and the first target column field is C1, then C2, C3, and C4 all belong to the second target column fields.
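A minimal Python sketch of step S130 follows. It assumes that the second target column fields are the return column fields that are not sort columns; the function name `second_target_columns` is hypothetical.

```python
from typing import List


def second_target_columns(return_columns: List[str],
                          sort_columns: List[str]) -> List[str]:
    """Return the columns that are not sort columns, in SELECT order."""
    sort_set = set(sort_columns)
    return [c for c in return_columns if c not in sort_set]


print(second_target_columns(["C1", "C2", "C3", "C4"], ["C1"]))  # ['C2', 'C3', 'C4']
print(second_target_columns(["C1", "C3", "C4"], ["C1"]))        # ['C3', 'C4']
```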

In step S140 of some embodiments, since the first tuples have already excluded the column data of the first target column field that does not satisfy the query, only the data in each second target column field whose number data is the same as the first number data is taken as the second column data.
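The lookup in step S140 can be sketched in Python as a read by row number. The dict-of-lists table stands in for the storage engine, and the name `fetch_second_column_data` is hypothetical.

```python
from typing import Any, Dict, List, Tuple

Table = Dict[str, List[Any]]  # hypothetical stand-in for the storage engine


def fetch_second_column_data(table: Table, row_ids: List[int],
                             columns: List[str]) -> Dict[int, Tuple[Any, ...]]:
    """Read the non-sort columns only for the row numbers that survived LIMIT."""
    return {rid: tuple(table[c][rid] for c in columns) for rid in row_ids}


table_t: Table = {
    "C1": [30, 10, 20],
    "C3": [0.3, 0.1, 0.2],
    "C4": [3.0, 1.0, 2.0],
}
print(fetch_second_column_data(table_t, [1, 2], ["C3", "C4"]))
# {1: (0.1, 1.0), 2: (0.2, 2.0)}
```

Row 0 is never read for C3 or C4, because its number data is not among the first number data.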

In step S150 and step S160 of some embodiments, once all the data to be returned by the input SELECT statement has been determined, a first target tuple is constructed according to the first number data, the first column data, and the second column data, and tuple combination is performed on the first target tuples to determine the target sorting data table. The target sorting data table is then returned to the client or the server for display.

Referring to fig. 7 and fig. 9, fig. 9 is a schematic diagram illustrating a specific structure of a data sorting method according to an embodiment of the present application. It is assumed that a table t is stored in a storage engine of the database, and the field information of the table t is (C1 int, C2 text, C3 double, C4 float). Assume that the sort query statement entered by the client is "SELECT C1, C3, C4 FROM t ORDER BY C1 LIMIT 2" and that the data number column is numbered according to the row numbers of the data table. Specifically, the data sorting method according to the present application performs a sorting query on the target data table 710 and determines the first tuple table 740. The first number data of each first tuple in the first tuple table 740 follows the initial number data in the data number column, so the executor uses the first number data to read the second column data corresponding to all the second target column fields from the storage engine 910. Finally, a first target tuple is constructed according to the first number data, the first column data, and the second column data, and tuple combination is performed on the first target tuples to obtain the target sorting data table 920. The number of rows of the target sorting data table 920 is equal to the returned data amount, and the order of the number data when the tuples are combined is not limited to a specific order.
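The whole flow described for fig. 7 and fig. 9 can be condensed into one Python sketch. The function name `late_materialized_sort`, the dict-of-lists table, and all sample values are hypothetical; the figures' actual data is not reproduced here. The sort direction is a parameter, and the sketch sorts in ascending order unless `descending=True` is passed.

```python
from typing import Any, Dict, List, Tuple

Table = Dict[str, List[Any]]  # hypothetical stand-in for the storage engine


def late_materialized_sort(table: Table, return_columns: List[str],
                           sort_column: str, limit: int,
                           descending: bool = False) -> List[Tuple[Any, ...]]:
    # 1. Scan only the sort column, tagging each value with its row number.
    numbered = list(enumerate(table[sort_column]))
    # 2. Sort the narrow (row number, key) tuples and apply LIMIT.
    numbered.sort(key=lambda t: t[1], reverse=descending)
    first_tuples = numbered[:limit]
    # 3. Read the remaining returned columns by row number and build result rows.
    result = []
    for row_id, key in first_tuples:
        row = tuple(key if c == sort_column else table[c][row_id]
                    for c in return_columns)
        result.append(row)
    return result


table_t: Table = {
    "C1": [3, 1, 4, 2],
    "C2": ["c", "a", "d", "b"],
    "C3": [3.5, 1.5, 4.5, 2.5],
    "C4": [0.3, 0.1, 0.4, 0.2],
}
# Roughly: SELECT C1, C3, C4 FROM t ORDER BY C1 LIMIT 2
print(late_materialized_sort(table_t, ["C1", "C3", "C4"], "C1", limit=2))
# [(1, 1.5, 0.1), (2, 2.5, 0.2)]
```

Only two rows of C3 and C4 are ever read, and C2 is never read at all, because it is neither sorted nor returned.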

According to the embodiment of the application, a delayed materialization technique is used: columns that do not need to be sorted are not materialized while the sorting operator (SORT operator) of the executor performs sorting, and the non-sorted columns are read from the storage engine after the sorting is finished, which reduces the memory usage of the sorting operator and the performance loss caused by copying non-sorted columns in memory. Therefore, the application can effectively improve the data sorting efficiency of the database in the LIMIT scenario, and the improvement is more pronounced the larger the difference between the number of return column fields and the number of first target column fields.
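The memory argument can be made concrete with a rough estimate in Python. This is an illustrative fixed-width model, not a measurement: the function `sort_buffer_bytes`, the 8-byte row number, and the byte widths used below are all assumptions.

```python
def sort_buffer_bytes(n_rows: int, key_bytes: int, payload_bytes: int,
                      late: bool, row_id_bytes: int = 8) -> int:
    """Estimate the bytes held by the sort buffer under a fixed-width row model."""
    # Delayed materialization keeps key + row number; otherwise key + payload.
    per_row = key_bytes + (row_id_bytes if late else payload_bytes)
    return n_rows * per_row


n = 1_000_000
eager = sort_buffer_bytes(n, key_bytes=4, payload_bytes=200, late=False)
late = sort_buffer_bytes(n, key_bytes=4, payload_bytes=200, late=True)
print(eager, late)  # 204000000 12000000
```

Under this model the saving grows with the width of the non-sorted columns, and it disappears when the payload is no wider than the row number.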

Referring to fig. 10, fig. 10 is a schematic structural diagram of a data sorting apparatus according to an embodiment of the present application, where the apparatus may implement the data sorting method according to the above embodiment, and the apparatus includes a data determining module 1010, a tuple extracting module 1020, a field determining module 1030, a data extracting module 1040, a tuple modeling module 1050, and a tuple combining module 1060.

A data determining module 1010, configured to determine, according to the sorting query statement, a target data table and sorting query data of the target data table, where the sorting query data includes a return column field, a first target column field, a sorting type data, and a return data amount, and the first target column field is used to represent a data column to be sorted;

The tuple extraction module 1020 is configured to perform tuple extraction on the target data table according to the first target column field, the sort type data, and the returned data amount when the returned data amount is less than a preset sort data amount threshold, to obtain a first tuple, where the first tuple includes first column data and first number data of the first column data;

a field determining module 1030 configured to determine at least one second target column field according to the returned column field and the first target column field, where the second target column field is used to represent a data column that is not ordered;

the data extraction module 1040 is configured to perform data extraction on the target data table according to the first number data and all the second target column fields, so as to obtain second column data;

a tuple modeling module 1050, configured to construct a first target tuple from the first number data, the first column data, and the second column data;

and a tuple combination module 1060, configured to perform tuple combination on the first target tuple to obtain the target ordered data table.

It should be noted that the data sorting device according to the embodiment of the present application is used to implement, and corresponds to, the data sorting method of the foregoing embodiments; for the specific processing procedure, refer to the foregoing data sorting method, which is not repeated herein.

The embodiment of the application also provides a computer device, which comprises: at least one memory, at least one processor, at least one computer program stored in the at least one memory, the at least one processor executing the at least one computer program to implement the data ordering method of any of the above embodiments. The computer equipment can be any intelligent terminal including a tablet personal computer, a vehicle-mounted computer and the like.

Referring to fig. 11, fig. 11 illustrates a hardware structure of a computer device according to another embodiment, the computer device includes:

the processor 1110 may be implemented by a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc., for executing relevant programs to implement the technical solutions provided by the embodiments of the present application;

the memory 1120 may be implemented in the form of a read-only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM). The memory 1120 may store an operating system and other application programs; when the technical solutions provided by the embodiments of the present disclosure are implemented in software or firmware, the relevant program code is stored in the memory 1120 and invoked by the processor 1110 to perform the data sorting method of the embodiments of the present disclosure;

An input/output interface 1130 for implementing information input and output;

the communication interface 1140 is configured to implement communication interaction between the present device and other devices, and may implement communication in a wired manner (e.g. USB, network cable, etc.), or may implement communication in a wireless manner (e.g. mobile network, Wi-Fi, Bluetooth, etc.);

a bus 1150 for transferring information between various components of the device (e.g., processor 1110, memory 1120, input/output interface 1130, and communication interface 1140);

wherein processor 1110, memory 1120, input/output interface 1130, and communication interface 1140 implement communication connections among each other within the device via bus 1150.

The embodiment of the application also provides a computer readable storage medium storing a computer program for causing a computer to execute the data sorting method in the above embodiment.

The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The embodiments described herein are intended to describe the technical solutions of the embodiments of the present application more clearly and do not constitute a limitation on those technical solutions. Those skilled in the art will appreciate that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.

It will be appreciated by persons skilled in the art that the embodiments of the application are not limited by the illustrations, and that more or fewer steps than those shown may be included, or certain steps may be combined, or different steps may be included.

The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.

The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: only A, only B, or both A and B, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" or a similar expression means any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.

In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative: the above-described division of units is merely a logical function division, and there may be another division manner in actual implementation; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or units, and may be in electrical, mechanical, or other form.

The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including multiple instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method of the various embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing a program.

The foregoing description of the preferred embodiments of the present application has been presented with reference to the drawings and is not intended to limit the scope of the claims. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims

1. A method of ordering data, the method comprising:

2. The method of claim 1, wherein performing tuple extraction on the target data table according to the first target column field, the sort type data, and the return data amount to obtain a first tuple comprises:

3. The method of claim 2, wherein constructing the target data table from the first target column field into a second tuple comprises:

acquiring a data number column of the target data table;

4. The method of claim 3, wherein said sorting of tuples according to said sort type data to obtain a third tuple comprises:

5. The method of claim 4, wherein performing tuple selection on the third tuple according to the amount of returned data to obtain the first tuple comprises:

6. The method of claim 2, wherein the constructing the target data table from the first target column field into a second tuple further comprises:

7. The method of any of claims 1 to 6, wherein before said when said amount of returned data is less than a preset ordering data amount threshold, the method further comprises:

8. A data ordering apparatus, the apparatus comprising:

9. A computer device, comprising:

at least one memory;

at least one processor;

at least one computer program;

the at least one computer program is stored in the at least one memory, the at least one processor executing the at least one computer program to implement:

the method of any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program for causing a computer to execute:

the method of any one of claims 1 to 7.