CN106484868B - Data sorting method and data sorting device based on LIMIT semantics - Google Patents

Data sorting method and data sorting device based on LIMIT semantics

Info

Publication number
CN106484868B
Authority
CN
China
Prior art keywords
data
reordering buffer
limit
reordering
tuple
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610888986.XA
Other languages
Chinese (zh)
Other versions
CN106484868A (en)
Inventor
李海翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING VSETTAN DATA TECHNOLOGY CO.,LTD.
Original Assignee
Beijing Huasheng Xintai Data Technology Co Ltd
Huasheng Xintai Information Industry Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huasheng Xintai Data Technology Co Ltd and Huasheng Xintai Information Industry Development Co Ltd
Priority to CN201610888986.XA
Publication of CN106484868A
Publication of CN106484868B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2453: Query optimisation
    • G06F16/24534: Query rewriting; Transformation
    • G06F16/24539: Query rewriting; Transformation using cached or materialised query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data sorting method and device based on LIMIT semantics. The method includes: allocating a first reordering buffer and a second reordering buffer; reading the target data into the first reordering buffer in several batches; each time target data is read, sorting the data in the first reordering buffer and judging whether this is the first time the data in the first reordering buffer has been sorted; if it is the first time, storing the first N tuples in the first reordering buffer that satisfy the LIMIT semantic constraint in the second reordering buffer; if it is not the first time, merging the first N tuples in the first reordering buffer with the data in the second reordering buffer and storing the first N tuples of the merged result in the second reordering buffer; and determining, from the data in the second reordering buffer, the data defined by the LIMIT semantics. The technical solution of the invention improves the efficiency of the sorting operation.

Description

Data sorting method and data sorting device based on LIMIT semantics
Technical field
The present invention relates to the field of database technology, and in particular to a data sorting method based on LIMIT semantics and a data sorting device based on LIMIT semantics.
Background
In a database management system, query statements frequently need to sort data, and a common form of query carries a LIMIT clause, that is, it retrieves row_num tuples from the first N tuples of the ordered data. When the data volume is very large, the sorting operation usually has to sort the data with external sorting, then merge the data with a merge algorithm, and afterwards count the returned tuples in the layer above the iterator to implement the LIMIT semantics.
However, prior-art data sorting schemes often sort the entire data set, which occupies a large amount of memory and consumes a large amount of computer system resources. Moreover, because memory is limited, external sorting generates a large number of I/O operations, which reduces the efficiency of the database management system. In addition, because the LIMIT semantics are implemented in the layer above the iterator rather than in the sorting operation, the iterator must also count the returned tuples, which adds computation, while the most time-consuming step, the sorting itself, is left unoptimized.
Therefore, how to optimize the sorting of data for query statements that carry a LIMIT clause, so as to save computer system resources and improve the speed and efficiency of the sorting operation, has become a technical problem to be solved urgently.
Summary of the invention
The present invention aims to solve at least one of the technical problems existing in the prior art or the related art.
To this end, one object of the present invention is to provide a data sorting method based on LIMIT semantics.
Another object of the present invention is to provide a data sorting device based on LIMIT semantics.
To achieve at least one of the above objects, an embodiment of the first aspect of the present invention provides a data sorting method based on LIMIT semantics, including: when the LIMIT semantics are pushed down to the sorting operation, allocating a first reordering buffer and a second reordering buffer for the target data; reading the target data into the first reordering buffer in several batches until all of the target data has been read, wherein each time the target data is read into the first reordering buffer, the following steps are performed: sorting the data in the first reordering buffer and judging whether this is the first time the data in the first reordering buffer has been sorted; if it is the first time, storing the first N tuples in the first reordering buffer that satisfy the LIMIT semantic constraint in the second reordering buffer; if it is not the first time, merging the first N tuples in the first reordering buffer with the data in the second reordering buffer and storing the first N tuples of the merged result in the second reordering buffer; and determining, according to the data in the second reordering buffer, the data defined by the LIMIT semantics.
In the above technical solution, preferably, before the step of pushing the LIMIT semantics down to the sorting operation, the data sorting method further includes: determining the number N of tuples defined by the LIMIT semantics; and, if N is less than or equal to a preset threshold, performing the step of pushing the LIMIT semantics down to the sorting operation.
In any of the above technical solutions, preferably, whenever the target data is read into the first reordering buffer and the read is not the last one, the target data is read until the first reordering buffer is full.
In any of the above technical solutions, preferably, the space of the second reordering buffer is greater than or equal to the space needed to store the data of 2N tuples.
In any of the above technical solutions, preferably, there are one or more first reordering buffers, and when there are multiple first reordering buffers, the target data is read into the multiple first reordering buffers in parallel.
An embodiment of the second aspect of the present invention provides a data sorting device based on LIMIT semantics, including: an allocation unit, configured to allocate a first reordering buffer and a second reordering buffer for the target data when the LIMIT semantics are pushed down to the sorting operation; a processing unit, configured to read the target data into the first reordering buffer in several batches until all of the target data has been read, wherein each time the target data is read into the first reordering buffer, the processing unit sorts the data in the first reordering buffer and judges whether this is the first time the data in the first reordering buffer has been sorted; if it is the first time, the processing unit stores the first N tuples in the first reordering buffer that satisfy the LIMIT semantic constraint in the second reordering buffer; if it is not the first time, the processing unit merges the first N tuples in the first reordering buffer with the data in the second reordering buffer and then stores the first N tuples of the merged result in the second reordering buffer; and a first determination unit, configured to determine, according to the data in the second reordering buffer, the data defined by the LIMIT semantics.
In the above technical solution, preferably, the data sorting device further includes: a second determination unit, configured to determine the number N of tuples defined by the LIMIT semantics; wherein, when N is less than or equal to a preset threshold, the allocation unit pushes the LIMIT semantics down to the sorting operation.
In any of the above technical solutions, preferably, the processing unit is specifically configured so that, whenever the target data is read into the first reordering buffer and the read is not the last one, the target data is read until the first reordering buffer is full.
In any of the above technical solutions, preferably, the space of the second reordering buffer is greater than or equal to the space needed to store the data of 2N tuples.
In any of the above technical solutions, preferably, there are one or more first reordering buffers, and when there are multiple first reordering buffers, the processing unit reads the target data into the multiple first reordering buffers in parallel.
With the data sorting method and data sorting device based on LIMIT semantics of the present invention, the sorting of data for queries with a LIMIT clause can be optimized in a database management system, thereby improving the speed and efficiency of the sorting operation.
Brief description of the drawings
Fig. 1 shows a schematic flowchart of the data sorting method based on LIMIT semantics according to an embodiment of the present invention;
Fig. 2 shows a schematic flowchart of the data sorting method based on LIMIT semantics according to another embodiment of the present invention; and
Fig. 3 shows a schematic structural diagram of the data sorting device based on LIMIT semantics according to an embodiment of the present invention.
Detailed description of the embodiments
In order that the above objects, features and advantages of the present invention can be understood more clearly, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments can be combined with one another.
Many specific details are set forth in the following description to facilitate a full understanding of the present invention. However, the present invention can also be implemented in ways other than those described here, so the scope of protection of the present invention is not limited by the specific embodiments described below.
Fig. 1 shows a schematic flowchart of the data sorting method based on LIMIT semantics according to an embodiment of the present invention.
As shown in Fig. 1, the data sorting method based on LIMIT semantics according to an embodiment of the present invention includes:
Step 102: when the LIMIT semantics are pushed down to the sorting operation, allocate a first reordering buffer and a second reordering buffer for the target data.
Step 104: read the target data into the first reordering buffer in several batches until all of the target data has been read, wherein each time the target data is read into the first reordering buffer, the following steps are performed: sort the data in the first reordering buffer and judge whether this is the first time the data in the first reordering buffer has been sorted; if it is the first time, store the first N tuples in the first reordering buffer that satisfy the LIMIT semantic constraint in the second reordering buffer; if it is not the first time, merge the first N tuples in the first reordering buffer with the data in the second reordering buffer and store the first N tuples of the merged result in the second reordering buffer.
For example, suppose it is determined that this is not the first time the data in the first reordering buffer has been sorted, the first N tuples in the first reordering buffer are tuple A, tuple B and tuple D (i.e. N=3), and the second reordering buffer holds tuple B, tuple C and tuple E. Merging the first N tuples of the first reordering buffer with the tuples in the second reordering buffer then gives tuple A, tuple B, tuple B, tuple C, tuple D and tuple E, and the first 3 tuples (i.e. tuple A, tuple B, tuple B) are stored in the second reordering buffer.
As another example, with a descending sort order, suppose it is determined that this is not the first time the data in the first reordering buffer has been sorted, the first N tuples in the first reordering buffer are tuple D, tuple B and tuple A (i.e. N=3), and the second reordering buffer holds tuple E, tuple B and tuple C. Merging the first N tuples of the first reordering buffer with the tuples in the second reordering buffer then gives tuple E, tuple D, tuple C, tuple B, tuple B and tuple A, and the first 3 tuples (i.e. tuple E, tuple D, tuple C) are stored in the second reordering buffer.
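The two merge examples above can be reproduced with a minimal Python sketch. The function name `merge_top_n` and its parameters are hypothetical and are not taken from the patent; the sketch assumes both inputs are already sorted in the requested direction, so the contents of the second reordering buffer in the descending case are written as E, C, B.

```python
from heapq import merge
from itertools import islice


def merge_top_n(b1_top, b2, n, descending=False):
    """Merge two already-sorted runs and keep only the first n tuples.

    b1_top: first N tuples of the first reordering buffer (sorted).
    b2:     current contents of the second reordering buffer (sorted).
    Both names are illustrative assumptions, not the patent's own code.
    """
    merged = merge(b1_top, b2, reverse=descending)  # lazy two-way merge
    return list(islice(merged, n))                  # discard everything after N


# Ascending example from the text (N = 3).
print(merge_top_n(["A", "B", "D"], ["B", "C", "E"], 3))
# Descending example from the text (N = 3).
print(merge_top_n(["D", "B", "A"], ["E", "C", "B"], 3, descending=True))
```

Because the merge is lazy and is cut off after N tuples, at most N tuples are actually pulled from the two runs.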
Step 106: determine, according to the data in the second reordering buffer, the data defined by the LIMIT semantics.
In this technical solution, two kinds of reordering buffers (the first reordering buffer and the second reordering buffer) are configured in memory. When the sorted tuples are merged, using the two kinds of reordering buffers reduces the number of tuples that take part in the merge operation, which saves I/O (input/output) operations on the tuples and avoids occupying excessive CPU (Central Processing Unit) resources. In addition, this solution does not need to execute an external-memory-based multi-way merge algorithm, which greatly reduces the I/O operations for reading and writing files and simplifies the data sorting steps, thereby effectively improving the speed and efficiency of data sorting. Furthermore, after all of the target data has been processed, the tuples in the second reordering buffer are sent to the iterator for the next step of processing, so the iterator no longer has to count the returned tuples as in the related art, and an external sorting operation is thereby converted into an internal sorting operation.
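Steps 102 to 106 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `top_n_with_two_buffers`, the parameter `b1_capacity` (capacity of the first reordering buffer in tuples), and the use of Python lists as buffers are all hypothetical.

```python
from heapq import merge
from itertools import islice


def top_n_with_two_buffers(rows, n, b1_capacity, key=None):
    """Return the first n rows in sorted order, reading input in batches."""
    source = iter(rows)
    b2 = []             # second reordering buffer: best n rows seen so far
    first_sort = True
    while True:
        # Step 104: read one batch of target data into the first buffer.
        b1 = list(islice(source, b1_capacity))
        if not b1:
            break                        # all target data has been read
        b1.sort(key=key)                 # in-memory sort of the first buffer
        b1_top = b1[:n]                  # only the first N tuples matter
        if first_sort:
            b2 = b1_top                  # first sort: copy straight into B2
            first_sort = False
        else:
            # Later sorts: merge with B2 and keep the first N tuples.
            b2 = list(islice(merge(b1_top, b2, key=key), n))
    return b2                            # step 106: result comes from B2


data = [9, 1, 8, 2, 7, 3, 6, 4, 5, 0]
print(top_n_with_two_buffers(data, n=3, b1_capacity=4))
```

In this sketch no more than `b1_capacity + 2 * n` rows are held at any time, which reflects the point made above that the sort stays in memory and never spills runs to external storage.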
In the above technical solution, preferably, before the step of pushing the LIMIT semantics down to the sorting operation, the data sorting method further includes: determining the number N of tuples defined by the LIMIT semantics; and, if N is less than or equal to a preset threshold, performing the step of pushing the LIMIT semantics down to the sorting operation.
In this technical solution, an excessively large number N of tuples defined by the LIMIT semantics would impair the effect of the data sorting. The above scheme is therefore executed only when N is less than or equal to the preset threshold, which ensures the reliability of the data sorting. In a preferred embodiment, the preset threshold is 100 (this value is a parameter and can be set flexibly according to the actual situation).
In one embodiment, the format of the LIMIT clause is "LIMIT { offset, row_num }", where offset denotes the offset and row_num denotes the number of tuples counted from the offset onward. The number of tuples defined by the LIMIT semantics can be calculated from offset and row_num as N = offset + row_num. For example, if offset=3 and row_num=7, then N=10. In another embodiment, the format of the LIMIT clause is "LIMIT { row_num }", where offset defaults to 0, and the number of tuples defined by the LIMIT semantics can likewise be calculated as N = offset + row_num. For example, if row_num=5, then N=5.
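The arithmetic above can be written out as a short Python sketch. The helper names `tuples_needed` and `apply_limit` are hypothetical. The final slice that skips the first offset tuples is an assumption about how the data defined by the LIMIT semantics is taken from the N retained tuples; the text only states that it is determined from the second reordering buffer.

```python
def tuples_needed(row_num, offset=0):
    """N = offset + row_num; offset defaults to 0 for 'LIMIT { row_num }'."""
    return offset + row_num


def apply_limit(top_n_rows, row_num, offset=0):
    """Pick row_num tuples starting at offset from the N retained tuples.

    Assumption: top_n_rows is the sorted content of the second buffer.
    """
    return top_n_rows[offset:offset + row_num]


print(tuples_needed(7, offset=3))   # LIMIT { 3, 7 }
print(tuples_needed(5))             # LIMIT { 5 }
print(apply_limit(list(range(10)), row_num=7, offset=3))
```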
In any of the above technical solutions, preferably, whenever the target data is read into the first reordering buffer and the read is not the last one, the target data is read until the first reordering buffer is full.
In this technical solution, the target data read in the last batch may not fill the first reordering buffer. For every read other than the last one, however, target data is read until the first reordering buffer is full, so the first reordering buffer is fully used and the efficiency of data sorting is improved.
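A minimal sketch of this batching behaviour, assuming the buffer capacity is expressed as a number of tuples; the generator name `read_batches` and its parameters are hypothetical.

```python
from itertools import islice


def read_batches(rows, capacity):
    """Yield batches that fill the first buffer; only the last may be short."""
    source = iter(rows)
    while True:
        batch = list(islice(source, capacity))  # read until the buffer is full
        if not batch:
            return                               # target data exhausted
        yield batch


# Ten tuples with a buffer of four: two full batches, then the remainder.
print([len(batch) for batch in read_batches(range(10), 4)])
```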
In any of the above technical solutions, preferably, the space of the second reordering buffer is greater than or equal to the space needed to store the data of 2N tuples.
In this technical solution, the space of the second reordering buffer is greater than or equal to the space needed to store the data of 2N tuples, so the second reordering buffer has enough space to store the data, which effectively improves the speed and efficiency of data sorting.
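One possible reading of the 2N requirement is that the second reordering buffer must hold its own N retained tuples plus up to N incoming tuples while they are combined. The sketch below illustrates that reading only; the class `SecondBuffer` is hypothetical, and it re-sorts the combined rows instead of merging them, purely for brevity.

```python
class SecondBuffer:
    """Second reordering buffer with room for 2N tuples (illustrative only)."""

    def __init__(self, n):
        self.n = n
        self.capacity = 2 * n    # N retained tuples + N incoming tuples
        self.rows = []

    def absorb(self, b1_top):
        """Take up to N sorted tuples from the first buffer, keep the best N."""
        if len(b1_top) > self.n:
            raise ValueError("at most N tuples may be handed over at once")
        self.rows.extend(b1_top)          # never exceeds 2N by construction
        if len(self.rows) > self.capacity:
            raise RuntimeError("second buffer overflow")
        self.rows.sort()                  # stand-in for the merge step
        del self.rows[self.n:]            # discard everything after N


buf = SecondBuffer(3)
buf.absorb([1, 2, 8])
buf.absorb([3, 4, 6])
print(buf.rows)
```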
In any of the above technical solutions, preferably, there are one or more first reordering buffers, and when there are multiple first reordering buffers, the target data is read into the multiple first reordering buffers in parallel.
In this technical solution, reading the target data into multiple first reordering buffers in parallel further improves the speed and efficiency of data sorting.
Of course, besides being read into the multiple first reordering buffers in parallel, the target data can also be read into the multiple first reordering buffers serially.
In addition, the number of first reordering buffers and the number of second reordering buffers may be the same or different. If the numbers are the same and there are multiple first reordering buffers and multiple second reordering buffers, the first reordering buffers and the second reordering buffers correspond one to one. That is, the first reordering buffers and the second reordering buffers are divided into multiple groups, each group having one first reordering buffer and one second reordering buffer, and the sorting operation and merge operation of step 104 above are performed on the data in the first reordering buffer and the second reordering buffer of each group. After the target data has been processed, the data in all of the second reordering buffers is merged again, repeatedly, until only the data of one second reordering buffer remains, and the first N tuples of the last merge operation are taken as the data defined by the LIMIT semantics.
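The grouped variant can be sketched as follows, assuming the target data has already been split into one partition per group. The names `group_top_n` and `parallel_top_n` are hypothetical, a thread pool merely stands in for whatever parallel execution the database engine provides, and a single k-way merge replaces the repeated pairwise merges described above.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge
from itertools import islice


def group_top_n(rows, n, b1_capacity):
    """Run the sort-and-merge loop of step 104 for one buffer group."""
    source, b2 = iter(rows), []
    while True:
        b1 = sorted(islice(source, b1_capacity))    # fill and sort B1
        if not b1:
            return b2
        # Merging with an empty B2 covers the first-sort case as well.
        b2 = list(islice(merge(b1[:n], b2), n))


def parallel_top_n(partitions, n, b1_capacity):
    """Process each partition in its own group, then merge all second buffers."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(
            pool.map(lambda part: group_top_n(part, n, b1_capacity), partitions)
        )
    # Final step: merge every group's N retained tuples and keep the first N.
    return list(islice(merge(*partials), n))


print(parallel_top_n([[9, 1, 8], [2, 7, 3], [6, 4, 5, 0]], n=3, b1_capacity=2))
```

Each group hands over at most N tuples, so the final merge handles at most N times the number of groups, regardless of how large the target data is.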
In this technical solution, processing the target data with multiple first reordering buffers and multiple second reordering buffers at the same time further improves the speed and efficiency of data sorting.
In any of the above technical solutions, preferably, the quicksort algorithm is used to sort the data in the first reordering buffer.
Fig. 2 shows a schematic flowchart of the data sorting method based on LIMIT semantics according to another embodiment of the present invention.
As shown in Fig. 2, the data sorting method based on LIMIT semantics according to another embodiment of the present invention includes:
Step 202: the optimizer pushes down the LIMIT semantics.
Step 204: judge whether the query contains an ORDER BY clause. If yes, go to step 206; if no, go to step 234, i.e. sort the data using the prior-art scheme.
Step 206: judge whether the query contains a LIMIT clause. If yes, go to step 208; if no, go to step 234, i.e. sort the data using the prior-art scheme.
Step 208: judge whether the format of the LIMIT clause is LIMIT { offset, row_num } or LIMIT { row_num }. If yes, go to step 210; if no, go to step 234, i.e. sort the data using the prior-art scheme. Here, offset denotes the offset and row_num denotes the number of tuples counted from the offset onward.
For example, if the LIMIT clause is LIMIT { 3, 4 }, then offset=3 and row_num=4. If the format of the LIMIT clause is LIMIT { row_num }, the offset defaults to 0; for example, LIMIT { 4 } means offset=0 and row_num=4.
Step 210: judge whether the number N of tuples defined by the LIMIT semantics is less than or equal to Kmax (i.e. the preset threshold). If N is less than or equal to Kmax, go to step 212; if N is greater than Kmax, go to step 234, i.e. sort the data using the prior-art scheme. As described above for the embodiment of Fig. 1, the number of tuples defined by the LIMIT semantics can be calculated from offset and row_num (N = offset + row_num), which is not repeated here.
Step 212: push the two values offset and row_num down to the sorting operation.
Step 214: the iterator optimizes the sorting operation.
Step 216: allocate reordering buffer B1 (i.e. the first reordering buffer).
Step 218: allocate reordering buffer B2 (i.e. the second reordering buffer).
Step 220: read target data into B1 until reordering buffer B1 is full or the last part of the target data has been read in completely.
Step 222: each time target data is read into reordering buffer B1, sort the data in reordering buffer B1. Preferably, the quicksort algorithm can be used to sort the data in reordering buffer B1.
Step 224: judge whether this is the first time the data in reordering buffer B1 has been sorted. If it is the first time, go to step 226; if it is not the first time, go to step 228.
Step 226: put the first N tuples in reordering buffer B1 into reordering buffer B2.
Step 228: merge the first N tuples in reordering buffer B1 directly with the tuples in reordering buffer B2; after the merge, keep the first N tuples in reordering buffer B2 and discard the rest.
Step 230: judge whether all of the target data has been processed. If yes, go to step 232; if no, go to step 220.
Step 232: send the data in reordering buffer B2 to the iterator. In the above embodiment, there is one reordering buffer B1 and one reordering buffer B2, so the data in reordering buffer B2 is the data defined by the LIMIT semantics and is sent to the iterator.
Of course, there can also be multiple reordering buffers B1, so that the target data can be read into the multiple reordering buffers B1 in parallel. When there are multiple reordering buffers B1, there can be one or more reordering buffers B2. If there are multiple reordering buffers B2, then after the target data has been read, each of the reordering buffers B2 holds data (each retaining N tuples); the data in the multiple reordering buffers B2 is finally merged repeatedly, and the first N tuples of the last merge operation are taken as the data defined by the LIMIT semantics.
Step 234: original flow, i.e. execute the prior-art scheme.
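The decision part of the flow (steps 204 to 212 and 234) can be summarized in a short Python sketch. The `Query` structure, the function `choose_sort_plan` and the returned labels are hypothetical. The format check of step 208 is represented only by the assumption that a recognised LIMIT clause has already been parsed into `offset` and `row_num`.

```python
from dataclasses import dataclass
from typing import Optional

K_MAX = 100  # preset threshold; the text gives 100 as one preferred value


@dataclass
class Query:
    has_order_by: bool
    row_num: Optional[int] = None  # None: no LIMIT clause in a known format
    offset: int = 0                # defaults to 0 for 'LIMIT { row_num }'


def choose_sort_plan(query: Query, k_max: int = K_MAX) -> str:
    """Decide between LIMIT push-down and the original sorting flow."""
    if not query.has_order_by:         # step 204
        return "original"              # step 234
    if query.row_num is None:          # steps 206 and 208
        return "original"
    n = query.offset + query.row_num   # N = offset + row_num
    if n > k_max:                      # step 210
        return "original"
    return "limit_pushdown"            # step 212 onward


print(choose_sort_plan(Query(has_order_by=True, row_num=4, offset=3)))
print(choose_sort_plan(Query(has_order_by=True, row_num=500)))
```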
In above technical scheme, there are two types of reordering buffer B1 and B2 for distribution, are carrying out merger to the tuple after sequence When operation, the tuple number for participating in merger operation can be reduced using two kinds of reordering buffers, to save to tuple I/O operation and the resource for avoiding occupancy CPU excessive.And this programme does not need to execute the multichannel conflation algorithm based on external memory, and it can also With the step of greatly reducing the I/O operation for reading file and written document, simplifying data sorting, to effectively improve data The speed and efficiency of sequence.In addition, the tuple in the second reordering buffer is issued repeatedly after all having handled target data It being handled in next step for device progress, the iterator avoided in the related technology counts the tuple number of return, from And it realizes and converts internal sort operation for external sorting operation.
The structure that Fig. 3 shows the data sorting device according to an embodiment of the invention based on LIMIT semanteme is shown It is intended to.
As shown in figure 3, the data sorting device 300 according to an embodiment of the invention based on LIMIT semanteme, packet It includes: allocation unit 302, processing unit 304 and the first determination unit 306.
Allocation unit 302, for being sorted for target data distribution first when shifting sorting operation under LIMIT is semantic Buffer area and the second reordering buffer;Processing unit 304, for the target data to be read into first sequence several times In buffer area, until the target data has been read, wherein the target data is read into first sequence every time When in buffer area, the processing unit 304 judges whether it is right for the first time to the data sorting in first reordering buffer Data sorting in first reordering buffer, if it is determined that for for the first time to the data sorting in first reordering buffer, The preceding N tuple for meeting the LIMIT semantic constraint in first reordering buffer is then stored in second sequence In buffer area, if it is determined that be non-for the first time to the data sorting in first reordering buffer, then by first order buffer The coalescence in preceding N tuple and second reordering buffer in area, then the preceding N tuple after merger is stored in institute It states in the second reordering buffer;And first determination unit 306, for according to the data in second reordering buffer, really Surely meet data defined by the LIMIT semanteme.
In the technical scheme, by configuring two kinds of reordering buffers in memory, return to the tuple after sequence And when operating, the tuple number for participating in merger operation can be reduced using two kinds of reordering buffers, to save to tuple IO (In Out) operation and avoid occupying CPU (Central Processing Unit, central processing unit) excessive resource. And this programme does not need to execute the multichannel conflation algorithm based on external memory, can also greatly reduce the IO for reading file and written document The step of operating, simplifying data sorting, to effectively improve the speed and efficiency of data sorting.In addition, to target After data have all been handled, the tuple in the second reordering buffer is issued into iterator progress and is handled in next step, is avoided Iterator in the related technology counts the tuple number of return, converts internal sort for external sorting operation to realize Operation.
In the above-mentioned technical solutions, it is preferable that data sorting device 300 further include: the second determination unit 308, for true Tuple number N defined by the fixed LIMIT semanteme;Wherein, when N is less than or equal to preset threshold, allocation unit 302 is by institute It states and shifts sorting operation under LIMIT semanteme.
If tuple number N defined by LIMIT semanteme is excessive, the effect of data sorting will affect, therefore, in the technology In scheme, N be less than or equal to preset threshold just execute more than scheme, thus reliability when ensure that data sorting.? In one preferred embodiment, preset threshold 100.
In any of the above-described technical solution, it is preferable that if not the target data is read into described the by last time In one reordering buffer, processing unit 304 reads the target data full into first reordering buffer.
In the technical scheme, when reading target data due to last time, the target data that last time is read may not First reordering buffer can be read completely, and when not being that target data is read into the first reordering buffer by last time, the One reordering buffer is full state, the first reordering buffer can be made full use of, to improve the efficiency of data sorting.
In any of the above-described technical solution, it is preferable that the space of second reordering buffer is greater than or equal to for depositing Store up the space of 2N tuple data.
In the technical scheme, the space of the second reordering buffer is greater than or equal to the data for storing 2N tuple Space, so that having enough spaces in the second reordering buffer to store data, to effectively improve data sorting Speed and efficiency.
In any of the above-described technical solution, it is preferable that the number of first reordering buffer is one or more, in institute State the first reordering buffer number be it is multiple in the case where, the target data is read into multiple by processing unit 304 parallel In first reordering buffer.
In the technical scheme, by being read into target data parallel in multiple first reordering buffers, further Improve the speed and efficiency of data sorting.
Certainly, target data can be read into multiple first reordering buffers by processing unit 304 parallel, can also be incited somebody to action Target data is serially read into multiple first reordering buffers.
In addition, the number of the first reordering buffer and the second reordering buffer can be identical, it can not also be identical.If first Reordering buffer is identical with the number of the second reordering buffer, and the number of the first reordering buffer and the second reordering buffer is Multiple, then the first reordering buffer and the second reordering buffer correspond, i.e., by multiple first reordering buffers and multiple the Two reordering buffers are divided into multiple groups, and each group has first reordering buffer and second reordering buffer, place Reason unit 304 is performed both by above-mentioned sequence to the data in the first reordering buffer and the second reordering buffer in each group and grasps Make and merger operates, after being disposed to target data, processing unit 304 is to the number in the second all reordering buffers It is operated according to multiple merger is executed again, until only retaining the data of second reordering buffer, and will finally current merger Top n tuple in operation, which is used as, meets data defined by LIMIT semanteme.
The technical solution of the present invention has been described in detail above with reference to the accompanying drawings. With the data reordering method and data sorting device based on LIMIT semantics of the present invention, data sorting with a LIMIT clause can be optimized in a database management system, improving the speed and efficiency of the sorting operation.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention. For those skilled in the art, the invention may be modified and varied in various ways. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A data reordering method based on LIMIT semantics, characterized by comprising:
When the LIMIT semantics are pushed down to the sorting operation, allocating a first reordering buffer and a second reordering buffer for target data;
Reading the target data into the first reordering buffer in multiple passes until all of the target data has been read, wherein each time the target data is read into the first reordering buffer, the following steps are performed: sorting the data in the first reordering buffer, and determining whether this is the first time the data in the first reordering buffer has been sorted; if it is the first time, storing the first N tuples in the first reordering buffer that satisfy the LIMIT semantic constraint in the second reordering buffer; if it is not the first time, merging the first N tuples in the first reordering buffer with the data in the second reordering buffer, and storing the first N tuples after the merge in the second reordering buffer; and
Determining, according to the data in the second reordering buffer, the data that satisfies the LIMIT semantics.
2. The data reordering method based on LIMIT semantics according to claim 1, characterized in that, before the step of pushing the LIMIT semantics down to the sorting operation, the data reordering method further comprises:
Determining the number of tuples N defined by the LIMIT semantics;
If N is less than or equal to a preset threshold, performing the step of pushing the LIMIT semantics down to the sorting operation.
3. The data reordering method based on LIMIT semantics according to claim 1, characterized in that
If it is not the last time that the target data is read into the first reordering buffer, the target data is read into the first reordering buffer until the first reordering buffer is full.
4. The data reordering method based on LIMIT semantics according to claim 1, characterized in that
The space of the second reordering buffer is greater than or equal to the space needed to store 2N tuples.
5. The data reordering method based on LIMIT semantics according to any one of claims 1 to 4, characterized in that
The number of first reordering buffers is one or more, and when there are multiple first reordering buffers, the target data is read into the multiple first reordering buffers in parallel.
6. A data sorting device based on LIMIT semantics, characterized by comprising:
An allocation unit, configured to allocate a first reordering buffer and a second reordering buffer for target data when the LIMIT semantics are pushed down to the sorting operation;
A processing unit, configured to read the target data into the first reordering buffer in multiple passes until all of the target data has been read, wherein each time the target data is read into the first reordering buffer, the processing unit sorts the data in the first reordering buffer and determines whether this is the first time the data in the first reordering buffer has been sorted; if it is the first time, the processing unit stores the first N tuples in the first reordering buffer that satisfy the LIMIT semantic constraint in the second reordering buffer; if it is not the first time, the processing unit merges the first N tuples in the first reordering buffer with the data in the second reordering buffer, and then stores the first N tuples after the merge in the second reordering buffer; and
A first determination unit, configured to determine, according to the data in the second reordering buffer, the data that satisfies the LIMIT semantics.
7. The data sorting device based on LIMIT semantics according to claim 6, characterized in that the data sorting device further comprises:
A second determination unit, configured to determine the number of tuples N defined by the LIMIT semantics;
Wherein, when N is less than or equal to a preset threshold, the allocation unit pushes the LIMIT semantics down to the sorting operation.
8. The data sorting device based on LIMIT semantics according to claim 6, characterized in that
If it is not the last time that the processing unit reads the target data into the first reordering buffer, the processing unit reads the target data into the first reordering buffer until the first reordering buffer is full.
9. The data sorting device based on LIMIT semantics according to claim 6, characterized in that
The space of the second reordering buffer is greater than or equal to the space needed to store 2N tuples.
10. The data sorting device based on LIMIT semantics according to any one of claims 6 to 9, characterized in that
The number of first reordering buffers is one or more, and when there are multiple first reordering buffers, the processing unit reads the target data into the multiple first reordering buffers in parallel.
CN201610888986.XA 2016-10-11 2016-10-11 Data reordering method and data collator based on LIMIT semanteme Active CN106484868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610888986.XA CN106484868B (en) 2016-10-11 2016-10-11 Data reordering method and data collator based on LIMIT semanteme

Publications (2)

Publication Number Publication Date
CN106484868A CN106484868A (en) 2017-03-08
CN106484868B true CN106484868B (en) 2019-07-09

Family

ID=58270561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610888986.XA Active CN106484868B (en) 2016-10-11 2016-10-11 Data reordering method and data collator based on LIMIT semanteme

Country Status (1)

Country Link
CN (1) CN106484868B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110297858B (en) * 2019-05-27 2021-11-09 苏宁云计算有限公司 Optimization method and device for execution plan, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1581162A (en) * 2004-03-03 2005-02-16 北京大学 Quick-sorting in page method based on quick sorting computation
CN103605750A (en) * 2013-11-22 2014-02-26 厦门雅迅网络股份有限公司 Rapid distributed data paging method
CN104598485A (en) * 2013-11-01 2015-05-06 国际商业机器公司 Method and device for processing database table
CN105224697A (en) * 2015-11-16 2016-01-06 北京京东尚科信息技术有限公司 Sort method with filtercondition and the device for performing described method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160188643A1 (en) * 2014-12-31 2016-06-30 Futurewei Technologies, Inc. Method and apparatus for scalable sorting of a data set

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
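Design of a merge sort algorithm based on multithreading; Sun Linlin et al.; Journal of Jilin University (Information Science Edition); 2015-01-15; Vol. 33, No. 1; pp. 105-110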

Also Published As

Publication number Publication date
CN106484868A (en) 2017-03-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220418

Address after: Room 403, 4th floor, building 23, East District, yard 10, Xibeiwang East Road, Haidian District, Beijing 100089

Patentee after: BEIJING VSETTAN DATA TECHNOLOGY CO.,LTD.

Address before: 100192 West Zone, 10 / F, block a, No. 8 Xueqing Road (Science and technology wealth center), Haidian District, Beijing

Patentee before: VSETTAN INFORMATION INDUSTRY DEVELOPMENT CO.,LTD.

Patentee before: Beijing Huasheng Xintai Data Technology Co., Ltd

TR01 Transfer of patent right