Detailed Description of the Embodiments
According to an embodiment of the invention, a system for providing ordered data is provided, comprising a data source part, a connection part, a sorting mechanism part, and a client. Each data source supplies locally ordered data to the sorting mechanism through the connection part, and after processing by the sorting mechanism, globally ordered data is finally delivered to the client.
Each data source of the data source part is, for example, a MySQL instance.
The connection part manages and allocates the connections to each data source. It provides the channel through which the sorting mechanism accesses each data source, so that the sorting mechanism can obtain data from every data source.
The data that the sorting mechanism obtains from each data source arrives in a prescribed order; that is, the data provided by each data source is itself locally ordered, for example always in ascending order. The sorting mechanism must then sort the data from the different data sources globally.
The sorting mechanism is, for example, database middleware. There is one TCP connection between the database middleware and each MySQL instance. Data on the MySQL instance is sent continuously to the middleware over that connection.
The sending end and the receiving end of a TCP connection each have a buffer. When a MySQL instance sends data over the TCP connection, the data is first placed in the buffer at the sending end and then sent to the receiving end on the other side. After receiving the data, the receiving end first stores it in the receiving-end buffer of the TCP connection. All of these operations are performed by the operating system itself.
When the middleware receives data from a MySQL instance over the TCP connection, it is reading the contents of the receiving-end buffer of that connection on the middleware side.
Data on the MySQL instance can continue to be sent over the TCP connection only while the receiving-end buffer is not full; otherwise the transmission of data from the MySQL instance is blocked.
According to an embodiment of the invention, the overall sorting process is shown schematically in Fig. 1, which likewise contains four parts:
MySQL instance group 101;
TCP connections 102 between the middleware and the MySQL instances;
middleware sorting mechanism 103; and
client 104.
Fig. 1 shows a system composition according to an embodiment of the invention, in which the data source part, MySQL instance group 101, contains six MySQL instances. The sorting middleware 103 has one connection to each MySQL instance, giving six connections in total: A, B, C, D, E, and F.
By way of example, assume that each MySQL instance holds some data that is to be transmitted to the sorting middleware through the connection part. Connection A is to transmit 0, 6, and 12; connection B is to transmit 1, 7, and 13; connection C is to transmit 8 and 20; connection D is to transmit 9, 15, and 21; connection E is to transmit 4 and 10; and connection F is to transmit 5, 11, and 17. The data to be transmitted on each connection is already sorted, here assumed to be in ascending order.
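The global order that the client should eventually receive for this example can be checked with a minimal Python sketch. It uses the standard-library heapq.merge as a stand-in k-way merge, not the buffered method described below; the dictionary name pending is an assumption made for illustration.

```python
import heapq

# Locally ordered data waiting on each connection (values from the example above).
pending = {
    "A": [0, 6, 12],
    "B": [1, 7, 13],
    "C": [8, 20],
    "D": [9, 15, 21],
    "E": [4, 10],
    "F": [5, 11, 17],
}

# heapq.merge lazily merges already-sorted inputs into one sorted stream.
global_order = list(heapq.merge(*pending.values()))
print(global_order)
# → [0, 1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 17, 20, 21]
```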
The method for providing ordered data according to the present invention includes the steps described below.
The sorting mechanism receives a query request with an ordering requirement from the client and parses the query request to generate a sub-query request for each data source.
The sorting mechanism then obtains the connection corresponding to each of these data sources, establishes a buffer for each connection, empties the buffer and marks it as not full, and sends each generated sub-query request over the corresponding connection to the corresponding data source. Each data source responds to its sub-query by retrieving the matching data records from its database and returning them, sorted as requested, over its connection.
The sorting mechanism 103 polls the connections according to a predetermined rule to determine which connections have data that can be read. When the sorting mechanism determines that a connection has readable data and that the memory buffer corresponding to the connection is not full, it reads all readable data records from the connection and stores them in that memory buffer. When the amount of data in the buffer exceeds a predetermined threshold, the buffer is marked as full. The sorting mechanism then determines whether all the data that needs to be read through the connection has been read, and marks the connection as read-finished once all of it has been read.
The sorting mechanism 103 then performs a heap sort over all the connections. The sort value of a connection is determined according to the following rules. If the connection is marked as processing-complete, its value is positive infinity for an ascending sort and negative infinity for a descending sort. If the connection is not marked as processing-complete and its buffer is empty, its value is negative infinity for an ascending sort and positive infinity for a descending sort. In all other cases, the value of the connection is the sort field value of the first record in the connection's buffer.
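The three rules above can be sketched as a small Python function. The class name Conn, its fields, and the function name sort_value are hypothetical names chosen for illustration; the buffer is assumed to hold bare sort-field values rather than full records.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Conn:
    name: str
    buffer: deque = field(default_factory=deque)  # buffered sort-field values, already ordered
    done: bool = False                            # the "processing-complete" mark


def sort_value(conn, ascending=True):
    """Value used to place a connection in the heap."""
    if conn.done:
        # Finished connections sink to the bottom of the heap.
        return math.inf if ascending else -math.inf
    if not conn.buffer:
        # Connections still waiting for data rise to the top of the heap.
        return -math.inf if ascending else math.inf
    return conn.buffer[0]


conns = [Conn("A", deque([6, 12])), Conn("B"), Conn("C", done=True)]
order = [c.name for c in sorted(conns, key=sort_value)]
print(order)  # → ['B', 'A', 'C']
```

Placing the waiting connection first is what prevents a record from being emitted before every unfinished connection has contributed its current smallest record.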
After one heap sort, it is determined whether the buffer corresponding to the connection at the top of the heap is empty. When the memory buffer of the heap-top connection is not empty, the first record is taken out of that buffer and sent to the client 104. The heap sort of all connections and the sending of the first record from the heap-top connection's buffer are then repeated until the heap-top connection's memory buffer is empty. Each time the first record has been taken from the heap-top buffer and sent to the client, it is determined whether that memory buffer is no longer full; if so, the full mark on the buffer is cancelled.
When the memory buffer of the heap-top connection is empty and that connection is marked as read-finished, the connection is marked as processing-complete; otherwise polling continues in an attempt to read data from some connection.
When all connections are processing-complete, an end mark is sent to the client to indicate that the sorting process has ended; otherwise polling continues in an attempt to read data from some connection.
The detailed flow of the method for providing ordered data according to the invention is described below with reference to Fig. 2.
The sorting middleware receives from the client a query request with an ordering requirement, for example an SQL request.
In step S2001, the sorting middleware receives the query request and parses it. If the query request involves multiple MySQL instances, the request is split and a sub-query request is generated for each MySQL instance.
If the query request involves only a single MySQL instance, the request does not need to be decomposed, and the middleware only needs to receive the data sent by that single MySQL instance.
In step S2003, the sorting middleware obtains the connections to the relevant MySQL instances. In step S2005, it establishes the buffers corresponding to these connections for storing data records, empties them, and marks them as not full.
In step S2007, the sorting middleware sends the generated sub-SQL requests over these connections to the corresponding MySQL instances.
Each MySQL instance receives its sub-SQL request, retrieves data according to it, sorts the data accordingly, and then returns the ordered data to the sorting middleware over the corresponding connection.
In step S2009, the sorting middleware polls according to a predetermined rule to find which connections have readable data. If no connection has readable data, it keeps waiting and polls again. For example, the connections can be polled periodically at a set time interval to determine whether any of them has data that can be read.
How this determination is made depends on the specific implementation of the connection. For example, for a connection implemented at the TCP layer, the sending end and the receiving end of the TCP connection each have a buffer. When a database such as a MySQL instance sends data over the TCP connection, the data is first placed in the buffer at the sending end and then sent to the receiving end on the other side. After receiving the data, the receiving end first stores it in the receiving-end buffer of the TCP connection. All of these operations are performed by the respective operating systems.
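One way to poll socket-based connections for readability at a fixed interval can be sketched with Python's standard selectors module. This is only an illustrative assumption about the implementation: the source does not name a polling API, the function poll_readable and its parameters are hypothetical, and socket.socketpair stands in for the TCP connections to MySQL instances.

```python
import selectors
import socket


def poll_readable(selector, interval=0.05, max_rounds=100):
    """Poll at a fixed interval; return the names of connections with readable data."""
    for _ in range(max_rounds):
        events = selector.select(timeout=interval)
        if events:
            return sorted(key.data for key, _mask in events)
    return []  # nothing became readable within max_rounds polls


selector = selectors.DefaultSelector()
pairs = {name: socket.socketpair() for name in "ABC"}  # (source end, middleware end)
for name, (_source_end, middleware_end) in pairs.items():
    selector.register(middleware_end, selectors.EVENT_READ, data=name)

nothing_yet = poll_readable(selector, interval=0.01, max_rounds=1)
pairs["B"][0].sendall(b"1,7,13")  # only the source behind connection B sends data
ready = poll_readable(selector)

selector.close()
for source_end, middleware_end in pairs.values():
    source_end.close()
    middleware_end.close()
```

Here a connection counts as readable once the operating system has placed bytes in its receiving-end buffer, which matches the TCP behaviour described above.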
If step S2009 determines that there is a connection from which data can be read, step S2011 determines whether the memory buffer corresponding to that connection is full.
Whether the buffer is full here means whether the amount of data in the buffer has exceeded a predetermined threshold. When data is read from a connection, all the data that can be read from that connection should be read. Therefore, while the data of a connection is being read, even if the corresponding buffer becomes full, the readable data is still read completely and stored in the buffer (for example, the buffer actually has space set aside to hold the "overflow").
When step S2011 determines that the buffer is not full, the sorting middleware reads all readable data records from the connection in step S2012 and stores them in the memory buffer corresponding to the connection. Step S2013 then determines whether the buffer is full, and if it is, step S2014 marks the buffer as full.
By setting this "full" mark on the buffer of a connection, the invention controls memory usage and achieves a balance between efficiency and availability.
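The threshold-based "full" mark can be sketched as follows. The class name ConnBuffer and its methods are hypothetical, and the amount of data is assumed to be counted in records; a byte count would work the same way.

```python
from collections import deque


class ConnBuffer:
    """Per-connection buffer with a soft limit: one read may overshoot the threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.records = deque()
        self.full = False

    def store(self, readable_records):
        # Everything that is currently readable is stored, even past the threshold.
        self.records.extend(readable_records)
        if len(self.records) > self.threshold:
            self.full = True  # no further reads until records are taken out

    def take_first(self):
        record = self.records.popleft()
        if len(self.records) <= self.threshold:
            self.full = False  # cancel the full mark so reading can resume
        return record


buf = ConnBuffer(threshold=2)
buf.store([0, 6, 12])           # three records exceed the threshold of two
was_full = buf.full             # True
first = buf.take_first()        # 0
full_after_take = buf.full      # False: two records no longer exceed the threshold
```

Because a full buffer is simply skipped during polling, unread data stays in the operating system's receive buffer, and TCP flow control in turn holds back the sender.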
Next, step S2015 determines from the data that has been read whether all the data that needs to be read through the connection has been read, that is, whether all the data that the connection's data source (MySQL instance) should provide has been read. This determination can be made using techniques known in the art. If reading is finished, the connection is marked as read-finished in step S2017.
Next, in step S2019, a heap sort is performed on all the connections; in other words, all the connections are heapified once.
Heapification is the key to the performance gain in the invention. If the whole sorting process were handled simply along the lines of the traditional merge algorithm, the data of multiple connections would be compared repeatedly. The details of the "heapification" of the invention are described separately later.
After all the connections have been heapified, step S2021 determines whether the memory buffer corresponding to the heap-top connection is empty.
If it is empty, this indicates that some connection's data has not yet been sent over, and it is determined again which connections have readable data.
If there are data records in the memory buffer of the heap-top connection, the first data record in that buffer is the smallest of the data records not yet output to the client (this refers to ascending order; for a descending sort it is the largest record). In step S2029, the first record is taken out of the memory buffer of the heap-top connection and sent to the client. "Taking out a record" here means that after the record is read from the memory buffer it is deleted from the memory buffer, so that the former second record in the heap-top connection's buffer becomes the first record, which means that the "value" of the connection has changed. Step S2030 then determines again whether the connection's memory buffer is full, and if it is not, step S2031 cancels the full mark on the connection's buffer. The flow then returns to step S2019 to heapify all the connections again, after which whether the heap-top connection's memory buffer contains data decides whether to continue the loop or to go on to receive data.
When step S2021 determines that the buffer corresponding to the heap-top connection is empty, the flow goes to step S2022 to determine whether the connection is marked as read-finished. If reading is not finished, the flow goes to step S2009; otherwise the connection is marked as processing-complete in step S2023.
Step S2025 then determines whether all the connections are processing-complete.
If all the connections are processing-complete, the flow goes to step S2027 and an end mark is sent to the client. The end mark corresponds, for example, to the EOF packet in the MySQL protocol.
If some connections have not yet been fully processed, the flow goes to step S2009 to determine again which connections have readable data.
According to the invention, the limit on the connection buffers ensures that the memory the middleware needs when performing the sort is bounded and controllable, while the heapification keeps the number of comparisons to a minimum, ensuring that the time complexity of the whole sort is O(n lg n).
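The flow of Fig. 2 can be approximated end to end with an in-memory simulation. The sketch below is a simplified assumption-laden model, not the patented implementation: the class SimConn, its chunk parameter (how many records become readable per poll), and the function global_sort are hypothetical, records are bare sort-field values, and the heap is rebuilt once after each poll and then updated only at its top with heapq.heapreplace.

```python
import heapq
import math
from collections import deque


class SimConn:
    def __init__(self, rows, chunk=2, threshold=2):
        self.incoming = deque(rows)  # stands in for data still held by the source
        self.chunk = chunk           # records that become readable per poll
        self.threshold = threshold   # buffer counts as full above this many records
        self.buffer = deque()
        self.full = False
        self.read_finished = False
        self.done = False            # "processing-complete" mark

    def read_available(self):        # roughly S2012 to S2017
        for _ in range(min(self.chunk, len(self.incoming))):
            self.buffer.append(self.incoming.popleft())
        self.full = len(self.buffer) > self.threshold
        self.read_finished = not self.incoming

    def value(self):                 # ascending sort only
        if self.done:
            return math.inf
        if not self.buffer:
            return -math.inf
        return self.buffer[0]


def global_sort(conns):
    out = []
    while not all(c.done for c in conns):
        for c in conns:              # poll: read from connections that are not full
            if not c.read_finished and not c.full:
                c.read_available()
        heap = [(c.value(), i) for i, c in enumerate(conns)]
        heapq.heapify(heap)
        while True:
            _, i = heap[0]
            top = conns[i]
            if top.buffer:           # emit the smallest buffered record
                out.append(top.buffer.popleft())
                if len(top.buffer) <= top.threshold:
                    top.full = False
            elif top.read_finished and not top.done:
                top.done = True      # nothing buffered and nothing left to read
            else:
                break                # top is waiting for data (or all are done): poll again
            heapq.heapreplace(heap, (top.value(), i))
    return out


data = [[0, 6, 12], [1, 7, 13], [8, 20], [9, 15, 21], [4, 10], [5, 11, 17]]
result = global_sort([SimConn(rows) for rows in data])
print(result)
# → [0, 1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 17, 20, 21]
```

Only the heap-top connection changes value between two emissions, so a single heapreplace of at most about lg n steps restores the heap; a full rebuild is needed only after polling, when several connections may have received data.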
Heapification
The merge algorithm is a well-known sorting algorithm used to merge two or more ordered sequences into a single ordered sequence. Assume the sort order is ascending. When n ordered sequences are involved (n being an integer greater than 2), each round must compare the first elements of all the sequences to obtain the smallest element, which is then removed from its sequence. Suppose the ordered sequences are denoted 1...k...n and the first element of sequence k is the smallest. After the first element of sequence k has been removed, comparing the first elements of all the ordered sequences with one another again would take n(n-1)/2 comparisons. However, the remaining n-1 ordered sequences have already been compared with one another, so it is only necessary to compare the new first element of sequence k with the first elements of the other n-1 ordered sequences, that is, n-1 further comparisons. In fact, since the first elements of the remaining n-1 ordered sequences have already been compared, if all the earlier comparison results have been recorded, the new first element of sequence k only needs to be compared with the smallest of the first elements of the remaining n-1 sequences. In the best case a single further comparison yields the next smallest element, but the new first element of sequence k may be larger than the first elements of all the other sequences, in which case it still has to be compared once more with each of the remaining n-1 ordered sequences. If, however, the comparison results of the n-1 sequences have all been recorded, and recorded in the form of a heap, then only lg n comparisons are needed.
Robert W. Floyd and J. Williams jointly invented the well-known heap sort algorithm (Heap Sort) in 1964. Heap sort builds an ordered heap structure, which is a complete binary tree. Heaps are divided into max-heaps (also called big-root or big-top heaps) and min-heaps (also called small-root or small-top heaps). In a max-heap, the value of each node is no greater than the value of its parent node. In a min-heap, the value of each non-leaf node is no greater than the values of its child nodes.
Heapifying the connections means performing a heap sort on the connections. The node elements of the heap (binary tree) generated by the heapification of the invention are the connections themselves, not the records within the connections.
For sorting in ascending order, a min-heap is used, that is, the element at the top of the heap is the smallest, and the value of a connection is determined according to the following rules:
If the buffer corresponding to the connection is empty (contains no data), that is, the data has not yet been sent over from the MySQL instance, the value of the connection is taken to be negative infinity (regarded as the minimum value of the system, i.e., no value in the system can be smaller than this minimum);
If all the data to be transmitted on the connection has been processed, the value of the connection is taken to be positive infinity (regarded as the maximum value of the system, i.e., no value in the system can be larger than this maximum);
In all other cases, the value of the connection is the sort field value of the first record in the buffer corresponding to the connection.
In the min-heap case, if some connection has not yet received any data, that empty connection is necessarily pulled to the top of the heap, and if all the data of some connection has been processed, that connection is pushed to the bottom of the heap. Apart from these two cases, the connection whose first record is the smallest is necessarily at the top of the heap.
The heap of the invention is one whose elements are connections and in which the value of the first record in a parent-node connection is less than (or equal to) the value of the first record in each of its child-node connections.
If sorting in descending order is required, the invention can use a max-heap, with the comparison function adjusted accordingly.
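One way to obtain max-heap behaviour from a min-heap implementation is to negate the sort key, which amounts to reversing the comparison. The sketch below assumes a numeric sort field and sources that are already in descending order; the function name merge_descending is hypothetical, and the result is cross-checked against the standard-library heapq.merge with reverse=True.

```python
import heapq


def merge_descending(sources):
    """Merge descending-ordered sources using a min-heap over negated keys."""
    heap = [(-seq[0], i, 0) for i, seq in enumerate(sources) if seq]
    heapq.heapify(heap)
    out = []
    while heap:
        negated, i, pos = heap[0]
        out.append(-negated)  # the largest remaining value across all sources
        if pos + 1 < len(sources[i]):
            heapq.heapreplace(heap, (-sources[i][pos + 1], i, pos + 1))
        else:
            heapq.heappop(heap)
    return out


descending_sources = [[12, 6, 0], [13, 7, 1], [20, 8]]
merged = merge_descending(descending_sources)
print(merged)  # → [20, 13, 12, 8, 7, 6, 1, 0]
```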
The embodiments shown in the specification and drawings serve only for explanation and illustration and are not intended to limit the scope of the invention, which is defined by the claims.