WO2012041235A1

WO2012041235A1 - Page flipping method and system for distributed system

Info

Publication number: WO2012041235A1
Application number: PCT/CN2011/080321
Authority: WO
Inventors: 袁清
Original assignee: 腾讯科技（深圳）有限公司
Priority date: 2010-09-28
Filing date: 2011-09-28
Publication date: 2012-04-05
Also published as: CN102419756A

Abstract

Disclosed in the embodiments of the present invention are a page flipping method and system for distributed systems, comprising: sorting an unordered data tuple sequence S stored in the server, wherein each item of S is an n-dimensional data set and n is a natural number, until every item of sequence S is ensured a unique position within sequence S; paging the sorted tuple sequence S; and presenting the paged data to the client. By applying the embodiments of the present invention, a unified identification method makes full use of every dimension of the data until a data item can be uniquely identified, so that multidimensional data obtains a unique position in a long sequence. This improves the accuracy of page flipping, with accuracy reaching 100%, while reducing query time by over 50%.

Description

Distributed data page turning method and system

The present invention relates to the field of Internet application technologies, and more particularly to a distributed data paging method and system.

Background of the invention

Currently, in distributed systems, data is typically stored on back-end servers, and upper-level users request data from each server. Because the data is generated by users, a large number of users inevitably generates massive amounts of data along with a large volume of reads and writes. How to store this massive amount of data and how to provide high-concurrency read and write services are issues that every user-generated content (UGC) business system must face. For example, a current Weibo system needs to display a large amount of data to the front-end client in units of pages (a page being a collection of a certain amount of data), which raises the technical problem of paging through the data. The front-end client can page up, page down, or jump to the first or last page as needed. A good paging scheme avoids data being duplicated across two pages or partially lost while the pages are computed dynamically. Here, paging down generally means that a certain amount of data is returned in order from newest to oldest. The more the user pages down, the more of the latest data can be viewed.

In current data paging technology, paging is usually performed on a single time basis; when data of the same type shares the same time point and is distributed across different servers, the paging point is determined simply by the storage location of the data on the servers (that is, by the order in which the servers are read).

However, when the relative storage locations of data at the same time point change, paging with the original page turning mark causes some data to be missed or to be sent to the front end repeatedly. For example, suppose the sequence of data tuples {time, id} is as follows:

Sequence: {1000, 9}, {1000, 11}, {1000, 16}, {1000, 7}, ...

Corresponding storage machine: Server 0, Server 1, Server 2, Server 0

When the user requests data, the servers are queried in the order Server 0, Server 1, Server 2, Server 0. Assume that the data is read in the sequence shown above and that the page turning point given last time is {1000, 16}. The user's next request to page down should then return {1000, 7} and the data after it, so that the same user receives all of the data completely and without repetition. However, a new data tuple {1000, 88} then arrives on Server 1 at the same point in time and changes the tuple sequence as follows:

Sequence: {1000, 88}, {1000, 11}, {1000, 16}, {1000, 7}, {1000, 9}

Storage machine: Server 1, Server 1, Server 2, Server 0, Server 0

If the paging point is still {1000, 16} and the server reading order has changed to Server 1, Server 2, Server 0, Server 1, the data {1000, 88} on Server 1 will not be presented to the client; that is, the data will be missed. This means that whenever the server reading order differs between two successive requests, the same user will miss some data when requesting data at the same point in time.
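The failure described above can be reproduced with a small simulation. This is only an illustrative sketch: the helper names `page_up_to_mark` and `page_after_mark` and the use of Python lists for the two read sequences are assumptions made for the example, not part of the described system. It pages purely by position relative to the mark, as the existing technique does.

```python
# Hypothetical simulation of paging by read order (the existing technique).

def page_up_to_mark(seq, mark):
    """Items delivered on the first request, ending at the page turning mark."""
    return seq[: seq.index(mark) + 1]

def page_after_mark(seq, mark):
    """Items delivered on the next request: everything read after the mark."""
    return seq[seq.index(mark) + 1 :]

# Read sequences taken from the example: before and after {1000, 88} arrives.
first_read = [(1000, 9), (1000, 11), (1000, 16), (1000, 7)]
second_read = [(1000, 88), (1000, 11), (1000, 16), (1000, 7), (1000, 9)]
mark = (1000, 16)

delivered = page_up_to_mark(first_read, mark) + page_after_mark(second_read, mark)
missed = set(second_read) - set(delivered)
repeated = {t for t in delivered if delivered.count(t) > 1}

print("missed:", missed)
print("repeated:", repeated)
```

In this simulation the tuple that arrived in front of the mark is never delivered, and a tuple that moved behind the mark is delivered twice, which are the two symptoms named above.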

Moreover, in a distributed system, multiple sets of backup devices provide read capability at the same time, so there is no way to force the machines' data to be read in the same order for every request. Similarly, when paging up, if the data at the same point in time changes again, some data will not be correctly sent to the front end. In addition, finding {1000, 88} in the data requires traversing all data tuples whose time is 1000, which is very inefficient.

Summary of the invention

Embodiments of the present invention provide a distributed data page turning method, which can improve the accuracy of data page turning.

Embodiments of the present invention provide a distributed data paging system that can improve the accuracy of data page turning.

The technical solution of the embodiment of the present invention is as follows:

A data page turning method for a distributed system, comprising:

Sorting the sequence S of unordered data tuples stored in the server, where the items included in S are n-dimensional data sets, and the sorting includes:

Setting the dimension order priority of the n dimensions;

The items in S are sorted according to the dimension order priority, wherein items that are equal in one dimension are further sorted by the dimension with the next dimension order priority, until each item of the sequence S has a unique position in the entire sequence S, where n is a natural number;

The sequence of data tuples S after the sorting is paged, and the paged data is presented to the client.

Setting the dimension order priority comprises: setting a dimension order priority for the n dimensions according to their importance, wherein the greater the importance, the higher the dimension order priority.

n is 2, and the n dimensions are time and data ID.

Paging the sorted data tuple sequence S and presenting the paged data to the client includes:

The sorted data tuple sequence S is paged according to the page turning mark, and the paged data is presented to the client.

The page turning mark is a data tuple or a position index.

A data paging system of a distributed system, comprising a sorting unit and a page turning unit, wherein: the sorting unit is configured to sort the unordered data tuple sequence S stored in the server, wherein the items included in S are n-dimensional data sets, and the sorting includes:

Setting the dimension order priority of the n dimensions; sorting the items in S according to the dimension order priority, wherein items that are equal in one dimension are further sorted by the dimension with the next dimension order priority, until each item of the sequence S has a unique position in the entire sequence S, where n is a natural number;

The page turning unit is configured to page through the sorted data tuple sequence S, and present the paged data to the client.

The sorting unit includes an importance setting sub-unit for setting a dimension order priority for the n dimensions according to their importance, wherein the greater the importance, the higher the dimension order priority.

n is 2, and the n dimensions are time and data ID.

The page turning unit includes a mark page turning subunit for turning pages of the sorted data tuple sequence S according to the page turning mark, and presenting the paged data to the client.

The page turning mark is a data tuple or a position index.

As can be seen from the above technical solution, in the embodiment of the present invention, the unordered data tuple sequence S stored in the server is first sorted until each item of the sequence S has a unique position in the entire sequence S, where n is a natural number; the sorted data tuple sequence S is then paged, and the paged data is presented to the client. Therefore, after applying the embodiment of the present invention, a unified identification method makes full use of each dimension of the data until a single data item can be uniquely located, so that multidimensional data can be uniquely located in a long sequence. This improves the accuracy of page turning, which can reach 100%, and at the same time saves more than 50% of the query time.

Brief description of the drawings

FIG. 1 is a flowchart of a distributed data paging method according to an embodiment of the present invention;

FIG. 2 is a structural diagram of a distributed data paging system according to an embodiment of the present invention.

Mode for carrying out the invention

The present invention will be further described in detail with reference to the accompanying drawings and embodiments.

In an embodiment of the present invention, for unordered multidimensional data tuples stored in a server, a unified identification method makes full use of each dimension of the data until a data item can be uniquely located.

FIG. 1 is a flowchart of a distributed data page turning method according to an embodiment of the present invention.

As shown in FIG. 1, the method includes:

Step 101: Sort the unordered data tuple sequence S stored in the server, where the items included in S are n-dimensional data sets, and the sorting includes:

Setting the dimension order priority of the n dimensions;

The dimension order priority may be set for the n dimensions according to their importance, wherein the greater the importance, the higher the dimension order priority. For example, the dimensions can be time and data ID.

More specifically, in the general case, the unordered data tuple sequence S = {T1, T2, T3, ..., Tn} is stored in the server, where the x-th item Tx in S (x = 1, 2, ..., n) is a multidimensional data set {k1, k2, k3, ..., kn}.

In this case, first set the dimension order priority of the n dimensions (that is, k1, k2, k3, ..., kn). Then, for each item Tx in S (x = 1, 2, ..., n), first sort by the dimension with the highest dimension order priority (assumed to be k1); when k1 is the same, sort by the dimension with the next priority after k1 (assumed to be k2); when k2 is the same, sort by the dimension with the next priority after k2 (assumed to be k3); and so on, until Tx has a unique position in the entire sequence S.
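The priority-ordered sort can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `sort_by_dimension_priority`, the representation of each item as a Python tuple, and the `(dimension_index, descending)` priority format are all hypothetical choices for the example. It also assumes no two items are equal in every dimension, since only then is each position unique.

```python
def sort_by_dimension_priority(items, priority):
    """Sort items by several dimensions.

    priority lists (dimension_index, descending) pairs from the highest
    dimension order priority to the lowest (a hypothetical format).
    """
    result = list(items)
    # Python's sort is stable, so sorting by the lowest-priority dimension
    # first and the highest-priority dimension last leaves ties in a
    # higher dimension ordered by the next dimension.
    for dim, descending in reversed(priority):
        result.sort(key=lambda item, d=dim: item[d], reverse=descending)
    return result

# Unordered {time, id} tuples, as they might be read from several servers.
unordered = [(1000, 16), (500, 7), (1000, 88), (2000, 7),
             (1000, 9), (1000, 7), (1000, 11)]

# Time has the highest priority (newest first); id breaks ties (ascending).
ordered = sort_by_dimension_priority(unordered, [(0, True), (1, False)])
print(ordered)
```

Whatever order the servers return the tuples in, the result is the same sequence, so the order no longer depends on where the data is stored.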

At this time, if the tuple sequence S is placed in a one-dimensional array, the array subscript of each Tx is fixed. Each time a new tuple is inserted, the entire S is re-sorted and each element takes up its own fixed position again. Thus, for a given page turning mark, whether a tuple Tx or an array index, a single item T can be uniquely found to determine the paging boundary.

For example, for data tuples {time, id}, regardless of how the data is stored on the servers and regardless of the order in which the servers are read, the tuples should logically have an order with no semantic ambiguity. For example, they can be sorted by the time dimension in reverse order and then, within the same time point, by numeric id. For example:

{2000, 7}, {1000, 7}, {1000, 9}, {1000, 11}, {1000, 16}, {1000, 88}, {500, 7}, ...

In this way, when a data item {1000, 19} is added, it has a unique position, namely between {1000, 16} and {1000, 88}. Regardless of whether the page turning mark is a data tuple or a position index, the paged data can then be determined uniquely and accurately, and the underlying storage of the data can remain unordered without affecting the upper-layer logic.
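The insertion of {1000, 19} can be checked with a short sketch. The names `sort_key` and `insert_item` are hypothetical, and the key follows the ordering shown in the example sequence above (time descending, then id ascending), which is an assumption about the intended order. A binary search finds the single position at which the new tuple belongs.

```python
import bisect

def sort_key(item):
    """Order by time descending, then id ascending (as in the example sequence)."""
    time, data_id = item
    return (-time, data_id)

def insert_item(sorted_seq, item):
    """Insert item at its unique position and return that position."""
    keys = [sort_key(t) for t in sorted_seq]  # rebuilt each call for simplicity
    pos = bisect.bisect_left(keys, sort_key(item))
    sorted_seq.insert(pos, item)
    return pos

seq = [(2000, 7), (1000, 7), (1000, 9), (1000, 11),
       (1000, 16), (1000, 88), (500, 7)]
pos = insert_item(seq, (1000, 19))
print(pos, seq[pos - 1], seq[pos], seq[pos + 1])
```

Rebuilding the key list on every call keeps the sketch short; a real system would more likely keep the keys alongside the data or use an ordered container.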

Moreover, when the data item {1000, 11} needs to be found, the data items with time 1000 can be located first, and the search can then move left or right according to the numeric id, without traversing the whole list of data items with time 1000. On average, 50% of the time is saved. For a Weibo application with huge concurrent traffic at the same point in time, this has proven very effective in testing.

Step 102: Page the sorted data tuple sequence S, and present the paged data to the client.

Here, the sorted data tuple sequence S can be paged according to the page turning mark, and the paged data is presented to the client. More specifically, the page turning mark can be a data tuple or a position index. That is, the page turning mark can be any tuple in the data tuple sequence or a pre-specified position index.
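Paging by either kind of mark can be sketched as follows. The function name `next_page`, the `page_size` parameter, and the convention that a page starts strictly after the mark are assumptions made for this illustration; the sort key again follows the example sequence (time descending, id ascending). A tuple mark is located by binary search, so the tuples sharing the same time are not traversed one by one.

```python
import bisect

def sort_key(item):
    """Order by time descending, then id ascending (as in the example sequence)."""
    time, data_id = item
    return (-time, data_id)

def next_page(sorted_seq, mark, page_size):
    """Return up to page_size items that follow the page turning mark.

    mark is None (first page), an int position index, or a (time, id) tuple.
    """
    if mark is None:
        start = 0
    elif isinstance(mark, int):
        start = mark + 1  # position index: continue after that position
    else:
        keys = [sort_key(t) for t in sorted_seq]
        start = bisect.bisect_right(keys, sort_key(mark))  # binary search
    return sorted_seq[start:start + page_size]

seq = [(2000, 7), (1000, 7), (1000, 9), (1000, 11),
       (1000, 16), (1000, 88), (500, 7)]

print(next_page(seq, None, 2))        # first page
print(next_page(seq, (1000, 16), 2))  # tuple used as the mark
print(next_page(seq, 4, 2))           # position index used as the mark
```

In this sketch the tuple mark and the index of that same tuple select the same page, because the sorted sequence gives every item exactly one position.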

Based on the above analysis, an embodiment of the present invention also proposes a data paging system of a distributed system.

FIG. 2 is a structural diagram of a distributed data paging system according to an embodiment of the present invention.

As shown in FIG. 2, the system includes a sorting unit 201 and a page turning unit 202, wherein:

The sorting unit 201 is configured to sort the unordered data tuple sequence S stored in the server, where the items included in S are n-dimensional data sets, and the sorting includes setting the dimension order priority of the n dimensions and sorting the items in S according to that priority, as described for step 101;

The page turning unit 202 is configured to page through the sorted data tuple sequence S and present the paged data to the client.

The sorting unit 201 may include an importance setting sub-unit, configured to set a dimension order priority for the n dimensions according to their importance, wherein the greater the importance, the higher the dimension order priority. Moreover, n can specifically be 2, and the n dimensions can be time and data ID.

Moreover, the page turning unit 202 may include a mark page turning subunit for paging the sorted data tuple sequence S according to the page turning mark and presenting the paged data to the client. More specifically, the page turning mark can be a data tuple or a position index.

In summary, in the embodiment of the present invention, the unordered data tuple sequence S stored in the server is first sorted until each item of the sequence S has a unique position in the entire sequence S, where n is a natural number; the sorted data tuple sequence S is then paged, and the paged data is presented to the client. Therefore, after applying the embodiment of the present invention, a unified identification method makes full use of every dimension of the data until a data item can be uniquely located, so that multidimensional data can be uniquely located in a long sequence. This improves the accuracy of page turning, which can even reach 100%, and at the same time saves more than 50% of the query time.

The above is only the preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any modifications, equivalents, improvements, etc. made within the spirit and scope of the present invention are intended to be included within the scope of the present invention.

Claims


1. A data page turning method for a distributed system, comprising: sorting an unordered data tuple sequence S stored in a server, wherein the items included in S are n-dimensional data sets, and the sorting includes:

Setting the dimension order priority of the n dimensions;

The items in S are sorted according to the dimension order priority, wherein items that are equal in one dimension are further sorted by the dimension with the next dimension order priority, until each item of the sequence S has a unique position in the entire sequence S, where n is a natural number;

2. The data page turning method of a distributed system according to claim 1, wherein setting the dimension order priority comprises: setting a dimension order priority for the n dimensions according to their importance, wherein the greater the importance, the higher the dimension order priority.

3. The data page turning method of a distributed system according to claim 1, wherein said n is 2, and said n-dimensional is time and data ID.

4. The data page turning method of a distributed system according to claim 1, 2 or 3, wherein paging the sorted data tuple sequence S and presenting the paged data to the client includes:

5. The data page turning method of a distributed system according to claim 4, wherein the page turning mark is a data tuple or a position index.

6. A data paging system for a distributed system, comprising: a sorting unit and a page turning unit, wherein:

The sorting unit is configured to sort the unordered data tuple sequence S stored in the server, where the items included in the S are n-dimensional data sets, and the sorting includes:

7. The data paging system of a distributed system according to claim 6, wherein the sorting unit includes an importance setting sub-unit, configured to set a dimension order priority for the n dimensions according to their importance, wherein the greater the importance, the higher the dimension order priority.

8. The data paging system of a distributed system according to claim 6, wherein said n is 2, and said n-dimensional is time and data ID.

9. The data paging system of a distributed system according to claim 6, 7 or 8, wherein the page turning unit comprises a mark page turning subunit, configured to page the sorted data tuple sequence S according to the page turning mark and present the paged data to the client.

10. The data paging system of a distributed system according to claim 9, wherein the page turning mark is a data tuple or a position index.