WO2012041235A1 - Page flipping method and system for distributed system - Google Patents

Page flipping method and system for distributed system

Info

Publication number
WO2012041235A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
sequence
priority
page turning
order
Application number
PCT/CN2011/080321
Other languages
French (fr)
Chinese (zh)
Inventor
袁清
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2012041235A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents

Landscapes

  • Engineering & Computer Science
  • Databases & Information Systems
  • Theoretical Computer Science
  • Data Mining & Analysis
  • Physics & Mathematics
  • General Engineering & Computer Science
  • General Physics & Mathematics
  • Information Retrieval, Db Structures And Fs Structures Therefor
  • Information Transfer Between Computers

Abstract

Disclosed in the embodiments of the present invention are a page flipping method and system for distributed systems, comprising: sorting an unordered data tuple sequence S stored in the server, whose items are n-dimensional data sets (n being a natural number), until every entry of S has a unique position within S; paging through the sorted tuple sequence S; and presenting the paged data to the client. With the embodiments applied, a unified identification method makes full use of every dimension of the data until a data entry can be uniquely identified, so that multidimensional data obtains a unique position in a long sequence. This improves the accuracy of page flips, with accuracy that can reach 100%, while reducing query time by over 50%.

Description

Distributed data paging method and system

Technical field

The present invention relates to the field of Internet application technologies, and more particularly to a distributed data paging method and system.

Background of the invention
In current distributed systems, data is generally stored on back-end servers, and upper-layer users request data from the individual servers. Because the data is user-generated, a massive user base inevitably produces a massive volume of data and, with it, a massive volume of reads and writes. How to store this data and how to provide highly concurrent read and write services are problems that every UGC (user-generated content) business system must face. For example, a current microblog system must present large amounts of data to the front-end client in units of pages (a page being a collection of a certain amount of data), which raises the technical problem of data paging. The front-end client can page up, page down, or jump to the first or last page as required. A good paging scheme avoids duplicated data across two consecutive pages, or the loss of part of the data, when pages are computed dynamically. Here, paging down generally means returning a certain amount of data in order of time from newest to oldest; the further down one pages, the more of the latest data one can view. Paging up is the opposite.
In current data paging techniques, paging usually uses time alone as the reference axis. When data of the same type falls on the same point in time and is distributed across different servers, the paging point is determined purely by where the data is stored on the servers (that is, by the order in which the servers are read).
However, when the relative storage positions of data at the same point in time change, continuing to page with the original page-turning mark will skip some data or return duplicate data to the front end. For example, suppose the sequence of data tuples {time, id} is as follows:

Sequence:         {1000, 9}   {1000, 11}   {1000, 16}   {1000, 7}   ...
Storage machine:  Server 0    Server 1     Server 2     Server 0

When the user requests data, suppose the servers are queried in the order Server 0, Server 1, Server 2, Server 0, the data is read out in the order shown above, and the last page-turning point given was {1000, 16}. The user's next page-down request should then return {1000, 7} and the data after it; only in this way can the same user obtain all of the data, complete and without duplicates. But suppose a data tuple {1000, 88} arrives on Server 1 at the same point in time and changes the tuple sequence as follows:

Sequence:         {1000, 88}   {1000, 11}   {1000, 16}   {1000, 7}   {1000, 9}   ...
Storage machine:  Server 1     Server 1     Server 2     Server 0    Server 0

If the paging point is still {1000, 16} while the server read order changes to Server 1, Server 2, Server 0, Server 1, then the data {1000, 88} on Server 1 cannot be presented to the client; that is, the data is skipped. This shows that whenever the server read order differs between two consecutive requests, the same user requesting data within the same point in time will miss part of the data.
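The failure described above can be reproduced with a small sketch. It assumes, purely for illustration, that each server holds a short list of tuples and that a request concatenates those lists in whatever order the servers are read (the read order is simplified to Server 1, Server 2, Server 0). The names `shards`, `read_in_server_order`, and `page_after_mark` are hypothetical and do not come from the application.

```python
# Illustrative model only: tuples are (time, id); each server keeps its own list.
shards = {
    0: [(1000, 7), (1000, 9)],
    1: [(1000, 88), (1000, 11)],  # (1000, 88) arrived after the first page was served
    2: [(1000, 16)],
}


def read_in_server_order(shards, order):
    """Concatenate the servers' data in the order the servers are read."""
    sequence = []
    for server in order:
        sequence.extend(shards[server])
    return sequence


def page_after_mark(sequence, mark, size):
    """Return up to `size` items that follow `mark` in `sequence`."""
    start = sequence.index(mark) + 1
    return sequence[start:start + size]


# First page as served in the example above, ending at the mark {1000, 16}.
first_page = [(1000, 9), (1000, 11), (1000, 16)]
mark = first_page[-1]

# The next request reads Server 1, then Server 2, then Server 0.
second_read = read_in_server_order(shards, [1, 2, 0])
second_page = page_after_mark(second_read, mark, size=3)

delivered = first_page + second_page
missed = [t for t in second_read if t not in delivered]
duplicated = sorted({t for t in delivered if delivered.count(t) > 1})

print("missed:", missed)
print("duplicated:", duplicated)
```

Under these assumptions the tuple with id 88 is never delivered and the tuple with id 9 is delivered twice, which are the two failure modes named in the text.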
Moreover, in a distributed system several sets of replica devices serve reads at the same time, so there is no way to force every request to read the machines in the same order. Likewise, when paging up, if the data within the same point in time changes again, some data will not be delivered correctly to the front end. In addition, to find {1000, 88} in the data, every data tuple whose time is 1000 must be traversed, which is very inefficient.

Summary of the invention
Embodiments of the present invention provide a distributed data paging method that can improve the accuracy of data paging.
Embodiments of the present invention provide a distributed data paging system that can improve the accuracy of data paging.
The technical solutions of the embodiments of the present invention are as follows:
A data paging method for a distributed system, comprising:
sorting an unordered data tuple sequence S stored in the server, where each item in S is an n-dimensional data set, the sorting comprising:
setting a dimension order priority for the n dimensions;
sorting the items in S according to the dimension order priority, where items that are equal in the dimension of a given priority are further sorted by the dimension of the next priority, until every item of the sequence S has a unique position in the whole sequence S, n being a natural number;
paging through the sorted data tuple sequence S, and presenting the paged data to the client.
Setting the dimension order priority comprises: setting the priority of each of the n dimensions according to its importance, where the more important the dimension, the higher its priority.
n is 2, and the n dimensions are time and data ID.
Paging through the sorted data tuple sequence S and presenting the paged data to the client comprises:
paging through the sorted data tuple sequence S according to a page-turning mark, and presenting the paged data to the client.
The page-turning mark is a data tuple or a position index.
A data paging system for a distributed system, comprising a sorting unit and a page turning unit, wherein:
the sorting unit is configured to sort the unordered data tuple sequence S stored in the server, where each item in S is an n-dimensional data set, the sorting comprising:
setting a dimension order priority for the n dimensions; sorting the items in S according to the dimension order priority, where items that are equal in the dimension of a given priority are further sorted by the dimension of the next priority, until every item of the sequence S has a unique position in the whole sequence S, n being a natural number;
the page turning unit is configured to page through the sorted data tuple sequence S and present the paged data to the client.
The sorting unit includes an importance setting subunit, configured to set the priority of each of the n dimensions according to its importance, where the more important the dimension, the higher its priority.
n is 2, and the n dimensions are time and data ID.
The page turning unit includes a mark page-turning subunit, configured to page through the sorted data tuple sequence S according to a page-turning mark and present the paged data to the client.
The page-turning mark is a data tuple or a position index.
As can be seen from the above technical solutions, in the embodiments of the present invention the unordered data tuple sequence S stored in the server (whose items are n-dimensional, n being a natural number) is first sorted until every item of S has a unique position in the whole sequence S; the sorted sequence S is then paged, and the paged data is presented to the client. With the embodiments applied, a unified identification method makes full use of every dimension of the data until a single data item can be uniquely located, so multidimensional data obtains a unique position in a long sequence. This improves paging accuracy, which can even reach 100%, and at the same time saves more than 50% of the query time.

Brief description of the drawings
FIG. 1 is a flowchart of a distributed data paging method according to an embodiment of the present invention;
FIG. 2 is a structural diagram of a distributed data paging system according to an embodiment of the present invention.

Mode for carrying out the invention
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
In the embodiments of the present invention, for unordered multidimensional data tuples stored in the server, a unified identification method makes full use of every dimension of the data until a single data item can be uniquely located.
FIG. 1 is a flowchart of a distributed data paging method according to an embodiment of the present invention.
As shown in FIG. 1, the method includes:
Step 101: Sort the unordered data tuple sequence S stored in the server, where each item in S is an n-dimensional data set. The sorting comprises:
setting a dimension order priority for the n dimensions;
sorting the items in S according to the dimension order priority, where items that are equal in the dimension of a given priority are further sorted by the dimension of the next priority, until every item of the sequence S has a unique position in the whole sequence S, n being a natural number;
paging through the sorted data tuple sequence S, and presenting the paged data to the client.
The dimension order priority may be set for the n dimensions according to their importance, where the more important the dimension, the higher its priority. For example, the dimensions may be time and data ID.
More specifically, in the general case, suppose an unordered data tuple sequence S = {T1, T2, T3, ..., Tn} is stored in the server, where the x-th item Tx (x = 1, 2, ..., n) of S is a multidimensional data set {k1, k2, k3, ..., kn}.
First, a dimension order priority is set for the n dimensions (that is, k1, k2, k3, ..., kn). Then each element Tx (x = 1, 2, ..., n) of S is sorted first by the dimension with the highest priority (say k1); where k1 is equal, by the dimension with the next priority after k1 (say k2); where k2 is equal, by the dimension with the next priority after k2 (say k3); and so on, until Tx has a uniquely determined position in the whole sequence S.
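The ordering rule above amounts to a lexicographic sort over the dimensions taken in priority order. A minimal sketch follows, assuming each item is a Python tuple and the priority is given as a list of `(dimension index, descending?)` pairs; the function names are hypothetical. It relies on Python's sort being stable, so sorting by the lowest-priority dimension first and the highest-priority dimension last yields the combined order.

```python
def sort_by_dimension_priority(items, priority):
    """Sort tuples by dimensions in priority order.

    `priority` lists (dimension_index, descending) pairs, highest priority first.
    Stable sorts are applied from the lowest priority up, so ties in a
    higher-priority dimension keep the order set by the next dimension.
    """
    ordered = list(items)
    for dim, descending in reversed(priority):
        ordered.sort(key=lambda item: item[dim], reverse=descending)
    return ordered


def positions_are_unique(items, priority):
    """True when no two items agree on every dimension used for sorting."""
    projected = [tuple(item[dim] for dim, _ in priority) for item in items]
    return len(set(projected)) == len(projected)


# Hypothetical three-dimensional items (k1, k2, k3), all dimensions ascending.
items = [(1, 2, 3), (1, 2, 1), (1, 1, 5), (0, 9, 9)]
priority = [(0, False), (1, False), (2, False)]

print(sort_by_dimension_priority(items, priority))
print(positions_are_unique(items, priority))
```

If `positions_are_unique` returns `False`, the chosen dimensions do not yet pin every item to a single position and a further dimension would be needed.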
If the tuple sequence S is then placed in a one-dimensional array, the array index of every Tx is fixed. Each time a new tuple is inserted, the whole of S is re-sorted and every element finds its fixed position again. Thus, given either a paging tuple Tx or an array index, a unique item T can be found to determine the dividing point.
For example, given data tuples {time, id}, regardless of how the data is stored across the servers and regardless of the order in which the servers are read, the tuples should logically have an order free of semantic ambiguity. For instance, they can be sorted first by the time dimension in reverse order and then, within the same point in time, by numeric id in reverse order. For example:
{2000, 7}, {1000, 7}, {1000, 9}, {1000, 11}, {1000, 16}, {1000, 88}, {500, 7}, ...
Thus, when a new data item {1000, 19} is added, it has a uniquely determined position, namely between {1000, 16} and {1000, 88}. Whether the page-turning mark is a data tuple or a position index, the paged data can then be given uniquely and accurately; moreover, the underlying storage of the data may be unordered without affecting the upper-layer logic.
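The insertion of {1000, 19} can be sketched as follows. The sketch assumes the ordering shown in the listed sequence (time from newest to oldest, and ids increasing within one point in time) and expresses it as the sort key `(-time, id)`; `sort_key` and `insert_sorted` are hypothetical names. Rebuilding the key list on every call keeps the example short; a real store would more likely keep the keys alongside the data.

```python
import bisect


def sort_key(item):
    """Key matching the listed sequence: newer times first, then smaller ids."""
    time, data_id = item
    return (-time, data_id)


def insert_sorted(sequence, item):
    """Insert `item` at its unique position and return that position."""
    keys = [sort_key(t) for t in sequence]
    position = bisect.bisect_left(keys, sort_key(item))
    sequence.insert(position, item)
    return position


sequence = [
    (2000, 7), (1000, 7), (1000, 9), (1000, 11),
    (1000, 16), (1000, 88), (500, 7),
]
position = insert_sorted(sequence, (1000, 19))
print(position, sequence[position - 1], sequence[position + 1])
```

The new tuple lands between (1000, 16) and (1000, 88) no matter which server it arrived on, because its position depends only on its own dimensions.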
Furthermore, when the data item {1000, 11} needs to be found, a binary search can first locate the data items whose time is 1000 and then move left or right according to the numeric id, without traversing the whole list of data items whose time is 1000. On average, 50% of the time is saved. For an application such as microblogging, which carries huge concurrent traffic at the same moment, the benefit is considerable.
Step 102: Page through the sorted data tuple sequence S, and present the paged data to the client.
Here, the sorted data tuple sequence S can be paged according to a page-turning mark, and the paged data presented to the client. More specifically, the page-turning mark can be a data tuple or a position index. That is, the page-turning mark can be any item in the data tuple sequence or a pre-specified position index.
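Paging by a page-turning mark can be sketched as below, again assuming the `(-time, id)` ordering of the listed example sequence; `page_down` and `sort_key` are hypothetical names. A tuple mark is located by the same kind of binary search the description applies to finding {1000, 11}, while an integer mark is treated as a position index. The key list is rebuilt per call only to keep the sketch self-contained.

```python
import bisect


def sort_key(item):
    """Key matching the listed sequence: newer times first, then smaller ids."""
    time, data_id = item
    return (-time, data_id)


def page_down(sequence, mark, size):
    """Return up to `size` items that follow the page-turning mark.

    `mark` is either a position index (int) or a data tuple (time, id).
    `sequence` must already be sorted by `sort_key`.
    """
    if isinstance(mark, int):
        start = mark + 1
    else:
        keys = [sort_key(t) for t in sequence]
        # Binary search for the first item that sorts strictly after the mark.
        start = bisect.bisect_right(keys, sort_key(mark))
    return sequence[start:start + size]


sequence = [
    (2000, 7), (1000, 7), (1000, 9), (1000, 11),
    (1000, 16), (1000, 88), (500, 7),
]
print(page_down(sequence, (1000, 16), 2))  # tuple mark
print(page_down(sequence, 4, 2))           # position-index mark
```

With a tuple mark, the lookup still works if the marked item itself has since been removed, since only its sort key is needed to find the dividing point.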
Based on the above analysis, an embodiment of the present invention also provides a data paging system for a distributed system.
FIG. 2 is a structural diagram of a distributed data paging system according to an embodiment of the present invention.
As shown in FIG. 2, the system includes a sorting unit 201 and a page turning unit 201, wherein:
the sorting unit 201 is configured to sort the unordered data tuple sequence S stored in the server, where each item in S is an n-dimensional data set, the sorting comprising:
setting a dimension order priority for the n dimensions; sorting the items in S according to the dimension order priority, where items that are equal in the dimension of a given priority are further sorted by the dimension of the next priority, until every item of the sequence S has a unique position in the whole sequence S, n being a natural number;
the page turning unit 201 is configured to page through the sorted data tuple sequence S and present the paged data to the client.
The sorting unit 201 may include an importance setting subunit, configured to set the priority of each of the n dimensions according to its importance, where the more important the dimension, the higher its priority. In particular, n may be 2, in which case the n dimensions may be time and data ID.
The page turning unit 201 may include a mark page-turning subunit, configured to page through the sorted data tuple sequence S according to a page-turning mark and present the paged data to the client. More specifically, the page-turning mark can be a data tuple or a position index.
In summary, in the embodiments of the present invention the unordered data tuple sequence S stored in the server (whose items are n-dimensional, n being a natural number) is first sorted until every item of S has a unique position in the whole sequence S; the sorted sequence S is then paged, and the paged data is presented to the client. With the embodiments applied, a unified identification method makes full use of every dimension of the data until a single data item can be uniquely located, so multidimensional data obtains a unique position in a long sequence. This improves paging accuracy, which can even reach 100%, and at the same time saves more than 50% of the query time.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A data paging method for a distributed system, characterized by comprising:
sorting an unordered data tuple sequence S stored in a server, wherein each item in S is an n-dimensional data set, the sorting comprising:
setting a dimension order priority for the n dimensions;
sorting the items in S according to the dimension order priority, wherein items that are equal in the dimension of a given priority are further sorted by the dimension of the next priority, until every item of the sequence S has a unique position in the whole sequence S, n being a natural number;
paging through the sorted data tuple sequence S, and presenting the paged data to the client.
2. The data page turning method for a distributed system according to claim 1, characterized in that setting the dimension order priority comprises: setting the dimension order priority for the n dimensions according to the importance of each dimension, wherein the greater the importance, the higher the dimension order priority.
3. The data page turning method for a distributed system according to claim 1, characterized in that n is 2, and the n dimensions are time and data ID.
4. The data page turning method for a distributed system according to claim 1, 2 or 3, characterized in that paging through the sorted sequence S of data tuples and presenting the paged data to the client comprises:
paging through the sorted sequence S of data tuples according to a page turning mark, and presenting the paged data to the client.
5. The data page turning method for a distributed system according to claim 4, characterized in that the page turning mark is a data tuple or a position index.
6. A data page turning system for a distributed system, characterized by comprising a sorting unit and a page turning unit, wherein:
the sorting unit is configured to sort an unordered sequence S of data tuples stored on a server, wherein the items contained in S are n-dimensional data sets, the sorting comprising:
  setting a dimension order priority for the n dimensions; sorting the items in S according to the dimension order priority, wherein items that rank equally under a given dimension order priority are further sorted according to the next dimension order priority, until every item of the sequence S has a unique position in the whole sequence S, where n is a natural number;
the page turning unit is configured to page through the sorted sequence S of data tuples and to present the paged data to a client.
7. The data page turning system for a distributed system according to claim 6, characterized in that the sorting unit includes an importance setting subunit, configured to set the dimension order priority for the n dimensions according to the importance of each dimension, wherein the greater the importance, the higher the dimension order priority.
8. The data page turning system for a distributed system according to claim 6, characterized in that n is 2, and the n dimensions are time and data ID.
9. The data page turning system for a distributed system according to claim 6, 7 or 8, characterized in that the page turning unit includes a mark-based page turning subunit, configured to page through the sorted sequence S of data tuples according to a page turning mark and to present the paged data to the client.
10. The data page turning system for a distributed system according to claim 9, characterized in that the page turning mark is a data tuple or a position index.
PCT/CN2011/080321 2010-09-28 2011-09-28 Page flipping method and system for distributed system WO2012041235A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2010102995389A CN102419756A (en) 2010-09-28 2010-09-28 Distributed data page turning method and system
CN201010299538.9 2010-09-28

Publications (1)

Publication Number Publication Date
WO2012041235A1 (en) 2012-04-05

Family

ID=45891967

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/080321 WO2012041235A1 (en) 2010-09-28 2011-09-28 Page flipping method and system for distributed system

Country Status (2)

Country Link
CN (1) CN102419756A (en)
WO (1) WO2012041235A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105827721B (en) * 2016-04-20 2019-06-21 努比亚技术有限公司 A kind of data transmission method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1716244A (en) * 2003-12-29 2006-01-04 西安迪戈科技有限责任公司 Intelligent search, intelligent files system and automatic intelligent assistant
WO2008002527A2 (en) * 2006-06-28 2008-01-03 Microsoft Corporation Intelligently guiding search based on user dialog
WO2008144740A1 (en) * 2007-05-21 2008-11-27 Amazon Technologies, Inc. Consumption of items via a user device
CN101641674A (en) * 2006-10-05 2010-02-03 斯普兰克公司 Time series search engine
CN101699440A (en) * 2009-11-24 2010-04-28 中国电信股份有限公司 Service-based retrieving method and service-based retrieving system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101237347B (en) * 2008-02-19 2011-06-29 中兴通讯股份有限公司 Page processing method for alarm data

Also Published As

Publication number Publication date
CN102419756A (en) 2012-04-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11828131

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12/08/2013)

122 Ep: pct application non-entry in european phase

Ref document number: 11828131

Country of ref document: EP

Kind code of ref document: A1