CN111782733A - Multi-level data summarizing method, distributed data management system and summarized data management system - Google Patents


Info

Publication number
CN111782733A
Authority
CN
China
Prior art keywords
data
records
data interaction
database
management system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010709878.8A
Other languages
Chinese (zh)
Inventor
刘晶
雷金铭
廖海林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010709878.8A priority Critical patent/CN111782733A/en
Publication of CN111782733A publication Critical patent/CN111782733A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278Data partitioning, e.g. horizontal or vertical partitioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Abstract

The application provides a multi-level data summarization method, a distributed data management system, and a summarized data management system. The method comprises: acquiring M data interaction records; storing the M data interaction records into a distributed database, wherein the distributed database comprises N unit databases and each unit database stores a plurality of the M data interaction records; for each of the N unit databases, summarizing the data interaction records it stores based on the destination account number to generate J data interaction intermediate summary records; and sending the data interaction intermediate summary records in the N unit databases to a summary database.

Description

Multi-level data summarizing method, distributed data management system and summarized data management system
Technical Field
The present application relates to the field of data processing, and in particular, to a method for aggregating multi-level data, a distributed data management system, and an aggregated data management system.
Background
A Database (DB) is a collection of data that is stored long term in a computer, organized, shareable, and uniformly manageable.
The summary settlement service is a key service of a merchant settlement system. It mainly summarizes, by business dimension, transaction details that are scattered across sub-databases split by the buyer's UID. To keep the summarized data consistent under a distributed database, the current merchant settlement system first registers the details to be summarized one by one into a single database and then summarizes them within that single database. However, this approach makes the summary database grow without bound, which causes both data redundancy and poor performance.
Disclosure of Invention
To address the above-mentioned deficiencies of the prior art, the present application discloses a method for multi-level data summarization, comprising: obtaining M data interaction records, wherein M is an integer greater than 1, and each of the M data interaction records at least comprises: a target data value, a source account number from which the target data flows out, and a destination account number into which the target data flows; storing the M data interaction records into a distributed database, wherein the distributed database comprises N unit databases, each unit database stores a plurality of the M data interaction records, and N is an integer greater than 1; for each of the N unit databases, summarizing the data interaction records it stores based on the destination account number to generate J data interaction intermediate summary records, wherein J is a natural number; and sending the data interaction intermediate summary records in the N unit databases to a summary database.
In some embodiments, storing the M data interaction records into a distributed database includes: storing the M data interaction records into the distributed database based on the source account number.
In some embodiments, storing the M data interaction records into the distributed database based on the source account number includes, for each of the M data interaction records: acquiring the K digits at a predetermined position in the source account number, wherein K is a natural number; and storing the data interaction record in the unit database corresponding to those K digits.
In some embodiments, the K digits at the predetermined position are the second-to-last and third-to-last digits of the source account number.
In some embodiments, the data interaction records stored in each unit database correspond to J destination account numbers, and generating the J data interaction intermediate summary records comprises: taking each of the J destination account numbers in turn as the current account number; and traversing the data interaction records, summing the target data values of all data interaction records that correspond to the current account number, and setting the sum as the intermediate summary target data value.
In some embodiments, the method further comprises, in the summary database: summarizing the data interaction intermediate summary records from the N unit databases based on the J destination account numbers to generate J data interaction final summary records.
In some embodiments, generating the J data interaction final summary records includes: taking each of the J destination account numbers in turn as the current account number; and traversing the data interaction intermediate summary records, summing the intermediate summary target data values of all intermediate summary records that correspond to the current account number, and setting the sum as the final summary target data value.
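The final summarization step can be illustrated with a minimal sketch. It assumes each intermediate summary record is a (destination account, intermediate total) pair and that amounts are integers in the smallest currency unit; the function name `final_summary` and the sample data are hypothetical. The sketch groups in a single pass, which yields the same totals as taking each destination account in turn and traversing the records.

```python
from collections import defaultdict

def final_summary(intermediate_records):
    """Sum intermediate totals per destination account.

    intermediate_records: iterable of (destination_account, intermediate_total)
    pairs collected from all unit databases (an assumed layout).
    """
    totals = defaultdict(int)
    for destination_account, intermediate_total in intermediate_records:
        totals[destination_account] += intermediate_total
    return dict(totals)

# Hypothetical intermediate summary records received from three unit databases.
received = [
    ("seller_A", 1500), ("seller_B", 700),  # from unit database 00
    ("seller_A", 250),                      # from unit database 01
    ("seller_A", 100), ("seller_B", 300),   # from unit database 02
]
final = final_summary(received)
print(final)  # {'seller_A': 1850, 'seller_B': 1000}
```

Each destination account ends up with exactly one final summary record, regardless of how many unit databases contributed to it.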
In some embodiments: the data interaction record is a transaction bill; the target data value comprises a transaction amount of the transaction bill; the source account number is a buyer account number of the transaction bill; and the destination account number is a seller account number of the transaction bill.
The application also discloses a distributed data management system, including: at least one memory including at least one instruction set; and at least one processor, communicatively coupled to the at least one memory, the at least one processor executing the method for multi-level data summarization according to the at least one instruction set.
The present application also discloses a summarized data management system, including: at least one memory including at least one instruction set; and at least one processor, communicatively coupled to the at least one memory, which, when the summarized data management system is operating, reads and executes the at least one instruction set and performs, as indicated by the at least one instruction set: receiving a plurality of data interaction intermediate summary records from N unit databases in a distributed data management system, wherein each of the N unit databases stores a plurality of data interaction records corresponding to J destination account numbers, summarizes the data interaction records it stores based on the J destination account numbers to generate J data interaction intermediate summary records, and sends the J data interaction intermediate summary records to the summarized data management system; and summarizing the plurality of data interaction intermediate summary records based on the J destination account numbers to generate J data interaction final summary records.
In some embodiments, generating the J data interaction final summary records includes: taking each of the J destination account numbers in turn as the current account number; and traversing the data interaction intermediate summary records, summing the intermediate summary target data values of all intermediate summary records that correspond to the current account number, and setting the sum as the final summary target data value.
In the distributed data management system, before the detail data to be summarized is registered in the summary database, the detail data in each sub-table is first summarized within the distributed database, by business dimension, into small-batch task data; the small-batch task data is then registered one by one in a summary task table in the summary database. The summarized data management system then summarizes the small-batch tasks in the summary task table into a summary batch table by business dimension.
The multi-level data summarization method provided by the application has the following effects. First, before the detail data to be summarized is registered in the summary database, the detail data, which is split across different sub-tables by buyer, is summarized in real time within the distributed database into small-batch tasks per business dimension. Each business dimension has at most N small-batch tasks, so the data distribution stays balanced and the data skew caused by an excess of detail data in one dimension is avoided. Second, because only small-batch tasks are registered in the summary database, the summary task table does not grow in proportion to the detail data and does not swell as the business grows rapidly, which resolves the data redundancy caused by synchronizing detail data one-to-one into the summary database. Third, the number of interactions between the distributed database and the summary database is greatly reduced, which lowers system overhead and improves summarization performance.
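The difference in the number of rows registered in the summary task table can be illustrated with a small simulation. This is only a sketch under stated assumptions: N = 100 unit databases keyed by the third-to-last and second-to-last UID digits, J = 2 sellers, and 10,000 randomly generated detail records; all names and data here are hypothetical.

```python
import random

N_UNIT_DBS = 100      # N, matching the 100-way split used in the examples
SELLERS = ["A", "B"]  # J = 2 business dimensions (hypothetical)

def unit_db_of(buyer_uid):
    """Assumed split rule: third-to-last and second-to-last UID digits."""
    return buyer_uid[-3:-1]

def rows_registered(records):
    """Return (one_to_one_rows, multi_level_rows) for the summary task table."""
    one_to_one = len(records)  # every detail record is registered individually
    # Multi-level: one small-batch task per (unit database, seller) pair.
    multi_level = len({(unit_db_of(buyer), seller) for buyer, seller, _ in records})
    return one_to_one, multi_level

rng = random.Random(0)  # fixed seed so the run is repeatable
records = [
    (f"{rng.randrange(10**16):016d}", rng.choice(SELLERS), rng.randrange(1, 10_000))
    for _ in range(10_000)
]
one_to_one, multi_level = rows_registered(records)
print(one_to_one, multi_level, "upper bound:", N_UNIT_DBS * len(SELLERS))
```

Under these assumptions the multi-level count can never exceed N × J, however many detail records are generated, whereas the one-to-one count grows with every record.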
Drawings
Fig. 1 illustrates an application scenario of a multi-level data summarization method provided according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a server provided according to an embodiment of the present application;
Fig. 3 illustrates a method of data summarization provided according to an embodiment of the present application;
Fig. 4 is a flowchart illustrating a multi-level data summarization method according to an embodiment of the present application; and
Fig. 5 is a schematic diagram illustrating an operation process of a multi-level data summarization method according to an embodiment of the present application.
Detailed Description
The following description is presented to enable any person skilled in the art to make and use the present disclosure, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present application. Thus, the present application is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting.
These and other features of the present application, as well as the operation and function of the related structural elements, the combination of parts, and the economies of manufacture, will become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form a part of this application. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the application. It should also be understood that the drawings are not drawn to scale.
The embodiment of the application provides a method for summarizing multi-level data. Fig. 1 shows an application scenario of a multi-level data summarization method provided in an embodiment of the present application. Specifically, the application scenario may include the business system 600, the data management system 100, and the payment system 700.
Business system 600 is upstream of data management system 100. The business system 600 generates business data while processing business. For example, the business system 600 may be a clothing store merchant, and the business data may include order data generated after a buyer purchases clothing at the clothing store.
Business system 600 may store the business data in a distributed database. The business system 600 may split the business data into a plurality of different sub-tables based on a sub-database and sub-table rule, and store the sub-tables in different sub-databases. For example, the business system 600 may split the business data into 100 data sub-tables and store the 100 data sub-tables in 100 data sub-databases, respectively.
In some embodiments, the sub-database and sub-table rule may include: locating the data sub-database and the data sub-table based on the buyer identification in the business data.
For example, assuming that the buyer identification is a userId (hereinafter referred to as UID), the sub-database and sub-table rule may be configured to locate the data sub-database and the data sub-table for the business data of that UID by taking the value of specific digits in the UID.
Taking the UID 1111222233335678 as an example, the sub-database and sub-table rule may be configured as: take the second-to-last and third-to-last digits of the UID to locate the data sub-database and the data sub-table. Since those digits of the UID are 67, data sub-database 67 and data sub-table 67 can be located. The business data corresponding to the UID will be stored in data sub-table 67 under data sub-database 67.
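The digit extraction in this example can be written as a short sketch. The function name `shard_suffix` and the validation rule are assumptions made for illustration; only the choice of digits comes from the example above.

```python
def shard_suffix(uid: str) -> str:
    """Return the third-to-last and second-to-last digits of a UID."""
    if len(uid) < 3 or not uid.isdigit():
        raise ValueError("UID must be a digit string of length >= 3")
    return uid[-3:-1]

suffix = shard_suffix("1111222233335678")
print(suffix)  # 67
```

Keeping the result as a string preserves leading zeros, so a UID ending in 008 maps to "00" rather than 0.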
Of course, other sub-database and sub-table rules can be used. For example, again taking the UID 1111222233335678, the rule may be configured as: take the third-to-last digit of the UID to locate the data sub-database, and take the second-to-last and third-to-last digits of the UID to locate the data sub-table. Since the third-to-last digit of the UID is 6, data sub-database 06 can be located; since the second-to-last and third-to-last digits are 67, data sub-table 67 can be located. The business data corresponding to the UID will then be stored in data sub-table 67 under data sub-database 06.
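This alternative rule, in which the sub-database and the sub-table are located from different digits, can be sketched as follows. The function name `locate` and the two-character zero-padded naming of the sub-database are assumptions chosen to match the "06" in the example.

```python
def locate(uid):
    """Return (sub_database, sub_table) for a UID under the alternative rule."""
    sub_database = uid[-3].zfill(2)  # third-to-last digit, zero-padded
    sub_table = uid[-3:-1]           # third-to-last and second-to-last digits
    return sub_database, sub_table

location = locate("1111222233335678")
print(location)  # ('06', '67')
```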
The sub-database and sub-table rules described above are merely exemplary, and those skilled in the art will appreciate that business system 600 may employ other sub-database and sub-table rules to store business data in a distributed database system without departing from the core spirit of the present application.
It can be seen that, in this way, the business data generated by the business system 600 can be inserted "evenly" into different data sub-tables for storage. After the split into sub-databases and sub-tables is completed, the business data generated by the business system 600 is distributed across a plurality of data sub-tables in a plurality of different data sub-databases.
The data management system 100 may summarize business data for the business system 600. By way of example, the data management system 100 may be a settlement platform. The settlement platform may summarize the expense details generated by the business system 600 to generate a settlement statement.
Data management system 100 may include a distributed data management system 400 and a summarized data management system 300.
Distributed data management system 400 may include distributed database management system 410 and distributed database 420. Distributed database 420 is built by distributed database management system 410. Distributed database management system 410 may also manage and control distributed database 420. For example, distributed database management system 410 may query, add, update, delete, sum, and sort data in distributed database 420. The distributed database 420 includes a plurality of data sub-databases, such as db00, db01, db03, db04, and so on. Each data sub-database can store one or more data sub-tables. Each data sub-table may store the business data.
Summary data management system 300 may include a summary database management system 310 and a summary database 320. The summary database 320 is established by the summary database management system 310. The summary database management system 310 may also manage and control the summary database 320. For example, the summary database management system 310 may query, add, update, delete, sum, sort, etc. the data in the summary database 320. Summary database 320 may have summary tables stored therein. The summary table may include data after summarizing the sub-tables in each sub-library in distributed database 420.
The data management system 100 may include a server 200. The server 200 may be a stand-alone server or a server cluster. Server 200 may be a server of distributed data management system 400, a server of summarized data management system 300, or a server of both. As an example, Fig. 2 shows a schematic structural diagram of a server 200 provided according to an embodiment of the present application. For ease of understanding, in the following description the structure and functions of the server 200 are described by taking the server 200 as the server of the distributed data management system 400.
Server 200 includes at least one memory 230 and at least one processor 220. In some embodiments, server 200 may also include a communication port 250 and an internal communication bus 210. Meanwhile, the server 200 may also include an I/O component 260.
Internal communication bus 210 may connect various system components including memory 230 and processor 220.
I/O components 260 support input/output between server 200 and other components.
Memory 230 may include a data storage device. The data storage device may be a non-transitory storage medium or a transitory storage medium. For example, the data storage devices may include one or more of a disk 232, Read Only Memory (ROM)234, or Random Access Memory (RAM) 236. The memory 230 also includes at least one instruction set stored in the data storage device. The set of instructions is computer program code that may include programs, routines, objects, components, data structures, procedures, modules, and the like that perform the data summarization methods provided herein.
The at least one processor 220 communicates with the at least one memory 230 via an internal communication bus 210. The at least one processor 220 is configured to execute the at least one instruction set, and when the at least one processor 220 executes the at least one instruction set, the server 200 implements the data summarization method provided herein. The processor 220 may perform some of the steps included in the data summarization method. Processor 220 may be in the form of one or more processors, and in some embodiments, processor 220 may include one or more hardware processors, such as microcontrollers, microprocessors, Reduced Instruction Set Computers (RISC), Application Specific Integrated Circuits (ASICs), application specific instruction set processors (ASIPs), Central Processing Units (CPUs), Graphics Processing Units (GPUs), Physical Processing Units (PPUs), microcontroller units, Digital Signal Processors (DSPs), Field Programmable Gate Arrays (FPGAs), Advanced RISC Machines (ARM), Programmable Logic Devices (PLDs), any circuit or processor capable of executing one or more functions, or the like, or any combination thereof. For illustrative purposes only, only one processor 220 is depicted in server 200 in the present application. It should be noted, however, that the server 200 may also include multiple processors, and thus the operations and/or method steps disclosed herein may be performed by one processor, as described herein, or by a combination of multiple processors. For example, if in the present application the processor 220 of the server 200 performs step A and step B, it should be understood that step A and step B may also be performed jointly or separately by two different processors 220 (e.g., a first processor performs step A and a second processor performs step B, or the first and second processors perform steps A and B together).
The port 250 is used for data communication between the server 200 and the outside. For example, the server 200 may be connected to the network 130 through the port 250 to receive the business data sent from the business system 600, or to send the preliminarily summarized business data to the summarized data management system 300.
With continued reference to Fig. 1, the payment system 700 may handle payment and collection for the settlement statements summarized by the data management system 100. By way of example, the payment system 700 may include, but is not limited to, a third-party payment platform, a bank, and the like.
Fig. 3 illustrates a data summarization method provided according to an embodiment of the present application. The business detail data to be summarized for business system 600 is scattered across 100 sub-database sub-tables (i.e., table 00, table 01, ..., table 99). The 100 sub-tables are split according to the second-to-last and third-to-last digits of the buyer's UID. The distributed data management system 400 registers these detail records one-to-one into the summary task table of the summary database 320. The summarized data management system 300 summarizes the business data in the summary task table into a summary batch table by business dimension. The business dimension may refer to the seller (i.e., merchant) dimension. That is, one business dimension corresponds to one seller.
The data summarization method shown in Fig. 3 has the following drawbacks. First, each detail record to be summarized is registered in the summary task table one by one, and the summary tasks registered each day number in the tens of millions, so the summary task table grows quickly. Although a Database Administrator (hereinafter referred to as DBA) can clean up summary task data regularly, the table grows too fast for manual cleaning to be sustainable. Second, since all detail data must be registered in the summary task table, when summarizing by business dimension, fetching the data can time out if one dimension has too many detail records, which affects system stability. Third, the distributed data management system 400, where the detail data resides, and the summarized data management system 300 must interact frequently; the number of interactions equals the number of detail records, which adds system overhead and degrades data processing performance.
The application also provides a method for summarizing the multi-level data. Fig. 4 shows a flowchart of a multi-level data summarization method S100 according to an embodiment of the present application. Fig. 5 is a schematic diagram illustrating an operation process of a multi-level data summarization method S100 according to an embodiment of the present application.
The process S100 shown in fig. 4 includes a data summarization method performed by the distributed data management system 400 and a data summarization method performed by the summarized data management system 300. Some of the steps of process S100 may be performed by distributed data management system 400 and some of the steps may be performed by summarized data management system 300.
The operations of the process S100 presented below are intended to be illustrative and not limiting. In some embodiments, the process S100 may be implemented with one or more additional operations not described, and/or without one or more of the operations described herein. Further, the order of the operations shown in Fig. 4 and described below is not intended to be limiting.
S110, the distributed data management system 400 acquires M data interaction records.
M may be an integer greater than 1. Each of the M data interaction records at least comprises: a target data value, a source account number from which the target data flows out, and a destination account number into which the target data flows.
As an example, the data interaction record may be business data generated by business system 600. As an example, the data interaction record may be a transaction bill for a transaction between a buyer and a seller. One data interaction record may be one transaction bill. As an example, the target data value includes a transaction amount of the transaction bill. As an example, the source account number is a buyer account number of the transaction bill. As an example, the destination account number is a seller account number of the transaction bill. In some embodiments, the transaction bill may also include a transaction time.
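The fields of a data interaction record can be illustrated with a small data structure. This is a sketch only: the class name, field names, and the choice of an integer amount in the smallest currency unit are assumptions, not part of the application.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class DataInteractionRecord:
    target_data_value: int    # e.g. transaction amount (assumed unit: cents)
    source_account: str       # account the target data flows out of (buyer)
    destination_account: str  # account the target data flows into (seller)
    transaction_time: Optional[datetime] = None  # optional in some embodiments

# A hypothetical transaction bill.
bill = DataInteractionRecord(
    target_data_value=1999,
    source_account="1111222233335678",
    destination_account="seller_A",
)
print(bill.source_account, "->", bill.destination_account, bill.target_data_value)
```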
S120, the distributed data management system 400 stores the M data interaction records into the distributed database 420.
Distributed database 420 may include N unit databases. The N unit databases may be N sub-databases of distributed database 420, where N is an integer greater than 1. For ease of understanding, in the following description the data summarization method S100 is described in detail by taking N equal to 100 as an example.
In some embodiments, distributed data management system 400 stores the M data interaction records in distributed database 420 based on the source account number. As an example, the process of the distributed data management system 400 storing the M data interaction records in the distributed database 420 based on the source account may include:
S121, for each of the M data interaction records, the distributed data management system 400 obtains the K digits at a predetermined position in the source account number.
S122, for each of the M data interaction records, the distributed data management system 400 stores the data interaction record in the unit database corresponding to those K digits.
K is a natural number. As an example, the K digits at the predetermined position may be the second-to-last and third-to-last digits of the source account number.
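Steps S121 and S122 can be sketched as routing records into in-memory buckets that stand in for unit databases. The dictionary-based records, the `store_records` name, and the sample accounts are hypothetical; the default slice selects the third-to-last and second-to-last digits (K = 2).

```python
from collections import defaultdict

def store_records(records, k_slice=slice(-3, -1)):
    """Group records into unit databases keyed by K digits of the source account."""
    unit_dbs = defaultdict(list)
    for record in records:
        key = record["source_account"][k_slice]  # S121: take the K digits
        unit_dbs[key].append(record)             # S122: store in that unit database
    return dict(unit_dbs)

records = [
    {"source_account": "1111222233335678", "destination_account": "A", "amount": 100},
    {"source_account": "2222333344445670", "destination_account": "B", "amount": 250},
    {"source_account": "9999000011110001", "destination_account": "A", "amount": 75},
]
unit_dbs = store_records(records)
print(sorted(unit_dbs))  # ['00', '67']
```

The first two records share the digits 67 and land in the same unit database even though their destination accounts differ, which is why a second summarization level across unit databases is needed.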
As can be seen from the foregoing description, the service system 600 may generate the M data interaction records, and insert each data interaction record in the M data interaction records into the corresponding sub-database sub-table according to the preset sub-database sub-table rule.
For example, assume the sub-database and sub-table rule is: take the second-to-last and third-to-last digits of the UID to locate the data sub-database and the data sub-table. The business system 600 stores each data interaction record in the sub-database sub-table corresponding to the second-to-last and third-to-last digits of its UID.
As can be appreciated from the foregoing description, each sub-database of distributed database 420 may store one sub-table. The 100 sub-databases then hold a total of 100 sub-tables. Referring to Fig. 5, these are: table 00, table 01, ..., table 99. Each sub-table can contain a plurality of the M data interaction records. A data interaction record may be one row in a sub-table.
S130, the distributed data management system 400 summarizes the multiple data interaction records for each of the N unit databases based on the destination account number, and generates J data interaction intermediate summary records.
J is a natural number. The data interaction records stored in each unit database correspond to J destination account numbers. For example, the destination account number may comprise a seller account number, as described above. One destination account number corresponds to one seller account number. The J destination account numbers correspond to J seller account numbers.
Referring to Fig. 5, taking N = 100 and J = 2 as an example, the business details to be summarized are scattered across 100 sub-database sub-tables (table 00, table 01, ..., table 99), split 100 ways by the second-to-last and third-to-last digits of the buyer UID. Each sub-table contains a plurality of detail records. Taking table 00 as an example, one row in table 00 may represent one detail record; taking a transaction bill as the detail record, one detail record may include information such as a transaction amount, a buyer account number, and a seller account number. Divided by seller account, one sub-table (for example, table 00) may involve 2 seller accounts, namely seller account A and seller account B, that is, dimension A and dimension B in Fig. 5. In other words, all detail records in table 00 belong to one of two seller accounts, seller account A or seller account B.
The distributed data management system 400 may summarize, for each sub-table in each sub-database, the multiple data interaction records in the sub-table based on the seller account number, and generate 2 intermediate summary records. Taking the 00 table as an example, the distributed data management system 400 summarizes the detail data in the 00 table by seller account A and seller account B to generate 2 intermediate summary records, where each intermediate summary record contains the total transaction amount of one seller account.
Specifically, the generating of the J data interaction intermediate summary records by the distributed data management system 400 may include:
S131, the distributed data management system 400 acquires the J destination account numbers in sequence and sets each in turn as the current destination account number; and
S132, the distributed data management system 400 traverses the multiple data interaction records, sums the target data values in all the data interaction records corresponding to the current destination account number, and sets the sum as the intermediate summary target data value.
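Steps S131 and S132 can be sketched for a single sub-table as follows. This is a minimal sketch, not the implementation of the application: the record layout (dictionaries with `buyer`, `seller`, and `amount` keys), the function name, and the sample rows are assumptions made for illustration.

```python
from decimal import Decimal
from typing import Dict, List


def summarize_sub_table(records: List[dict]) -> Dict[str, Decimal]:
    """Return one intermediate summary value per destination (seller) account."""
    # S131: collect the destination accounts of the sub-table in first-seen order.
    destination_accounts = list(dict.fromkeys(r["seller"] for r in records))
    intermediate: Dict[str, Decimal] = {}
    for current in destination_accounts:
        # S132: traverse every row and sum the rows of the current account.
        intermediate[current] = sum(
            (r["amount"] for r in records if r["seller"] == current),
            Decimal("0"),
        )
    return intermediate


# Hypothetical rows of the 00 table (each buyer UID routes to "00").
table_00 = [
    {"buyer": "10001", "seller": "A", "amount": Decimal("10.00")},
    {"buyer": "20003", "seller": "B", "amount": Decimal("7.25")},
    {"buyer": "30007", "seller": "A", "amount": Decimal("5.50")},
]
intermediate_00 = summarize_sub_table(table_00)
```

The nested traversal mirrors the two steps as written. A single pass with a dictionary accumulator produces the same totals and avoids re-reading the rows once per account.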
Taking the 00 table shown in fig. 5 as an example, the distributed data management system 400 obtains the two seller accounts in the 00 table, namely seller account A and seller account B.
The distributed data management system 400 sets seller account A as the current destination account, traverses the detail data of all rows in the 00 table, sums the transaction amounts of all detail data in the 00 table whose seller account is A, and generates the small-batch task data 10 of dimension A.
The distributed data management system 400 sets seller account B as the current destination account, traverses the detail data of all rows in the 00 table, sums the transaction amounts of all detail data in the 00 table whose seller account is B, and generates the small-batch task data 20 of dimension B.
The small-batch task data 10 of dimension A and the small-batch task data 20 of dimension B may each be a row that the distributed database management system 410 adds to the 00 table.
Similarly, the distributed data management system 400 processes each of the 100 sub-tables in turn according to the above steps.
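Applying the same summarization to every sub-table might look like the sketch below. The `SmallBatch` tuple, the `build_small_batches` function, and the sample data are hypothetical names and values chosen for illustration; the sketch uses a single-pass accumulator per sub-table rather than the per-account traversal of S131 and S132, which gives the same totals.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List, NamedTuple


class SmallBatch(NamedTuple):
    """One intermediate summary record (a small-batch task row)."""
    sub_table: str
    seller: str
    amount: Decimal


def build_small_batches(sub_tables: Dict[str, List[dict]]) -> List[SmallBatch]:
    """Produce one SmallBatch per (sub-table, seller) pair that has detail rows."""
    batches: List[SmallBatch] = []
    for name in sorted(sub_tables):
        totals: Dict[str, Decimal] = defaultdict(Decimal)  # Decimal() is zero
        for row in sub_tables[name]:
            totals[row["seller"]] += row["amount"]
        batches.extend(
            SmallBatch(name, seller, total) for seller, total in sorted(totals.items())
        )
    return batches


# Hypothetical detail data for two of the sub-tables.
sub_tables = {
    "00": [
        {"seller": "A", "amount": Decimal("10")},
        {"seller": "B", "amount": Decimal("7")},
        {"seller": "A", "amount": Decimal("5")},
    ],
    "01": [{"seller": "B", "amount": Decimal("3")}],
}
small_batches = build_small_batches(sub_tables)
```

In this sample the 01 table has no detail data for seller A, so it contributes only one small batch; each sub-table yields at most one row per seller account.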
S140, the distributed data management system 400 sends the multiple data interaction intermediate summary records in the N unit databases to the summary database 320.
S150, the summarized data management system 300 summarizes the plurality of data interaction intermediate summary records from the N unit databases based on the destination account number, and generates J data interaction final summary records.
With continued reference to FIG. 5, the distributed data management system 400 registers a plurality of intermediate summary records (i.e., the small-batch tasks of FIG. 5) with the summary database 320, and the summary database 320 forms a summary task table 30 including the plurality of intermediate summary records. The summary data management system 300 may aggregate the data in the summary task table 30 to generate the final summary records according to steps S151 and S152:
S151, the summarized data management system 300 acquires the J destination account numbers in sequence and sets each in turn as the current destination account number.
S152, the summarized data management system 300 traverses the multiple data interaction intermediate summary records, sums all intermediate summary target data values in the data interaction intermediate summary records corresponding to the current destination account number, and sets the sum as the final summary target data value.
With continued reference to fig. 5, the summary data management system 300 obtains the two seller accounts in the summary task table 30, namely seller account A and seller account B.
The summary data management system 300 sets seller account A as the current destination account, traverses the small-batch task data of all rows in the summary task table 30, sums the transaction amounts of all small-batch task data in the summary task table 30 whose seller account is A, and generates the summary batch 41 of dimension A. The transaction amount of the summary batch 41 of dimension A is the statement of seller account A.
The summary data management system 300 sets seller account B as the current destination account, traverses the small-batch task data of all rows in the summary task table 30, sums the transaction amounts of all small-batch task data in the summary task table 30 whose seller account is B, and generates the summary batch 42 of dimension B. The transaction amount of the summary batch 42 of dimension B is the statement of seller account B.
The summary data management system 300 generates a summary batch table 40, in which each row contains the statement for one seller dimension. For example, in FIG. 5, the summary batch table 40 includes two rows: record 41 contains the statement of seller account A, and record 42 contains the statement of seller account B.
The payment system 700 may then settle seller account A and seller account B according to the summary batch table 40.
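The second-level summarization of steps S151 and S152 can be sketched in the same way. The tuple layout `(sub_table, seller, amount)`, the function name, and the sample rows of the summary task table are assumptions for illustration only.

```python
from decimal import Decimal
from typing import Dict, List, Tuple

# (sub-table name, seller account, intermediate summary amount)
TaskRow = Tuple[str, str, Decimal]


def summarize_task_table(task_table: List[TaskRow]) -> Dict[str, Decimal]:
    """Return one final summary value (statement amount) per seller account."""
    # S151: collect the destination accounts in first-seen order.
    accounts = list(dict.fromkeys(seller for _, seller, _ in task_table))
    final: Dict[str, Decimal] = {}
    for current in accounts:
        # S152: traverse every small-batch row and sum those of the current account.
        final[current] = sum(
            (amount for _, seller, amount in task_table if seller == current),
            Decimal("0"),
        )
    return final


# Hypothetical summary task table holding small batches from two sub-tables.
summary_task_table = [
    ("00", "A", Decimal("15")),
    ("00", "B", Decimal("7")),
    ("01", "A", Decimal("4")),
    ("01", "B", Decimal("3")),
]
summary_batch_table = summarize_task_table(summary_task_table)
```

Each key of `summary_batch_table` plays the role of one row of the summary batch table 40, with the value standing in for that seller's statement amount.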
To sum up, before the detail data to be summarized is registered in the summary database 320, the detail data in each sub-table is first summarized in the distributed database 420 into small-batch task data (such as the small-batch task data 10 of dimension A and the small-batch task data 20 of dimension B) according to the service dimensions (such as dimension A and dimension B); the small-batch task data is then registered one-to-one in the summary task table 30 in the summary database 320. Here, one-to-one means that each piece of small-batch task data in the distributed database 420 corresponds to one small-batch task in the summary task table 30. The summary data management system 300 then aggregates the small-batch tasks in the summary task table 30 into the summary batch table 40 according to the different service dimensions (e.g., dimension A and dimension B).
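The two-level flow summarized above gives the same per-seller totals as summing the detail data directly, because addition is associative and commutative. The sketch below checks this on synthetic data. Everything in it is an assumption for illustration: the record layout, amounts held as integer cents, the synthetic UIDs, and the routing by the third-to-last and second-to-last digits of the buyer UID.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (buyer UID, seller account, amount in cents)
Record = Tuple[str, str, int]


def direct_totals(records: List[Record]) -> Dict[str, int]:
    """Sum every detail record straight into its seller total."""
    totals: Dict[str, int] = defaultdict(int)
    for _, seller, cents in records:
        totals[seller] += cents
    return dict(totals)


def two_level_totals(records: List[Record]) -> Tuple[Dict[str, int], int]:
    """Return the final totals and the number of rows registered for level two."""
    # Level 1: one small batch per (sub-table, seller) pair.
    small_batches: Dict[Tuple[str, str], int] = defaultdict(int)
    for buyer_uid, seller, cents in records:
        small_batches[(buyer_uid[-3:-1], seller)] += cents
    # Level 2: sum the small batches per seller.
    final: Dict[str, int] = defaultdict(int)
    for (_, seller), cents in small_batches.items():
        final[seller] += cents
    return dict(final), len(small_batches)


# 500 synthetic detail records spread over 50 sub-tables and 2 sellers.
records = [(str(100000 + i), "B" if i % 3 == 0 else "A", 100 + i) for i in range(500)]
final_totals, registered_rows = two_level_totals(records)
```

In this synthetic run, 500 detail records collapse into 100 registered rows (50 sub-tables times 2 sellers) before the second level runs.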
Compared with the data summarizing method shown in fig. 3, the multi-level data summarizing method of the present application has the following advantages:
First, before the detail data to be summarized is registered in the summary database 320, the detail data, which is split into different sub-tables according to the buyer dimension, is first summarized in real time in the distributed database 420 into small-batch tasks of different service dimensions (such as the small-batch task 10 and the small-batch task 20). Each service dimension has at most N small-batch tasks (for example, dimension A has at most 100), so that the data distribution is balanced and the data skew caused by an excessive amount of detail data in one dimension is avoided.
Second, when the small-batch tasks are registered in the summary database 320, the summary task table 30 in the summary database 320 does not grow as the amount of detail data grows, and does not expand with the rapid development of services. This solves the problem of data redundancy in the summary database 320 that is caused by one-to-one synchronization of the detail data in the data summarizing method shown in fig. 3.
Further, the number of interactions between distributed database 420 and summary database 320 may be substantially reduced. For example, in the summarizing method shown in fig. 3, the number of interactions required between distributed database 420 and summary database 320 is equal to the number of pieces of detail data; in the summarizing method shown in fig. 5, the number of interactions is equal to "the number of service dimensions" × "the number of sub-tables", that is, 2 × 100 = 200, which is much smaller than the number of pieces of detail data, thereby reducing the system overhead and improving the performance of data summarization.
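The comparison of interaction counts can be written out as simple arithmetic. The function names below and the figure of one million detail records are hypothetical; only the values J = 2 and N = 100 come from the example of fig. 5.

```python
def interactions_one_to_one(num_detail_records: int) -> int:
    """Fig. 3 style: every detail record is synchronized individually."""
    return num_detail_records


def interactions_multi_level(num_dimensions: int, num_sub_tables: int) -> int:
    """Fig. 5 style: one small batch per service dimension per sub-table."""
    return num_dimensions * num_sub_tables


# J = 2 seller dimensions and N = 100 sub-tables, as in fig. 5.
fig5_interactions = interactions_multi_level(2, 100)

# Hypothetical volume of detail data, chosen only to show the scale difference.
fig3_interactions = interactions_one_to_one(1_000_000)
reduction_factor = fig3_interactions // fig5_interactions
```

With these numbers the multi-level method needs 200 interactions, and the count stays at 200 however many detail records are added, as long as the number of dimensions and sub-tables is unchanged.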
In conclusion, upon reading the present detailed disclosure, those skilled in the art will appreciate that the foregoing detailed disclosure is presented by way of example only, and not limitation. Those skilled in the art will appreciate that the present application is intended to cover various reasonable alterations, improvements, and modifications of the embodiments, even though they are not explicitly described herein. Such alterations, improvements, and modifications are intended to be suggested by this application and are within the spirit and scope of the exemplary embodiments of the application.
Furthermore, certain terminology has been used in this application to describe embodiments of the application. For example, "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined as suitable in one or more embodiments of the application.
It should be appreciated that in the foregoing description of embodiments of the present application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of those features. Alternatively, various features may be dispersed throughout several embodiments of the application. This is not to be taken as an admission that any of the features of the claims are essential, and a person skilled in the art may well extract some of them as separate embodiments when reading the present application. That is, embodiments in the present application may also be understood as an integration of multiple sub-embodiments, and each sub-embodiment may be valid with fewer than all features of a single foregoing disclosed embodiment.
In some embodiments, numbers expressing quantities or properties useful for describing and claiming certain embodiments of the present application are to be understood as being modified in certain instances by the terms "about", "approximately" or "substantially". For example, "about", "approximately" or "substantially" may mean a ± 20% variation of the value it describes, unless otherwise specified. Accordingly, in some embodiments, the numerical parameters set forth in the written description and attached claims are approximations that may vary depending upon the desired properties sought to be obtained by a particular embodiment. In some embodiments, numerical parameters should be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the embodiments of the application are approximations, the numerical values set forth in the specific examples are reported as precisely as possible.
Each patent, patent application, publication of a patent application, and other material, such as articles, books, specifications, publications, documents, and the like, cited herein is hereby incorporated by reference, except for any prosecution history associated with it, any of it that is inconsistent with or in conflict with this document, and any of it that may have a limiting effect on the broadest scope of the claims now or later associated with this document. For example, if there is any inconsistency or conflict between the description, definition, and/or use of a term associated with any of the incorporated material and that associated with this document, the description, definition, and/or use of the term in this document prevails.
Finally, it should be understood that the embodiments of the application disclosed herein are illustrative of the principles of the embodiments of the present application. Other modified embodiments are also within the scope of the present application. Accordingly, the disclosed embodiments are presented by way of example only, and not limitation. Those skilled in the art may implement the present application in alternative configurations according to the embodiments of the present application. Thus, embodiments of the present application are not limited to those embodiments described with precision in the application.

Claims (11)

1. A method of multi-level data summarization, comprising:
obtaining M data interaction records, wherein M is an integer greater than 1, and each data interaction record in the M data interaction records at least comprises: a target data value, a source account number from which the target data flows out, and a destination account number into which the target data flows;
storing the M data interaction records into a distributed database, wherein the distributed database comprises N unit databases, each unit database stores a plurality of data interaction records in the M data interaction records, and N is an integer greater than 1;
summarizing the multiple data interaction records for each unit database in the N unit databases based on the destination account number to generate J data interaction intermediate summary records, wherein J is a natural number; and
sending the plurality of data interaction intermediate summary records in the N unit databases to a summary database.
2. The method of claim 1 wherein said storing said M data interaction records into a distributed database comprises:
storing the M data interaction records into the distributed database based on the source account number.
3. The method of claim 2, wherein the storing the M data interaction records in a distributed database based on the source account number comprises, for each of the M data interaction records:
acquiring K digits at a preset position in the source account number, wherein K is a natural number; and
storing the data interaction record into the unit database corresponding to the K digits.
4. The method of claim 3, wherein the K digits at the preset position are the second-to-last and third-to-last digits of the data interaction record.
5. The method of claim 1, wherein the plurality of data interaction records stored in each unit database correspond to J destination account numbers;
the generating the J data interaction intermediate summary records comprises:
sequentially acquiring the J destination account numbers and setting each in turn as a current account number; and
traversing the plurality of data interaction records, summing the target data values in all the data interaction records corresponding to the current account number, and setting the sum as an intermediate summary target data value.
6. The method of claim 5, further comprising, in the summary database:
summarizing a plurality of data interaction intermediate summary records from the N unit databases based on the J destination account numbers to generate J data interaction final summary records.
7. The method of claim 6, wherein the generating J data interaction final summary records comprises:
sequentially acquiring the J destination account numbers and setting each in turn as a current account number; and
traversing the plurality of data interaction intermediate summary records, summing all intermediate summary target data values in the data interaction intermediate summary records corresponding to the current account number, and setting the sum as a final summary target data value.
8. The method of claim 1, wherein: the data interaction record is a transaction bill; the target data value comprises a transaction amount of the transaction bill; the source account number is a buyer account number of the transaction bill; and the destination account number is a seller account number of the transaction bill.
9. A distributed data management system, comprising:
at least one memory including at least one instruction set; and
at least one processor, communicatively coupled to the at least one memory, the at least one processor performing the method for multi-level data summarization of any of claims 1-5 and 8 according to the at least one instruction set.
10. A summarized data management system comprising:
at least one memory including at least one instruction set;
at least one processor, communicatively coupled to the at least one memory, that when the summarized data management system is operating, reads and executes the at least one instruction set, and performs, as indicated by the at least one instruction set:
receiving a plurality of data interaction intermediate summary records from N unit databases in a distributed data management system, wherein a plurality of data interaction records are stored in each unit database of the N unit databases, the plurality of data interaction records stored in each unit database correspond to J destination account numbers, each unit database summarizes the plurality of data interaction records stored in each unit database based on the J destination account numbers to generate J data interaction intermediate summary records, and sends the J data interaction intermediate summary records to the summary data management system, and
summarizing the plurality of data interaction intermediate summary records based on the J destination account numbers to generate J data interaction final summary records.
11. The summarized data management system of claim 10, wherein the generating J data interaction final summary records comprises:
sequentially acquiring the J destination account numbers and setting each in turn as a current account number; and
traversing the plurality of data interaction intermediate summary records, summing all intermediate summary target data values in the data interaction intermediate summary records corresponding to the current account number, and setting the sum as a final summary target data value.
CN202010709878.8A 2020-07-22 2020-07-22 Multi-level data summarizing method, distributed data management system and summarized data management system Pending CN111782733A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010709878.8A CN111782733A (en) 2020-07-22 2020-07-22 Multi-level data summarizing method, distributed data management system and summarized data management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010709878.8A CN111782733A (en) 2020-07-22 2020-07-22 Multi-level data summarizing method, distributed data management system and summarized data management system

Publications (1)

Publication Number Publication Date
CN111782733A true CN111782733A (en) 2020-10-16

Family

ID=72764359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010709878.8A Pending CN111782733A (en) 2020-07-22 2020-07-22 Multi-level data summarizing method, distributed data management system and summarized data management system

Country Status (1)

Country Link
CN (1) CN111782733A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598327A (en) * 2020-12-31 2021-04-02 平安银行股份有限公司 Service processing system, method, device and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017016257A1 (en) * 2015-07-24 2017-02-02 广州交易猫信息技术有限公司 Account product transaction processing method and system, and transaction server
CN106570029A (en) * 2015-10-12 2017-04-19 阿里巴巴集团控股有限公司 Data processing method and data processing system of distributed relation type database
CN107038182A (en) * 2016-09-27 2017-08-11 阿里巴巴集团控股有限公司 Divide the completeness inspection method and device of table data
WO2017166174A1 (en) * 2016-03-31 2017-10-05 李昕光 Central clearing and settlement method
CN107273192A (en) * 2016-04-06 2017-10-20 阿里巴巴集团控股有限公司 A kind of propulsion method of product trading, server and system
CN108009883A (en) * 2017-11-30 2018-05-08 泰康保险集团股份有限公司 Method and device for order processing
CN109034988A (en) * 2018-07-26 2018-12-18 北京京东金融科技控股有限公司 A kind of accounting entry generation method and device
CN109145051A (en) * 2018-07-03 2019-01-04 阿里巴巴集团控股有限公司 The data summarization method and device and electronic equipment of distributed data base
CN109615514A (en) * 2018-11-27 2019-04-12 宝付网络科技(上海)有限公司 Hot spot account trading system and method
CN110298746A (en) * 2019-07-04 2019-10-01 中国工商银行股份有限公司 Hot spot account concurrent data processing system and method

Similar Documents

Publication Publication Date Title
US7908242B1 (en) Systems and methods for optimizing database queries
CN112199366A (en) Data table processing method, device and equipment
CN104794146B (en) The method and apparatus that commodity are screened and sorted in real time
CN103827908A (en) Systems and methods for a large-scale credit data processing architecture
CN103177062A (en) Accelerated query operators for high-speed, in-memory online analytical processing queries and operations
CN108256113B (en) Data blood relationship mining method and device
CN112597153B (en) Block chain-based data storage method, device and storage medium
CN105550225A (en) Index construction method and query method and apparatus
CN108415835A (en) Distributed data library test method, device, equipment and computer-readable medium
US10599614B1 (en) Intersection-based dynamic blocking
US20220229814A1 (en) Maintaining stable record identifiers in the presence of updated data records
CN112667612A (en) Data quality checking method and device, electronic equipment and storage medium
CN111782733A (en) Multi-level data summarizing method, distributed data management system and summarized data management system
CN114253930A (en) Data processing method, device, equipment and storage medium
CN107209763B (en) Rules for specifying and applying data
US20100169267A1 (en) Method and system for data processing using multidimensional filtering
CN115422205A (en) Data processing method and device, electronic equipment and storage medium
WO2023098034A1 (en) Business data report classification method and apparatus
CN114564501A (en) Database data storage and query methods, devices, equipment and medium
CN111723129B (en) Report generation method, report generation device and electronic equipment
CN112905601A (en) Routing method and device for database sub-tables
CN112634012A (en) Service data processing method, device, server and storage medium
CN106484378A (en) Data processing method and device that a kind of nothing is landed
US6442562B1 (en) Apparatus and method for using incomplete cached balance sets to generate incomplete or complete cached balance sets and balance values
KR102411806B1 (en) Systems and methods for database query efficiency improvement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination