CN111782733A - Multi-level data summarizing method, distributed data management system and summarized data management system - Google Patents
- Publication number
- CN111782733A CN202010709878.8A
- Authority
- CN
- China
- Prior art keywords
- data
- records
- data interaction
- database
- management system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/278—Data partitioning, e.g. horizontal or vertical partitioning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
Abstract
The application provides a multi-level data summarization method, a distributed data management system, and a summarized data management system. The multi-level data summarization method comprises: acquiring M data interaction records; storing the M data interaction records into a distributed database, wherein the distributed database comprises N unit databases and each unit database stores a plurality of the M data interaction records; for each of the N unit databases, summarizing the plurality of data interaction records based on the destination account number to generate J data interaction intermediate summary records; and sending the data interaction intermediate summary records in the N unit databases to a summary database.
Description
Technical Field
The present application relates to the field of data processing, and in particular to a multi-level data summarization method, a distributed data management system, and a summarized data management system.
Background
A Database (DB) is a collection of data that is stored long term in a computer, organized, shareable, and uniformly manageable.
The summary settlement service is a key service of a merchant settlement system. It mainly summarizes, by business dimension, the transaction details that are scattered across sub-databases split according to the buyer's UID. To ensure the consistency of summarized data under a distributed database, the current merchant settlement system first registers the details to be summarized one by one into a single database and then summarizes them within that single database. However, this approach makes the summary database grow without bound, which causes both data redundancy and poor performance.
Disclosure of Invention
To address the above deficiencies of the prior art, the present application discloses a method for multi-level data summarization, comprising: obtaining M data interaction records, wherein M is an integer greater than 1, and each of the M data interaction records comprises at least: a target data value, a source account number from which the target data flows out, and a destination account number into which the target data flows; storing the M data interaction records into a distributed database, wherein the distributed database comprises N unit databases, each unit database stores a plurality of the M data interaction records, and N is an integer greater than 1; for each of the N unit databases, summarizing the plurality of data interaction records based on the destination account number to generate J data interaction intermediate summary records, wherein J is a natural number; and sending the data interaction intermediate summary records in the N unit databases to a summary database.
In some embodiments, storing the M data interaction records into the distributed database includes: storing the M data interaction records into the distributed database based on the source account number.
In some embodiments, storing the M data interaction records into the distributed database based on the source account number includes, for each of the M data interaction records: acquiring K digits at a predetermined position in the source account number, wherein K is a natural number; and storing the data interaction record in the unit database corresponding to the K digits.
In some embodiments, the K digits at the predetermined position are the second-to-last and third-to-last digits of the source account number of the data interaction record.
In some embodiments, the plurality of data interaction records stored in each unit database correspond to J destination account numbers, and generating the J data interaction intermediate summary records comprises: taking each of the J destination account numbers in turn as the current account number; and traversing the plurality of data interaction records, summing the target data values of all data interaction records corresponding to the current account number, and setting the sum as the intermediate summary target data value.
In some embodiments, the method further comprises, in the summary database: summarizing the data interaction intermediate summary records from the N unit databases based on the J destination account numbers to generate J data interaction final summary records.
In some embodiments, generating the J data interaction final summary records includes: taking each of the J destination account numbers in turn as the current account number; and traversing the data interaction intermediate summary records, summing the intermediate summary target data values of all intermediate summary records corresponding to the current account number, and setting the sum as the final summary target data value.
In some embodiments: the data interaction record is a transaction bill; the target data value comprises a transaction amount of the transaction bill; the source account number is a buyer account number of the transaction bill; and the destination account number is a seller account number of the transaction bill.
The application also discloses a distributed data management system, including: at least one memory including at least one instruction set; and at least one processor, communicatively coupled to the at least one memory, the at least one processor executing the method for multi-level data summarization according to the at least one instruction set.
The present application also discloses a summarized data management system, including: at least one memory including at least one instruction set; and at least one processor, communicatively coupled to the at least one memory, that, when the summarized data management system is operating, reads and executes the at least one instruction set and performs, as indicated by the at least one instruction set: receiving a plurality of data interaction intermediate summary records from N unit databases in a distributed data management system, wherein each of the N unit databases stores a plurality of data interaction records, the plurality of data interaction records stored in each unit database correspond to J destination account numbers, and each unit database summarizes the data interaction records it stores based on the J destination account numbers to generate J data interaction intermediate summary records and sends the J data interaction intermediate summary records to the summarized data management system; and summarizing the plurality of data interaction intermediate summary records based on the J destination account numbers to generate J data interaction final summary records.
In some embodiments, generating the J data interaction final summary records includes: taking each of the J destination account numbers in turn as the current account number; and traversing the plurality of data interaction intermediate summary records, summing the intermediate summary target data values of all intermediate summary records corresponding to the current account number, and setting the sum as the final summary target data value.
In the distributed data management system, before the detail data to be summarized are registered in the summary database, the detail data in each sub-table are first summarized, within the distributed database and according to service dimension, into small-batch task data; the small-batch task data are then registered one by one in a summary task table in the summary database. The summarized data management system then summarizes the small-batch tasks in the summary task table into a summary batch table according to the different service dimensions.
The application provides a method for multi-level data summarization. First, before the detail data to be summarized are registered in the summary database, the detail data, which are split across different sub-tables according to the buyer dimension, are summarized in real time within the distributed database into small-batch tasks for the different service dimensions. Each service dimension has at most N small-batch tasks, so the data distribution is balanced and data skew caused by excessive detail data in a single dimension is avoided. Second, because only small-batch tasks are registered in the summary database, the summary task table does not grow in step with the detail data and does not balloon as the business develops rapidly, which resolves the data redundancy caused by synchronizing detail data one-to-one into the summary database. Furthermore, the number of interactions between the distributed database and the summary database is greatly reduced, which lowers system overhead and improves summarization performance.
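The size argument above can be checked with a little arithmetic. The following is a minimal sketch, not part of the application: the function names are hypothetical, and the figures (ten million detail records, N = 100, two service dimensions) are illustrative values borrowed from the examples used elsewhere in this description.

```python
def rows_registered_single_level(m: int) -> int:
    """One-to-one registration: every detail record becomes one summary-task row."""
    return m


def max_rows_registered_multi_level(n: int, dimensions: int) -> int:
    """Pre-summarized registration: at most one small-batch task per
    (unit database, service dimension) pair."""
    return n * dimensions


# Illustrative figures only (assumptions, not measurements).
detail_records = 10_000_000
unit_databases = 100
service_dimensions = 2

single = rows_registered_single_level(detail_records)
multi = max_rows_registered_multi_level(unit_databases, service_dimensions)
print(single, multi)  # 10000000 200
```

Under these assumptions the upper bound on registered rows depends only on N and the number of service dimensions, not on how many detail records arrive.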
Drawings
Fig. 1 illustrates an application scenario of a multi-level data summarization method provided according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a server provided according to an embodiment of the present application;
Fig. 3 illustrates a data summarization method provided according to an embodiment of the present application;
Fig. 4 is a flowchart of a multi-level data summarization method provided according to an embodiment of the present application; and
Fig. 5 is a schematic diagram illustrating the operation of a multi-level data summarization method provided according to an embodiment of the present application.
Detailed Description
The following description is presented to enable any person skilled in the art to make and use the present disclosure, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present application. Thus, the present application is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting.
These and other features of the present application, as well as the operation and functions of the related structural elements, the combination of parts, and the economies of manufacture, will become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form a part of this application. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the application. It should also be understood that the drawings are not drawn to scale.
The embodiment of the application provides a method for summarizing multi-level data. Fig. 1 shows an application scenario of a multi-level data summarization method provided in an embodiment of the present application. Specifically, the application scenario may include the business system 600, the data management system 100, and the payment system 700.
In some embodiments, the business system 600 stores its service data according to sub-database and sub-table rules, which may include: locating the data sub-database and the data sub-table based on the buyer identification in the service data.
For example, assuming that the buyer identification is userId (hereinafter referred to as UID), the sub-database and sub-table rules may be configured to locate the data sub-database and the data sub-table for the service data of a UID by taking the values of specific digits of the UID.
Taking UID 1111222233335678 as an example, the sub-database and sub-table rules may be configured as: take the second-to-last and third-to-last digits of the UID to locate the data sub-database and the data sub-table. Since the third-to-last and second-to-last digits of the UID are 6 and 7, data sub-database 67 and data sub-table 67 can be located. The service data corresponding to the UID will be stored in data sub-table 67 under data sub-database 67.
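The digit-based rule in this example can be sketched in a few lines. This is an illustrative sketch only; the function name `locate_shard` is hypothetical and the UID is assumed to be a numeric string.

```python
def locate_shard(uid: str) -> str:
    """Return the two-character shard id formed by the third-to-last and
    second-to-last digits of the UID (hypothetical helper)."""
    if len(uid) < 3 or not uid.isdigit():
        raise ValueError("UID must be a numeric string with at least three digits")
    # uid[-3:-1] selects the third-to-last and second-to-last characters.
    return uid[-3:-1]


shard = locate_shard("1111222233335678")
print(f"data sub-database {shard}, data sub-table {shard}")
# prints: data sub-database 67, data sub-table 67
```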
Of course, other sub-database and sub-table rules can be used. For example, again taking UID 1111222233335678, the rules may be configured as: take the third-to-last digit of the UID to locate the data sub-database, and take the second-to-last and third-to-last digits of the UID to locate the data sub-table. Since the third-to-last digit of the UID is 6, data sub-database 06 can be located; since the third-to-last and second-to-last digits of the UID are 6 and 7, data sub-table 67 can be located. Subsequent service data corresponding to the UID will be stored in data sub-table 67 under data sub-database 06.
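A rule that uses different digits for the sub-database and the sub-table might look like the sketch below. The function names are hypothetical, and zero-padding the single digit to two characters is an assumption inferred from the "06" naming in the example.

```python
def locate_sub_database(uid: str) -> str:
    """Third-to-last digit of the UID, zero-padded to two characters."""
    return uid[-3].zfill(2)


def locate_sub_table(uid: str) -> str:
    """Third-to-last and second-to-last digits of the UID."""
    return uid[-3:-1]


uid = "1111222233335678"
print(locate_sub_database(uid), locate_sub_table(uid))  # 06 67
```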
The sub-database and sub-table rules described above are merely exemplary, and those skilled in the art will appreciate that the business system 600 may employ other sub-database and sub-table rules to store service data in a distributed database system without departing from the core spirit of the present application.
It can be seen that, in this way, the service data corresponding to the services executed by the business system 600 can be inserted "uniformly" into different data sub-tables for storage. After the sub-database and sub-table split is completed, the service data generated by the business system 600 are distributed across a plurality of data sub-tables in a plurality of different data sub-databases.
The data management system 100 may aggregate service data for the business system 600. By way of example, the data management system 100 may be a settlement platform. The settlement platform may aggregate the expense details generated by the business system 600 to generate a settlement order.
Distributed data management system 400 may include distributed database management system 410 and distributed database 420. Distributed database 420 is built by distributed database management system 410, which may also manage and control it. For example, distributed database management system 410 may query, add, update, delete, sum, and sort data in distributed database 420. The distributed database 420 includes a plurality of data sub-databases, such as database db00, database db01, database db03, database db04, and so on. Each data sub-database can store one or more data sub-tables, and each data sub-table may store the service data.
Summary data management system 300 may include a summary database management system 310 and a summary database 320. The summary database 320 is established by the summary database management system 310. The summary database management system 310 may also manage and control the summary database 320. For example, the summary database management system 310 may query, add, update, delete, sum, sort, etc. the data in the summary database 320. Summary database 320 may have summary tables stored therein. The summary table may include data after summarizing the sub-tables in each sub-library in distributed database 420.
The data management system 100 may include a server 200. The server 200 may be a stand-alone server or a server cluster. Server 200 may be a server of the distributed data management system 400, a server of the summarized data management system 300, or a server of both. As an example, Fig. 2 shows a schematic structural diagram of a server 200 provided according to an embodiment of the present application. For ease of understanding, the following description presents the structure and functions of the server 200 by taking it as a server of the distributed data management system 400.
I/O components 260 support input/output between server 200 and other components.
The at least one processor 220 communicates with the at least one memory 230 via an internal communication bus 210. The at least one processor 220 is configured to execute the at least one instruction set, and when it does so, the server 200 implements the data summarization method provided herein. The processor 220 may perform some of the steps included in the data summarization method. Processor 220 may be in the form of one or more processors, and in some embodiments, processor 220 may include one or more hardware processors, such as microcontrollers, microprocessors, Reduced Instruction Set Computers (RISC), Application Specific Integrated Circuits (ASICs), application specific instruction set processors (ASIPs), Central Processing Units (CPUs), Graphics Processing Units (GPUs), Physical Processing Units (PPUs), microcontroller units, Digital Signal Processors (DSPs), Field Programmable Gate Arrays (FPGAs), Advanced RISC Machines (ARM), Programmable Logic Devices (PLDs), any circuit or processor capable of executing one or more functions, or the like, or any combination thereof. For illustrative purposes only, only one processor 220 is depicted in server 200 in the present application. It should be noted, however, that the server 200 may also include multiple processors, and thus the operations and/or method steps disclosed herein may be performed by one processor, as described herein, or by a combination of multiple processors. For example, if in the present application the processor 220 of the server 200 performs step A and step B, it should be understood that step A and step B may also be performed jointly or separately by two different processors 220 (e.g., a first processor performs step A and a second processor performs step B, or the first and second processors perform steps A and B together).
The port 250 is used for data communication between the server 200 and the outside. For example, the server 200 may be connected to the network 130 through the port 250, and further receive the service data sent from the service system 600, or send the preliminarily summarized service data to the summarized data management system 300.
With continued reference to Fig. 1, the payment system 700 may handle payment and collection in a unified manner for the settlement orders summarized by the data management system 100. By way of example, the payment system 700 may include, but is not limited to, a third-party payment platform, a bank, and the like.
Fig. 3 illustrates a data summarization method provided according to an embodiment of the present application. The business detail data to be summarized for business system 600 are scattered across 100 sub-database sub-tables (i.e., table 00, table 01, ..., table 99). The 100 sub-tables are split according to the second-to-last and third-to-last digits of the buyer's UID. The distributed data management system 400 registers these detail data one-to-one into the summary task table of the summary database 320. The summarized data management system 300 then summarizes the business data in the summary task table into a summary batch table according to the different business dimensions. A business dimension may refer to a seller (i.e., a merchant); that is, one business dimension corresponds to one seller.
The data summarization method shown in Fig. 3 has the following drawbacks. First, every piece of detail data to be summarized is registered in the summary task table one by one, and the summary tasks registered every day number on the order of ten million, so the summary task table grows quickly. Although a database administrator (hereinafter, DBA) can help clean up summary task data periodically, the table grows too fast for manual cleaning to be sustainable. Second, since all detail data must be registered in the summary task table, summarizing by business dimension can time out when fetching data for a dimension that has too much detail data, which affects system stability. Third, the distributed data management system 400, where the detail data reside, and the summarized data management system 300 must interact frequently; the number of interactions equals the number of pieces of detail data, which adds system overhead and degrades data processing performance.
The application also provides a method for summarizing the multi-level data. Fig. 4 shows a flowchart of a multi-level data summarization method S100 according to an embodiment of the present application. Fig. 5 is a schematic diagram illustrating an operation process of a multi-level data summarization method S100 according to an embodiment of the present application.
The process S100 shown in fig. 4 includes a data summarization method performed by the distributed data management system 400 and a data summarization method performed by the summarized data management system 300. Some of the steps of process S100 may be performed by distributed data management system 400 and some of the steps may be performed by summarized data management system 300.
The operations of the flow S100 presented below are intended to be illustrative and not limiting. In some embodiments, the process S100 may be implemented with one or more additional operations not described, and/or without one or more of the operations described herein. Further, the order of the operations shown in Fig. 4 and described below is not intended to be limiting.
S110, the distributed data management system 400 acquires M data interaction records.
M may be an integer greater than 1. Each of the M data interaction records comprises at least: a target data value, a source account number from which the target data flows out, and a destination account number into which the target data flows.
As an example, the data interaction record may be business data generated by business system 600, such as a transaction bill for a transaction between a buyer and a seller; one data interaction record may be one transaction bill. In that case, the target data value includes the transaction amount of the transaction bill, the source account number is the buyer account number of the transaction bill, and the destination account number is the seller account number of the transaction bill. In some embodiments, the transaction bill may also include a transaction time.
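For orientation, a data interaction record with the fields listed above could be modeled as follows. This is a hedged sketch: the class name, field names, the use of `Decimal` for amounts, and the sample values are all assumptions made for illustration, not definitions from the application.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class DataInteractionRecord:
    """One transaction bill (hypothetical field names)."""
    target_data_value: Decimal          # transaction amount
    source_account: str                 # buyer account number (UID)
    destination_account: str            # seller account number
    transaction_time: Optional[datetime] = None  # optional in some embodiments


# Sample values are made up for the example.
bill = DataInteractionRecord(
    target_data_value=Decimal("19.90"),
    source_account="1111222233335678",
    destination_account="seller-A",
)
print(bill.destination_account, bill.target_data_value)  # seller-A 19.90
```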
S120, the distributed data management system 400 stores the M data interaction records into the distributed database 420.
Distributed database 420 may include N unit databases. The N unit databases may be N sub-databases of distributed database 420, where N is an integer greater than 1. For ease of understanding, the following description presents the data summarization method S100 in detail by taking N equal to 100 as an example.
In some embodiments, distributed data management system 400 stores the M data interaction records in distributed database 420 based on the source account number. As an example, the process of the distributed data management system 400 storing the M data interaction records in the distributed database 420 based on the source account may include:
S121, for each of the M data interaction records, the distributed data management system 400 obtains K digits at a predetermined position in the source account number.
S122, for each of the M data interaction records, the distributed data management system 400 stores the data interaction record in the unit database corresponding to the K digits.
K is a natural number. As an example, the K digits at the predetermined position may be the second-to-last and third-to-last digits of the source account number of the data interaction record.
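Steps S121 and S122 can be sketched as a routing loop. This is a minimal in-memory sketch under stated assumptions: records are plain dictionaries with hypothetical keys, unit databases are modeled as Python lists keyed by the K digits, and the default slice corresponds to the second-to-last and third-to-last digits (K = 2).

```python
from collections import defaultdict


def store_records(records, k_slice=slice(-3, -1)):
    """Route each record to the unit database named by K digits of its
    source account (in-memory stand-in for the distributed database)."""
    unit_databases = defaultdict(list)
    for record in records:
        key = record["source_account"][k_slice]   # S121: take the K digits
        unit_databases[key].append(record)        # S122: store in that unit database
    return dict(unit_databases)


# Sample records are made up for the example.
records = [
    {"source_account": "1111222233335678", "destination_account": "A", "amount": 10},
    {"source_account": "1111222233330001", "destination_account": "B", "amount": 5},
    {"source_account": "2222333344445679", "destination_account": "A", "amount": 7},
]
stored = store_records(records)
print(sorted(stored))  # ['00', '67']
```

Because the key is taken from the buyer account, bills for the same seller can land in different unit databases, which is why a second summarization level is needed.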
As can be seen from the foregoing description, the service system 600 may generate the M data interaction records, and insert each data interaction record in the M data interaction records into the corresponding sub-database sub-table according to the preset sub-database sub-table rule.
For example, assume the sub-database and sub-table rules are: take the second-to-last and third-to-last digits of the UID to locate the data sub-database and the data sub-table. The business system 600 then stores each data interaction record in the sub-database sub-table corresponding to the second-to-last and third-to-last digits of its UID.
As can be appreciated from the foregoing description, each sub-database of distributed database 420 may store one sub-table, so the 100 sub-databases hold 100 sub-tables in total. Referring to Fig. 5, these are: table 00, table 01, ..., table 99. Each sub-table can contain a plurality of the M data interaction records, and one data interaction record may be one row in a sub-table.
S130, the distributed data management system 400 summarizes the multiple data interaction records for each of the N unit databases based on the destination account number, and generates J data interaction intermediate summary records.
J is a natural number. The plurality of data interaction records stored in each unit database correspond to J destination account numbers. For example, as described above, the destination account number may be a seller account number. One destination account number corresponds to one seller account number, so the J destination account numbers correspond to J seller account numbers.
Referring to Fig. 5, taking N = 100 and J = 2 as an example, the business details to be summarized are scattered across 100 sub-database sub-tables (table 00, table 01, ..., table 99), which are split according to the second-to-last and third-to-last digits of the buyer UID. Each sub-table contains a plurality of pieces of detail data. Taking table 00 as an example, one row in table 00 may represent one piece of detail data; if the detail data are transaction bills, one piece of detail data may include information such as a transaction amount, a buyer account number, and a seller account number. Divided by seller account, one sub-table (for example, table 00) may include 2 seller accounts, namely seller account A and seller account B, i.e., dimension A and dimension B in Fig. 5. That is, all the detail data in table 00 belong to two seller accounts, seller account A and seller account B.
The distributed data management system 400 may summarize, for each sub-table in each sub-database, the plurality of data interaction records in the sub-table based on the seller account number, and generate 2 intermediate summary records. Taking table 00 as an example, the distributed data management system 400 summarizes the detail data in table 00 by seller account A and seller account B to generate 2 intermediate summary records, where each intermediate summary record contains the total transaction amount of one seller account.
Specifically, the generating of the J data interaction intermediate summary records by the distributed data management system 400 may include:
S131, the distributed data management system 400 acquires the J destination account numbers in sequence and sets each in turn as the current destination account number; and
S132, the distributed data management system 400 traverses the multiple data interaction records, sums the target data values in all the data interaction records corresponding to the current destination account number, and sets the sum as the intermediate summary target data value.
Taking the 00 table shown in fig. 5 as an example, the distributed data management system 400 obtains the two seller accounts in the 00 table, namely seller account A and seller account B.
The distributed data management system 400 sets seller account A as the current destination account, traverses the detail data of all rows in the 00 table, sums the transaction amounts of all detail data in the 00 table whose seller account is A, and generates the small-batch task data 10 of dimension A.
The distributed data management system 400 sets seller account B as the current destination account, traverses the detail data of all rows in the 00 table, sums the transaction amounts of all detail data in the 00 table whose seller account is B, and generates the small-batch task data 20 of dimension B.
The small-batch task data 10 of dimension A and the small-batch task data 20 of dimension B may each be a row added to the 00 table by the distributed database management system 410.
Similarly, the distributed data management system 400 processes all 100 sub-tables in sequence according to the above steps.
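Steps S131 and S132 for a single sub-table can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the distributed data management system 400: the function name `summarize_table`, the dictionary field names, and the sample amounts are hypothetical, and amounts are assumed to be integers in minor currency units.

```python
def summarize_table(table_name, detail_rows):
    """Build one intermediate summary record per seller found in a sub-table."""
    # The J destination accounts present in this sub-table.
    sellers = sorted({row["seller"] for row in detail_rows})
    summaries = []
    for current in sellers:  # S131: take each destination account in turn
        # S132: traverse all rows and sum the amounts of the current account.
        total = sum(row["amount"] for row in detail_rows if row["seller"] == current)
        summaries.append({"table": table_name, "seller": current, "amount": total})
    return summaries


# Hypothetical contents of the 00 table.
table_00 = [
    {"buyer": "u1", "seller": "A", "amount": 100},
    {"buyer": "u2", "seller": "B", "amount": 40},
    {"buyer": "u3", "seller": "A", "amount": 25},
]
small_batches = summarize_table("00", table_00)
```

The sketch follows the text literally and traverses the table once per seller. A single pass that accumulates totals in a dictionary keyed by seller would give the same result with fewer reads.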
S140, the distributed data management system 400 sends the multiple data interaction intermediate summary records in the N unit databases to the summary database 320.
S150, the summarized data management system 300 summarizes the plurality of data interaction intermediate summary records from the N unit databases based on the destination account number, and generates J data interaction final summary records.
With continued reference to fig. 5, the distributed data management system 400 registers the plurality of intermediate summary records (i.e., the small-batch tasks of fig. 5) in the summary database 320, and the summary database 320 forms a summary task table 30 containing those intermediate summary records. The summarized data management system 300 may aggregate the data in the summary task table 30 to generate the final summary records according to steps S151 and S152:
S151, the summarized data management system 300 acquires the J destination account numbers in sequence and sets each in turn as the current destination account number; and
S152, the summarized data management system 300 traverses the multiple data interaction intermediate summary records, sums all intermediate summary target data values in the data interaction intermediate summary records corresponding to the current destination account number, and sets the sum as the final summary target data value.
With continued reference to fig. 5, the summarized data management system 300 obtains the two seller accounts in the summary task table 30, namely seller account A and seller account B.
The summarized data management system 300 sets seller account A as the current destination account, traverses the small-batch task data of all rows in the summary task table 30, sums the transaction amounts of all small-batch task data in the summary task table 30 whose seller account is A, and generates the summary batch 41 of dimension A. The transaction amount of the summary batch 41 of dimension A is the statement of seller account A.
The summarized data management system 300 sets seller account B as the current destination account, traverses the small-batch task data of all rows in the summary task table 30, sums the transaction amounts of all small-batch task data in the summary task table 30 whose seller account is B, and generates the summary batch 42 of dimension B. The transaction amount of the summary batch 42 of dimension B is the statement of seller account B.
The summarized data management system 300 generates a summary batch table 40, in which each row contains the statement for one seller dimension. For example, in fig. 5, the summary batch table 40 includes two rows of records: the record 41 contains the statement of seller account A, and the record 42 contains the statement of seller account B.
The payment system 700 may then settle seller account A and seller account B according to the summary batch table 40.
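The second summarization level (steps S151 and S152) can be sketched in the same way. This is a hedged illustration only: the function name `final_summary`, the field names, and the sample rows of the summary task table are hypothetical, and the result is modeled as a plain dictionary rather than a database table.

```python
def final_summary(summary_task_table):
    """Merge intermediate summary records into one statement per seller."""
    # The J destination accounts present across all intermediate records.
    sellers = sorted({row["seller"] for row in summary_task_table})
    statements = {}
    for current in sellers:  # S151: take each destination account in turn
        # S152: sum the intermediate totals that belong to the current account.
        statements[current] = sum(
            row["amount"] for row in summary_task_table if row["seller"] == current
        )
    return statements


# Hypothetical intermediate summary records registered from three sub-tables.
summary_task_table = [
    {"table": "00", "seller": "A", "amount": 125},
    {"table": "00", "seller": "B", "amount": 40},
    {"table": "01", "seller": "A", "amount": 10},
    {"table": "99", "seller": "B", "amount": 60},
]
summary_batch_table = final_summary(summary_task_table)
```

Summation is associative, so summing the per-table totals gives the same statement as summing the underlying detail data directly; this is what allows the work to be split into two levels.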
To sum up, before the detail data to be summarized is registered in the summary database 320, the detail data in each sub-table is first summarized in the distributed database 420 into small-batch task data (such as the small-batch task data 10 of dimension A and the small-batch task data 20 of dimension B) according to the business dimensions (such as dimension A and dimension B); the small-batch task data is then registered one-to-one in the summary task table 30 in the summary database 320. Here, one-to-one means that each piece of small-batch task data in the distributed database 420 corresponds to one small-batch task in the summary task table 30. The summarized data management system 300 then aggregates the small-batch tasks in the summary task table 30 into the summary batch table 40 according to the different business dimensions (e.g., dimension A and dimension B).
Compared with the data summarizing method shown in fig. 3, the multi-level data summarizing method of the present application has the following characteristics:
First, before the detail data to be summarized is registered in the summary database 320, the detail data, which is split into different sub-tables by the buyer dimension, is first summarized in real time in the distributed database 420 into small-batch tasks of different business dimensions (such as the small-batch task 10 and the small-batch task 20). Each business dimension has at most N small-batch tasks (for example, dimension A has at most 100), so the data distribution can be balanced and the data skew caused by excessive detail data in a single dimension can be avoided.
Second, when the small-batch tasks are registered in the summary database 320, the summary task table 30 in the summary database 320 does not grow in step with the detail data and does not expand with the rapid development of services, which solves the data redundancy problem in the summary database 320 caused by the one-to-one synchronization of detail data in the data summarizing method shown in fig. 3.
Further, the number of interactions between the distributed database 420 and the summary database 320 may be substantially reduced. For example, in the summarizing method shown in fig. 3, the number of interactions required between the distributed database 420 and the summary database 320 equals the number of pieces of detail data; in the summarizing method shown in fig. 5, the number of interactions equals "the number of business dimensions" × "the number of sub-tables", that is, 2 × 100 = 200, which is far smaller than the number of pieces of detail data, thereby reducing system overhead and improving data summarization performance.
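The difference in interaction counts can be checked with a small simulation. The sketch below is illustrative only: the function name `count_interactions`, the uniform random placement of detail data, and the figure of 50,000 pieces of detail data are assumptions made for the example. Note that "dimensions × sub-tables" is an upper bound, reached when every sub-table holds detail data for every seller.

```python
import random


def count_interactions(num_tables, sellers, num_details, seed=0):
    """Compare registrations needed by one-to-one sync and by two-level summary."""
    rng = random.Random(seed)
    # Each distinct (sub-table, seller) pair yields one intermediate record.
    pairs = set()
    for _ in range(num_details):
        table = rng.randrange(num_tables)  # assumed: buyers spread uniformly
        seller = rng.choice(sellers)
        pairs.add((table, seller))
    one_to_one = num_details   # fig. 3: one registration per piece of detail data
    multi_level = len(pairs)   # fig. 5: one registration per intermediate record
    return one_to_one, multi_level


one_to_one, multi_level = count_interactions(100, ["A", "B"], 50_000)
upper_bound = 2 * 100  # business dimensions x sub-tables
```

Whatever the amount of detail data, the multi-level count cannot exceed the upper bound of 200, whereas the one-to-one count grows linearly with the detail data.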
In conclusion, upon reading the present detailed disclosure, those skilled in the art will appreciate that the foregoing detailed disclosure is presented by way of example only and is not limiting. Those skilled in the art will appreciate that the present application is intended to cover various reasonable variations, improvements, and modifications of the embodiments, even though they are not explicitly described herein. Such alterations, improvements, and modifications are intended to be suggested by this application and are within the spirit and scope of the exemplary embodiments of the application.
Furthermore, certain terminology has been used in this application to describe embodiments of the application. For example, "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined as suitable in one or more embodiments of the application.
It should be appreciated that in the foregoing description of embodiments of the present application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more features. Alternatively, various features may be dispersed throughout several embodiments of the application. This is not to be taken as an admission that any of the features of the claims are essential, and it is fully possible for a person skilled in the art to extract some of them as separate embodiments when reading the present application. That is, embodiments in the present application may also be understood as an integration of multiple sub-embodiments, and each sub-embodiment described herein may also hold with fewer than all features of a single foregoing disclosed embodiment.
In some embodiments, numbers expressing quantities or properties useful for describing and claiming certain embodiments of the present application are to be understood as being modified in certain instances by the terms "about", "approximately" or "substantially". For example, "about", "approximately" or "substantially" may mean a ± 20% variation of the value it describes, unless otherwise specified. Accordingly, in some embodiments, the numerical parameters set forth in the written description and attached claims are approximations that may vary depending upon the desired properties sought to be obtained by a particular embodiment. In some embodiments, numerical parameters should be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the embodiments of the application are approximations, the numerical values set forth in the specific examples are reported as precisely as possible.
Each patent, patent application, publication of a patent application, and other material, such as articles, books, specifications, publications, documents, and the like, cited herein is hereby incorporated by reference in its entirety, except for any prosecution history associated with such material, any such material that is inconsistent or in conflict with this document, and any such material that may have a limiting effect on the broadest scope of the claims now or later associated with this document. For example, if there is any inconsistency or conflict between the description, definition, and/or use of a term associated with any of the incorporated materials and that associated with this document, the description, definition, and/or use of the term in this document prevails.
Finally, it should be understood that the embodiments of the application disclosed herein are illustrative of the principles of the embodiments of the present application. Other modified embodiments are also within the scope of the present application. Accordingly, the disclosed embodiments are presented by way of example only, and not limitation. Those skilled in the art may implement the present application in alternative configurations according to the embodiments of the present application. Thus, embodiments of the present application are not limited to those embodiments described with precision in the application.
Claims (11)
1. A method of multi-level data summarization, comprising:
obtaining M data interaction records, wherein M is an integer greater than 1, and each data interaction record in the M data interaction records at least comprises: a target data value, a source account number from which the target data flows out, and a destination account number into which the target data flows;
storing the M data interaction records into a distributed database, wherein the distributed database comprises N unit databases, each unit database stores a plurality of data interaction records in the M data interaction records, and N is an integer greater than 1;
summarizing the multiple data interaction records for each unit database in the N unit databases based on the destination account number to generate J data interaction intermediate summary records, wherein J is a natural number; and
sending the plurality of data interaction intermediate summary records in the N unit databases to a summary database.
2. The method of claim 1, wherein said storing said M data interaction records into a distributed database comprises:
storing the M data interaction records into the distributed database based on the source account number.
3. The method of claim 2, wherein the storing the M data interaction records in a distributed database based on the source account number comprises, for each of the M data interaction records:
acquiring K digits at a preset position in the source account number, wherein K is a natural number; and
storing the data interaction record into the unit database corresponding to the K digits.
4. The method of claim 3, wherein the K digits of the predetermined location are the second to last and third to last digits of the data interaction record.
5. The method of claim 1, wherein the plurality of data interaction records stored in each unit database correspond to J destination account numbers;
the generating the J data interaction intermediate summary records comprises:
sequentially acquiring the J destination account numbers and setting each in turn as a current account number; and
traversing the plurality of data interaction records, summing the target data values in all the data interaction records corresponding to the current account number, and setting the sum as an intermediate summary target data value.
6. The method of claim 5, further comprising, in the summary database:
summarizing a plurality of data interaction intermediate summary records from the N unit databases based on the J destination account numbers to generate J data interaction final summary records.
7. The method of claim 6, wherein the generating J data interaction final summary records comprises:
sequentially acquiring the J destination account numbers and setting each in turn as a current account number; and
traversing the plurality of data interaction intermediate summary records, summing all intermediate summary target data values in the plurality of data interaction intermediate summary records corresponding to the current account number, and setting the sum as a final summary target data value.
8. The method of claim 1, wherein: the data interaction record is a transaction bill; the target data value comprises a transaction amount of the transaction bill; the source account number is a buyer account number of the transaction bill; and the destination account number is a seller account number of the transaction bill.
9. A distributed data management system, comprising:
at least one memory including at least one instruction set; and
at least one processor, communicatively coupled to the at least one memory, the at least one processor performing the method for multi-level data summarization of any of claims 1-5 and 8 according to the at least one instruction set.
10. A summarized data management system comprising:
at least one memory including at least one instruction set;
at least one processor, communicatively coupled to the at least one memory, that, when the summarized data management system is operating, reads and executes the at least one instruction set and performs, as indicated by the at least one instruction set:
receiving a plurality of data interaction intermediate summary records from N unit databases in a distributed data management system, wherein a plurality of data interaction records are stored in each unit database of the N unit databases, the plurality of data interaction records stored in each unit database correspond to J destination account numbers, each unit database summarizes the plurality of data interaction records stored therein based on the J destination account numbers to generate J data interaction intermediate summary records, and sends the J data interaction intermediate summary records to the summarized data management system; and
summarizing the plurality of data interaction intermediate summary records based on the J destination account numbers to generate J data interaction final summary records.
11. The summarized data management system of claim 10, wherein the generating J data interaction final summary records comprises:
sequentially acquiring the J destination account numbers and setting each in turn as a current account number; and
traversing the plurality of data interaction intermediate summary records, summing all intermediate summary target data values in the plurality of data interaction intermediate summary records corresponding to the current account number, and setting the sum as a final summary target data value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010709878.8A CN111782733A (en) | 2020-07-22 | 2020-07-22 | Multi-level data summarizing method, distributed data management system and summarized data management system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111782733A true CN111782733A (en) | 2020-10-16 |
Family
ID=72764359
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010709878.8A Pending CN111782733A (en) | 2020-07-22 | 2020-07-22 | Multi-level data summarizing method, distributed data management system and summarized data management system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111782733A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112598327A (en) * | 2020-12-31 | 2021-04-02 | 平安银行股份有限公司 | Service processing system, method, device and storage medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017016257A1 (en) * | 2015-07-24 | 2017-02-02 | 广州交易猫信息技术有限公司 | Account product transaction processing method and system, and transaction server |
CN106570029A (en) * | 2015-10-12 | 2017-04-19 | 阿里巴巴集团控股有限公司 | Data processing method and data processing system of distributed relation type database |
CN107038182A (en) * | 2016-09-27 | 2017-08-11 | 阿里巴巴集团控股有限公司 | Divide the completeness inspection method and device of table data |
WO2017166174A1 (en) * | 2016-03-31 | 2017-10-05 | 李昕光 | Central clearing and settlement method |
CN107273192A (en) * | 2016-04-06 | 2017-10-20 | 阿里巴巴集团控股有限公司 | A kind of propulsion method of product trading, server and system |
CN108009883A (en) * | 2017-11-30 | 2018-05-08 | 泰康保险集团股份有限公司 | Method and device for order processing |
CN109034988A (en) * | 2018-07-26 | 2018-12-18 | 北京京东金融科技控股有限公司 | A kind of accounting entry generation method and device |
CN109145051A (en) * | 2018-07-03 | 2019-01-04 | 阿里巴巴集团控股有限公司 | The data summarization method and device and electronic equipment of distributed data base |
CN109615514A (en) * | 2018-11-27 | 2019-04-12 | 宝付网络科技(上海)有限公司 | Hot spot account trading system and method |
CN110298746A (en) * | 2019-07-04 | 2019-10-01 | 中国工商银行股份有限公司 | Hot spot account concurrent data processing system and method |
-
2020
- 2020-07-22 CN CN202010709878.8A patent/CN111782733A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7908242B1 (en) | Systems and methods for optimizing database queries | |
CN112199366A (en) | Data table processing method, device and equipment | |
CN104794146B (en) | The method and apparatus that commodity are screened and sorted in real time | |
CN103827908A (en) | Systems and methods for a large-scale credit data processing architecture | |
CN103177062A (en) | Accelerated query operators for high-speed, in-memory online analytical processing queries and operations | |
CN108256113B (en) | Data blood relationship mining method and device | |
CN112597153B (en) | Block chain-based data storage method, device and storage medium | |
CN105550225A (en) | Index construction method and query method and apparatus | |
CN108415835A (en) | Distributed data library test method, device, equipment and computer-readable medium | |
US10599614B1 (en) | Intersection-based dynamic blocking | |
US20220229814A1 (en) | Maintaining stable record identifiers in the presence of updated data records | |
CN112667612A (en) | Data quality checking method and device, electronic equipment and storage medium | |
CN111782733A (en) | Multi-level data summarizing method, distributed data management system and summarized data management system | |
CN114253930A (en) | Data processing method, device, equipment and storage medium | |
CN107209763B (en) | Rules for specifying and applying data | |
US20100169267A1 (en) | Method and system for data processing using multidimensional filtering | |
CN115422205A (en) | Data processing method and device, electronic equipment and storage medium | |
WO2023098034A1 (en) | Business data report classification method and apparatus | |
CN114564501A (en) | Database data storage and query methods, devices, equipment and medium | |
CN111723129B (en) | Report generation method, report generation device and electronic equipment | |
CN112905601A (en) | Routing method and device for database sub-tables | |
CN112634012A (en) | Service data processing method, device, server and storage medium | |
CN106484378A (en) | Data processing method and device that a kind of nothing is landed | |
US6442562B1 (en) | Apparatus and method for using incomplete cached balance sets to generate incomplete or complete cached balance sets and balance values | |
KR102411806B1 (en) | Systems and methods for database query efficiency improvement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||