CN113886383A - Data processing method, data processing apparatus, electronic device, storage medium, and program product - Google Patents

Data processing method, data processing apparatus, electronic device, storage medium, and program product

Info

Publication number
CN113886383A
Authority
CN
China
Prior art keywords
data
target data
key
target
message platform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110985220.4A
Other languages
Chinese (zh)
Inventor
李斌
雷嘉健
戴启军
周贤舜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lakala Payment Co ltd
Original Assignee
Lakala Payment Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lakala Payment Co ltd
Priority to CN202110985220.4A
Publication of CN113886383A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present disclosure disclose a data processing method, a data processing apparatus, an electronic device, a storage medium, and a program product. The method comprises: consuming first target data from a message platform using a first task, the first target data comprising a first data key and a first data value; generating second target data based on the first target data, the second target data comprising a second data key and a second data value, where the second data key is a preset summarizing dimension in the first data value, and the second data value comprises the first data key and the data in the first data value other than the preset summarizing dimension; storing the second target data back into the message platform; consuming the second target data from the message platform using a second task; and performing data summarization based on the second data key in the second target data received from the message platform.

Description

Data processing method, data processing apparatus, electronic device, storage medium, and program product
Technical Field
The disclosed embodiments relate to the field of computer technologies, and in particular, to a data processing method, an apparatus, an electronic device, a storage medium, and a program product.
Background
Data summarization refers to the process of aggregating stored business detail data and computing summary statistics over it according to one or more specified dimensions.
In the related art, data summarization is generally run as a single batch at a preset time point over the data accumulated within a preset period. For example, in the financial industry, the financial data generated during a day is often summarized in one pass at the end of that day. Data summarization can also be performed in real time, aggregating the data produced by the data generation end according to one or more specified dimensions.
However, as the amount of data grows, data summarization takes longer and longer, and when the execution of data summarization fails, it must be performed again. In addition, summarizing real-time data can affect the normal operation of the service system.
Therefore, how to improve data processing efficiency and reduce processing time during data summarization is one of the main technical problems to be solved by those skilled in the art.
Disclosure of Invention
The embodiments of the present disclosure provide a data processing method, a data processing apparatus, an electronic device, a storage medium, and a program product.
In a first aspect, an embodiment of the present disclosure provides a data processing method, including:
consuming first target data from a message platform using a first task; the first target data comprises a first data key and a first data value;
generating second target data based on the first target data; the second target data comprises a second data key and a second data value; the second data key is a preset summarizing dimension in the first data value, and the second data value comprises the first data key and other data except the preset summarizing dimension in the first data value;
storing the second target data back into the message platform;
consuming the second target data from the message platform using a second task;
performing data summarization based on the second data key in the second target data received from the message platform.
Further, the first target data consumed by the first task from the message platform is data written by a data generation end, and the first data key of the first target data in the message platform is different from the second data key of the second target data in the message platform.
Further, after consuming the first target data from the message platform using the first task, the method further comprises:
extracting the first data key in the first target data and the preset summarizing dimension in the first data value;
and when the first data key is consistent with the preset summarizing dimension, performing data summarization based on the first data key in the first target data.
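The branch described above can be sketched as follows. This is a minimal illustration only: it assumes the data key is a dictionary of field names to values, it reads "consistent" as "the key consists of exactly the summarizing dimension field", and `needs_rekey` is a hypothetical helper name that does not appear in the disclosure.

```python
def needs_rekey(first_key: dict, summary_dimension: str) -> bool:
    """Return True when the first data key is not the preset summarizing dimension.

    Assumption: a key is "consistent" with the dimension when its only
    field is the dimension field itself.
    """
    return set(first_key) != {summary_dimension}


# Keyed by transaction id while summarizing by merchant: re-keying is needed.
print(needs_rekey({"id": 3}, "merchant_id"))
# Already keyed by merchant: the data can be summarized directly.
print(needs_rekey({"merchant_id": "0002"}, "merchant_id"))
```

When the helper returns False, the record could be summarized on its first data key without being written back to the message platform.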
Further, before consuming the first target data from the message platform using the first task, the method further comprises:
extracting newly generated first target data from a data generation end in real time;
and sending the extracted first target data to the message platform for storage.
Further, consuming the second target data from the message platform using the second task comprises:
consuming the second target data from the same partition of the message platform using the same second task.
Further, performing data summarization based on the second data key in the second target data received from the message platform comprises:
summarizing a plurality of second data values that correspond to the same second data key and are received from the same partition of the message platform.
Further, the data generation end stores the first target data in the message platform by writing first target data that corresponds to the same first data key into the same partition.
Further, the data summarization end stores the second target data in the message platform by writing second target data that corresponds to the same second data key into the same partition.
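The rule that records with the same key are written to the same partition can be illustrated with a deterministic key hash. The sketch below is an assumption for illustration: it uses CRC32 from the Python standard library, whereas a real message platform applies its own partitioning function, and `partition_for` is a hypothetical name.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index deterministically (CRC32 mod N)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Every message keyed by the same merchant lands in the same partition.
for merchant_id in ["0001", "0002", "0003", "0001"]:
    print(merchant_id, partition_for(merchant_id, 3))
```

Because the mapping depends only on the key, no coordination between writers is needed to keep one key's records together.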
In a second aspect, an embodiment of the present disclosure provides a data processing method, including:
a data generation end generates first target data and writes the first target data into a message platform; the first target data comprises a first data key and a first data value;
the message platform receives and stores the first target data;
the data summarization end consumes the first target data from the message platform by utilizing a first task and generates second target data based on the first target data; the second target data comprises a second data key and a second data value; the second data key is a preset summarizing dimension in the first data value, and the second data value comprises the first data key and other data except the preset summarizing dimension in the first data value;
the data summarization end also stores the second target data into the message platform again;
after receiving the second target data, the message platform stores the second target data corresponding to the same second data key into the same partition;
the data summarization end consumes the second target data from the message platform by using a second task;
the data summarization end performs data summarization based on the second data key in the second target data received from the message platform.
Further, the first target data consumed by the first task from the message platform is data written by a data generation end, and the first data key of the first target data in the message platform is different from the second data key of the second target data in the message platform.
Further, after the data summarization end consumes the first target data from the message platform by using the first task, the method further comprises:
the data summarization end extracts the first data key in the first target data and the preset summarization dimension in the first data value;
and when the first data key is consistent with the preset summarizing dimension, the data summarizing end summarizes data based on the first data key in the first target data.
Further, before the data summarization end consumes the first target data from the message platform by using the first task, the method further comprises:
the data summarization end extracts newly generated first target data from the data generation end in real time;
and the data summarization end sends the extracted first target data to the message platform for storage.
Further, the data summarization end consumes the second target data from the message platform by using a second task, comprising:
and the data summarization end consumes the second target data from the same partition of the message platform by using the same second task.
Further, the data summarization end performs data summarization based on the second data key in the second target data received from the message platform, including:
and the data summarizing end summarizes a plurality of second data values corresponding to the same second data key received from the same partition of the message platform.
Further, the data generation end stores the first target data into the message platform in a mode of writing the first target data corresponding to the same first data key into the same partition.
Further, the data summarization end stores the second target data to the message platform in a manner of writing the second target data corresponding to the same second data key into the same partition.
In a third aspect, an embodiment of the present disclosure provides a data processing apparatus, including:
a first consumption module configured to consume first target data from a message platform with a first task; the first target data comprises a first data key and a first data value;
a first generation module configured to generate second target data based on the first target data; the second target data comprises a second data key and a second data value; the second data key is a preset summarizing dimension in the first data value, and the second data value comprises the first data key and other data except the preset summarizing dimension in the first data value;
a first storage module configured to store the second target data back into the message platform;
a second consumption module configured to consume the second target data from the message platform with a second task;
a first summarization module configured to perform data summarization based on the second data key in the second target data received from the message platform.
These functions may be implemented in hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the apparatus includes a memory configured to store one or more computer instructions that enable the apparatus to perform the corresponding method, and a processor configured to execute the computer instructions stored in the memory. The apparatus may also include a communication interface for the apparatus to communicate with other devices or a communication network.
In a fourth aspect, an embodiment of the present disclosure provides a data processing system, including a data generation end, a message platform, and a data summarization end, where:
the data generation end generates first target data and writes the first target data into the message platform; the first target data comprises a first data key and a first data value;
the message platform receives and stores the first target data;
the data summarization end consumes the first target data from a message platform by utilizing a first task and generates second target data based on the first target data; the second target data comprises a second data key and a second data value; the second data key is a preset summarizing dimension in the first data value, and the second data value comprises the first data key and other data except the preset summarizing dimension in the first data value;
the data summarization end also stores the second target data into the message platform again;
after receiving the second target data, the message platform stores the second target data corresponding to the same second data key into the same partition;
the data summarization end consumes the second target data from the message platform by using a second task;
the data summarization end performs data summarization based on the second data key in the second target data received from the message platform.
In a fifth aspect, the disclosed embodiment provides an electronic device, including a memory for storing one or more computer instructions for supporting any of the above apparatuses to perform the corresponding methods described above, and a processor configured to execute the computer instructions stored in the memory. Any of the above may also include a communication interface for communicating with other devices or a communication network.
In a sixth aspect, the disclosed embodiments provide a computer-readable storage medium for storing computer instructions for use by any of the above-mentioned apparatuses, including computer instructions for performing any of the above-mentioned methods.
In a seventh aspect, the disclosed embodiments provide a computer program product comprising computer instructions for implementing the steps of the method according to any one of the above aspects when executed by a processor.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
According to the embodiments of the present disclosure, in the process of summarizing in real time the data generated by the data generation end, the data summarization end uses the first task to consume the first target data that the data generation end wrote into the message platform, and generates second target data based on the first target data, where the data keys of the first target data and the second target data are different, and the data key of the second target data is consistent with the preset summarizing dimension. After the second target data is written back into the message platform, the data summarization end uses the second task to consume the second target data from the message platform and performs data summarization based on the second data key of the second target data. Because the data key of the second target data written back into the message platform is the preset summarizing dimension, second target data with the same preset summarizing dimension can be written into the same partition. When data summarization is performed based on the preset summarizing dimension, a single second task can be started for each partition to consume its data, and because all second target data corresponding to the same preset summarizing dimension is located in the same partition, no data contention occurs and the efficiency of data summarization can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of embodiments of the disclosure.
Drawings
Other features, objects, and advantages of embodiments of the disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flow diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a data processing method according to another embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating an application scenario of a data processing method according to an embodiment of the present disclosure;
FIG. 4 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 5 shows a block diagram of a data processing system, according to an embodiment of the present disclosure;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a data processing method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the disclosed embodiments will be described in detail with reference to the accompanying drawings so that they can be easily implemented by those skilled in the art. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the disclosed embodiments, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows a flowchart of a data processing method according to an embodiment of the present disclosure. As shown in Fig. 1, the data processing method includes the following steps:
in step S101, consuming first target data from a message platform using a first task; the first target data comprises a first data key and a first data value;
in step S102, generating second target data based on the first target data; the second target data comprises a second data key and a second data value; the second data key is a preset summarizing dimension in the first data value, and the second data value comprises the first data key and other data except the preset summarizing dimension in the first data value;
in step S103, storing the second target data back into the message platform;
in step S104, consuming the second target data from the message platform using a second task;
in step S105, performing data summarization based on the second data key in the second target data received from the message platform.
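Step S102 can be sketched as a small transformation. This is an illustrative sketch under stated assumptions: keys and values are modeled as dictionaries, the function name `rekey` is hypothetical, and the field names are borrowed from the transaction example used in this description.

```python
def rekey(first_key: dict, first_value: dict, dimension: str):
    """Build (second_key, second_value) from one first-target record."""
    # The second data key is the preset summarizing dimension taken from the first value.
    second_key = {dimension: first_value[dimension]}
    # The second data value is the first data key plus every other field of the first value.
    rest = {k: v for k, v in first_value.items() if k != dimension}
    second_value = {**first_key, **rest}
    return second_key, second_value


second_key, second_value = rekey(
    {"id": 1},
    {"merchant_id": "0001", "tran_amt": 50.00, "tran_fee": 0.50},
    "merchant_id",
)
print(second_key, second_value)
```

The first data key is carried inside the second data value, so the original record can still be identified after re-keying.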
As noted above, data summarization refers to the process of aggregating stored business detail data and computing summary statistics over it according to one or more specified dimensions.
In the related art, data summarization is generally run as a single batch at a preset time point over the data accumulated within a preset period. For example, in the financial industry, the financial data generated during a day is often summarized in one pass at the end of that day. Data summarization can also be performed in real time, aggregating the data produced by the data generation end according to one or more specified dimensions.
However, as the amount of data grows, data summarization takes longer and longer, and when the execution of data summarization fails, it must be performed again. In addition, summarizing real-time data can affect the normal operation of the service system.
Therefore, the embodiments of the present disclosure provide a data processing method.
In an embodiment of the present disclosure, the data processing method may be executed by a data summarization end.
In an embodiment of the present disclosure, for data analysis, data is generally summarized over the different entities of a certain class of objects. The data may be generated at a data generation end, which pushes the generated data to a database or a message platform in real time for storage. For example, the data generation end may be a transaction system that generates transaction data of users at different merchants, and the data summarization end may compute summary statistics over that transaction data, for example the transaction amount of each merchant. Generally, the transaction system stores the generated transaction data in a database or a message platform in the form of a data key and a data value (i.e., key-value). For example, a piece of transaction information generated in the transaction system may include, but is not limited to, a transaction number, a transaction amount, a user, and a merchant; the transaction system may use the transaction number as the data key and store the other data in the database or the message platform as the data value corresponding to that data key.
To summarize the data of each entity in a certain class of objects, the data summarization end may obtain the data stored by the data generation end from the database or the message platform.
The following illustrates the existing summarization approach, using the summarization of transaction data as an example:
In the personal payment scenario, a transaction involves two important participants: individuals and merchants. One merchant may have several cash-register terminal devices, and for large merchants in particular, hundreds of thousands of transactions per day are common. For a merchant to see an overview of the day's transactions in real time, the data summarization end needs to quickly accumulate all the transaction information under that merchant. For example, a merchant may wish to see daily summary data such as the following:
Merchant name: xxx Co
Total transaction amount: 9882324.23 yuan
Total number of transactions: 100323
Total transaction handling fee: 988.23 yuan
Consider a transaction system that handles on the order of a hundred million transactions per day and stores its data in a database. To accumulate transaction information for every merchant, a data table is created in the database (taking a relational database as an example) to store the accumulated information. Assume the main fields involved are as follows:
Merchant number / merchant_id
Statistics date / clear_date
Total transaction amount / tran_amt
Transaction count / tran_count
Total handling fee / tran_fee
A record for a cleared transaction is generated by the transaction system and stored in the database, for example:
Payer: xxx
Payment amount: 59.00
Payment date: 2021-07-06
Merchant number: 1001
Handling fee: 0.59
Other fields
If the transaction is the first transaction, the data summarization end can store the summarized data with the following SQL statement:
insert into merchant_trans_summary_daily(merchant_id,clear_date,tran_amt,tran_count,tran_fee) values('1001','2021-07-06',59.00,1,0.59);
However, if the transaction is not the first transaction, the summary data can be updated with the following SQL statement:
update merchant_trans_summary_daily set tran_amt=tran_amt+59.00,tran_count=tran_count+1,tran_fee=tran_fee+0.59 where merchant_id='1001' and clear_date='2021-07-06';
Therefore, when a merchant has a large transaction volume, computing its daily transaction statistics causes a large number of update statements to operate on the same record. If the database uses pessimistic locking, every update statement competes for the row lock, and once one update holds the row lock the other update requests are suspended, so update efficiency is low. Even if the database uses optimistic locking, when a single merchant has a large transaction volume within a short time (for example, during a flash-sale rush), a large number of requests must be retried, so efficiency still does not improve.
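The insert-or-update accumulation described above can be sketched with an in-memory SQLite database. This is a minimal sketch under stated assumptions: the column types, the primary key, the helper name `accumulate`, and the second sample transaction (41.00 with a 0.41 fee) are illustrative and not taken from the disclosure, and a single connection does not reproduce the lock contention discussed above.

```python
import sqlite3


def accumulate(conn, merchant_id, clear_date, amt, fee):
    """Add one transaction to the daily summary row, creating the row if it is missing."""
    cur = conn.execute(
        "UPDATE merchant_trans_summary_daily "
        "SET tran_amt = tran_amt + ?, tran_count = tran_count + 1, tran_fee = tran_fee + ? "
        "WHERE merchant_id = ? AND clear_date = ?",
        (amt, fee, merchant_id, clear_date),
    )
    if cur.rowcount == 0:  # no summary row yet, so this is the first transaction
        conn.execute(
            "INSERT INTO merchant_trans_summary_daily "
            "(merchant_id, clear_date, tran_amt, tran_count, tran_fee) "
            "VALUES (?, ?, ?, 1, ?)",
            (merchant_id, clear_date, amt, fee),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE merchant_trans_summary_daily ("
    "merchant_id TEXT, clear_date TEXT, tran_amt REAL, tran_count INTEGER, "
    "tran_fee REAL, PRIMARY KEY (merchant_id, clear_date))"
)
accumulate(conn, "1001", "2021-07-06", 59.00, 0.59)  # first transaction: insert
accumulate(conn, "1001", "2021-07-06", 41.00, 0.41)  # later transaction: update
row = conn.execute(
    "SELECT tran_amt, tran_count, tran_fee FROM merchant_trans_summary_daily"
).fetchone()
print(row)
```

Every call for the same merchant and date touches the same row, which is exactly where concurrent writers would queue on a row lock or retry.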
In practice, this requirement cannot be met inside the transaction system in the above scenario, mainly because the transaction system has high performance and stability requirements. Moreover, statistics is a relatively resource-intensive workload; if it were performed inside the transaction system, performance bottlenecks would be likely and would affect the stable operation of the transaction system.
The current common practice is as follows. If the transaction system stores newly generated transaction data in a message platform (e.g., kafka, pulsar), a dedicated data summarization end is set up to consume the transaction data from the message platform and accumulate it in real time. If the data of the transaction system is not stored in a message platform, real-time extraction schemes are currently available for mainstream relational databases, so message records can be generated from the transaction system's write operations on the database (insert, update, delete, etc.) and stored in the message platform; the data summarization end then consumes the data from the message platform and accumulates it in real time.
However, problems remain with this approach. After the data summarization end obtains data from the message platform, if it summarizes the data directly through update operations, it still faces the problems described above (lock contention or frequent retries).
To address this issue, embodiments of the present disclosure may use the partitioning feature of message platforms such as kafka and pulsar to avoid the above problems.
The following is an example of simplified transaction data:
id  merchant_id  clear_date  tran_time  tran_amt  tran_fee
1   0001         2021-07-05  08:00:01   50.00     0.50
2   0003         2021-07-05  08:00:02   100.00    1.00
3   0002         2021-07-05  08:00:03   2000.00   20.00
4   0002         2021-07-05  08:00:04   400.00    4.00
5   0001         2021-07-05  08:00:05   1000.00   10.00
6   0003         2021-07-05  08:00:06   5000.00   50.00
The above 6 transactions belong to three merchants: 0001, 0002, and 0003.
Assume that the data of the original transaction system is written into the message platform and stored in 3 partitions, and that the id of a transaction record is selected as the primary key. A hash algorithm is applied to the id of every record (for simplicity, assume the hash is just the value of the id), and the result is taken modulo 3, yielding one of three values: 0, 1, or 2. Records whose hashed ids are equal modulo 3 therefore land in the same partition. The data in the 3 partitions (stored in key-value form) is as follows:
the data within partition 0 is as follows:
key:{"id":3}value:{"id":3,"merchant_id":"0002","clear_date":"2021-07-05","tran_time":"2021-07-05 08:00:03","tran_amt":2000.00,"tran_fee":20.00}
key:{"id":6}value:{"id":6,"merchant_id":"0003","clear_date":"2021-07-05","tran_time":"2021-07-05 08:00:06","tran_amt":5000.00,"tran_fee":50.00}
the data within partition 1 is as follows:
key:{"id":1}value:{"id":1,"merchant_id":"0001","clear_date":"2021-07-05","tran_time":"2021-07-05 08:00:01","tran_amt":50.00,"tran_fee":0.50}
key:{"id":4}value:{"id":4,"merchant_id":"0002","clear_date":"2021-07-05","tran_time":"2021-07-05 08:00:04","tran_amt":400.00,"tran_fee":4.00}
the data within partition 2 is as follows:
key:{"id":2}value:{"id":2,"merchant_id":"0003","clear_date":"2021-07-05","tran_time":"2021-07-05 08:00:02","tran_amt":100.00,"tran_fee":1.00}
key:{"id":5}value:{"id":5,"merchant_id":"0001","clear_date":"2021-07-05","tran_time":"2021-07-05 08:00:05","tran_amt":1000.00,"tran_fee":10.00}
Suppose the data summarization end consumes the data in the 3 partitions directly. Each partition has only one consumer (that is, one task) at any given time, and the consumed data is used directly to update the table merchant_trans_summary_daily (the table that stores the statistical information in the scenario above).
However, consuming the data in partition 0 requires updating the statistics of merchants 0002 and 0003, while partition 1 involves merchants 0001 and 0002 and partition 2 involves merchants 0003 and 0001. Different consumption tasks (or threads) therefore compete to update the same data.
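The overlap can be checked mechanically. The sketch below takes the (id, merchant_id) pairs from the partition listing above, assigns each record to partition id mod 3 as in the simplified hash, and reports which merchants are updated from more than one partition. The variable names are illustrative.

```python
from collections import defaultdict

# (id, merchant_id) pairs as they appear in the partition listing above.
records = [(1, "0001"), (2, "0003"), (3, "0002"), (4, "0002"), (5, "0001"), (6, "0003")]

# Simplified hash: the partition is the id modulo 3.
merchants_by_partition = defaultdict(set)
for record_id, merchant_id in records:
    merchants_by_partition[record_id % 3].add(merchant_id)

# Invert the mapping to see which partitions touch each merchant.
partitions_by_merchant = defaultdict(set)
for partition, merchants in merchants_by_partition.items():
    for merchant_id in merchants:
        partitions_by_merchant[merchant_id].add(partition)

# A merchant is contended when more than one partition's task must update it.
contended = {m: sorted(p) for m, p in partitions_by_merchant.items() if len(p) > 1}
print(contended)
```

With this sample data every merchant appears in two partitions, so each summary row has two competing writers.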
Therefore, to avoid the above problem, the data summarization end may create a task that, immediately after consuming data from the message platform, writes the data back into the message platform with the key of the written message changed to merchant_id. Again taking 3 partitions as an example, the data content is as follows:
the data within partition 0 is as follows:
key:{"merchant_id":"0001"}value:{"id":1,"merchant_id":"0001","clear_date":"2021-07-05","tran_time":"2021-07-05 08:00:01","tran_amt":50.00,"tran_fee":0.50}
key:{"merchant_id":"0001"}value:{"id":5,"merchant_id":"0001","clear_date":"2021-07-05","tran_time":"2021-07-05 08:00:05","tran_amt":1000.00,"tran_fee":10.00}
the data within partition 1 is as follows:
key:{"merchant_id":"0002"}value:{"id":3,"merchant_id":"0002","clear_date":"2021-07-05","tran_time":"2021-07-05 08:00:03","tran_amt":2000.00,"tran_fee":20.00}
key:{"merchant_id":"0002"}value:{"id":4,"merchant_id":"0002","clear_date":"2021-07-05","tran_time":"2021-07-05 08:00:04","tran_amt":400.00,"tran_fee":4.00}
the data within partition 2 is as follows:
key:{"merchant_id":"0003"}value:{"id":6,"merchant_id":"0003","clear_date":"2021-07-05","tran_time":"2021-07-05 08:00:06","tran_amt":5000.00,"tran_fee":50.00}
key:{"merchant_id":"0003"}value:{"id":2,"merchant_id":"0003","clear_date":"2021-07-05","tran_time":"2021-07-05 08:00:02","tran_amt":100.00,"tran_fee":1.00}
Therefore, after the data is repartitioned, the data of the same merchant enters the same partition, so a corresponding task can be created at the data summarization end to consume each partition.
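A short Python sketch of the re-keyed layout follows. The partitioner `partition_for_merchant` (numeric value of the merchant number modulo 3) is an assumption made for illustration, so the concrete partition numbers differ from the listing above; the property being shown is only that each merchant lands in exactly one partition.

```python
NUM_PARTITIONS = 3

# Subset of the fields from the sample transactions above.
records = [
    {"id": 1, "merchant_id": "0001", "tran_amt": 50.00},
    {"id": 2, "merchant_id": "0003", "tran_amt": 100.00},
    {"id": 3, "merchant_id": "0002", "tran_amt": 2000.00},
    {"id": 4, "merchant_id": "0002", "tran_amt": 400.00},
    {"id": 5, "merchant_id": "0001", "tran_amt": 1000.00},
    {"id": 6, "merchant_id": "0003", "tran_amt": 5000.00},
]


def partition_for_merchant(merchant_id: str) -> int:
    """Hypothetical partitioner: numeric merchant number modulo partition count."""
    return int(merchant_id) % NUM_PARTITIONS


# Rewrite every record with merchant_id as the message key.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for record in records:
    key = {"merchant_id": record["merchant_id"]}
    partitions[partition_for_merchant(record["merchant_id"])].append((key, record))

# Collect the set of partitions in which each merchant appears.
partitions_per_merchant = {}
for partition, messages in partitions.items():
    for key, _value in messages:
        partitions_per_merchant.setdefault(key["merchant_id"], set()).add(partition)

print(partitions_per_merchant)
```

Because the key alone determines the partition, one consumer task per partition is enough to own all updates for the merchants in that partition.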
In an embodiment of the present disclosure, the first task is a task created on the data summarization end to consume the first target data in the message platform. The first target data is the original data on which data statistics are performed, and may be data generated by the data generation end.
In an embodiment of the present disclosure, a storage format of the first target data in the message platform is in a key-value form, and the first target data includes a first data key and a first data value, the first data key is a primary key used when the data generation end writes data into the message platform, and the first data value is other data in a piece of data except the primary key.
In an embodiment of the present disclosure, the first data key in the first target data is different from a primary key used by the data summarization end for data summarization, and the primary key refers to a dimension for data summarization. For example, the first target data is transaction data, the first data key is a transaction ID in the transaction data, and the primary key used by the data summarization end for data summarization is a merchant ID in the transaction data.
In an embodiment of the present disclosure, the second target data has the same data content as the first target data and is also in key-value form. The two differ in that the key of the first target data and the key of the second target data are different dimensions, so that, after the first target data is acquired, the second target data can be generated by changing the primary key of the first target data. If the first target data comprises a first data key and a first data value, and the first data value comprises values of a plurality of dimensions, then the second target data comprises a second data key and a second data value, and the second data value likewise comprises values of a plurality of dimensions. The second data key is different from the first data key: the second data key is a dimension in the first data value, and the first data key is a dimension in the second data value.
The second data key may be predetermined, that is, the data summarization end may set a preset summarization dimension in advance, and then summarize data based on the preset summarization dimension. Therefore, when the second target data is generated, the data value corresponding to the preset summary dimension may be read from the first data value, and set as the second data key, and then the second target data may be obtained after the first data key and the data value corresponding to the other dimension in the first data value are taken as the second data value.
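The key change described in this paragraph can be sketched in Python as follows. The function name `rekey` and the `Message` type alias are hypothetical. The sketch follows the wording of this paragraph, in which the summarizing dimension moves out of the value and into the key, whereas the worked listing earlier also keeps merchant_id inside the value.

```python
from typing import Any, Dict, Tuple

# A message is modeled as a (data key, data value) pair of dictionaries.
Message = Tuple[Dict[str, Any], Dict[str, Any]]


def rekey(first: Message, summary_dimension: str) -> Message:
    """Build second target data from first target data.

    The second data key is the preset summarizing dimension read from the
    first data value; the second data value holds the first data key plus
    every other dimension of the first data value.
    """
    first_key, first_value = first
    if summary_dimension not in first_value:
        raise KeyError(f"dimension {summary_dimension!r} not found in data value")
    second_key = {summary_dimension: first_value[summary_dimension]}
    second_value = dict(first_key)
    second_value.update(
        {k: v for k, v in first_value.items() if k != summary_dimension}
    )
    return second_key, second_value


first_target = ({"id": 1}, {"id": 1, "merchant_id": "0001", "tran_amt": 50.00})
second_target = rekey(first_target, "merchant_id")
print(second_target)
```

The first target data is left unmodified, so the original message can still be replayed if the rewrite has to be repeated.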
The data summarization end then rewrites the second target data into the message platform. In some embodiments, when writing the second target data to the message platform, the data summarization end uses the partition function of the message platform, so that data with the same second data key is written into the same partition. It will of course be appreciated that data with different second data keys may be written to different partitions or to the same partition.
One or more second tasks can be started at the data summarization end to consume the second target data in the partitions. One second task can be started per partition to consume its data, and the second target data consumed by each second task can be summarized based on the second data key. Because second target data with the same second data key is in the same partition, data competition between second tasks does not occur.
According to the embodiment of the disclosure, in the process of summarizing data generated by a data generation end in real time, the data summarization end uses the first task to consume the first target data that the data generation end wrote into the message platform, and generates second target data based on the first target data, wherein the data keys of the first target data and the second target data are different, and the data key of the second target data is consistent with a preset summarizing dimension. After the second target data is rewritten into the message platform, the data summarization end uses the second task to consume the second target data from the message platform again, and performs data summarization based on the second data key of the second target data. Because the data key of the second target data rewritten into the message platform is the preset summarizing dimension, second target data with the same preset summarizing dimension is written into the same partition. When data summarization is performed based on the preset summarizing dimension, one second task can be started for each partition to consume data; since the second target data corresponding to the same preset summarizing dimension is located in the same partition, data competition does not occur, and data summarization efficiency can be improved.
In an embodiment of the present disclosure, the first target data consumed by the first task from the message platform is data written by the data generating end, and a first data key of the first target data in the message platform is different from the second data key of the second target data in the message platform.
In an embodiment of the present disclosure, after the step S101 of consuming the first target data from the message platform by using the first task, the method may further include the following steps:
extracting the first data key in the first target data and the preset summarizing dimension in the first data value;
and when the first data key is consistent with the preset summarizing dimension, summarizing data based on the first data key in the first target data.
In this embodiment, when the data generation end writes data into the message platform, the used data key, that is, the first data key of the first target data, may be the same as the preset aggregation dimension. Therefore, after the data summarization end consumes the first target data from the message platform by using the first task, whether the first data key in the first target data is the same as the preset summarization dimension can be checked, if so, the data summarization can be performed directly based on the first data key in the first target data without regenerating the second target data; this is because, if the primary key used by the data generation end to write data to the message platform is consistent with the preset summarizing dimension, the same first data key in the message platform, that is, the first target data corresponding to the preset summarizing dimension, will be stored in the same partition, so that data competition will not occur even if data is consumed directly from the message platform and summarized, and thus, the data summarizing efficiency can be further improved.
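The check described in this embodiment can be sketched as a small routing function in Python. It assumes that "consistent with the preset summarizing dimension" means the first data key consists of exactly that dimension; the function name `choose_path` and the two returned labels are illustrative only.

```python
from typing import Any, Dict

SUMMARY_DIMENSION = "merchant_id"  # preset summarizing dimension (assumed)


def choose_path(first_key: Dict[str, Any],
                summary_dimension: str = SUMMARY_DIMENSION) -> str:
    """Decide which branch the first task takes for one consumed message."""
    if set(first_key) == {summary_dimension}:
        # Already keyed (and therefore partitioned) by the summarizing
        # dimension: summarize directly without generating second target data.
        return "summarize_directly"
    # Keyed by some other dimension: re-key and rewrite to the message platform.
    return "rekey_and_rewrite"


print(choose_path({"merchant_id": "0001"}))
print(choose_path({"id": 1}))
```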
In an embodiment of the present disclosure, before the step of consuming the first target data from the message platform by using the first task, the method further includes the following steps:
extracting newly generated first target data from a data generation end in real time;
and sending the extracted first target data to the message platform for storage.
In this optional implementation manner, for a scenario in which the data generation end does not write the generated data into the message platform, the newly generated first target data may be extracted in real time from the data generation end based on the function of extracting data in real time provided by the relational database, and then the first target data is written into the message platform. And then, the first task can be used for consuming the first target data from the message platform, generating second target data, writing the second target data into the message platform, consuming the second target data again and summarizing the data.
In an embodiment of the present disclosure, the step S104 of consuming the second target data from the message platform by using the second task may include the following steps:
consuming the second target data from the same partition of the messaging platform using the same second task.
In this embodiment, at the data summarization end, a second task may be created for the same partition in the message platform, and a second task consumes second target data in the same partition. Therefore, the same second task only consumes the second target data in the same partition, and the second target data corresponding to the same preset summarizing dimension are located in the same partition, so that the second task does not compete when consuming data, the condition that the task is suspended does not exist, and the data summarizing efficiency can be improved.
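As a rough illustration of why no competition arises, the Python sketch below starts one thread per partition as a stand-in for a second task. Each thread writes only to its own result dictionary, so no lock is used. The partition contents reuse the merchant numbers and amounts from the earlier sample data; the names `second_task`, `results`, and `summary` are hypothetical.

```python
import threading
from collections import defaultdict

# Re-keyed partitions as (merchant_id, tran_amt) pairs from the sample data.
partitions = {
    0: [("0001", 50.00), ("0001", 1000.00)],
    1: [("0002", 2000.00), ("0002", 400.00)],
    2: [("0003", 5000.00), ("0003", 100.00)],
}


def second_task(messages, totals):
    """Consume one partition and accumulate the amount per merchant."""
    for merchant_id, tran_amt in messages:
        totals[merchant_id] += tran_amt


# One private result dictionary and one thread per partition.
results = {p: defaultdict(float) for p in partitions}
threads = [
    threading.Thread(target=second_task, args=(partitions[p], results[p]))
    for p in partitions
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Merge the per-partition results; the key sets are disjoint by construction.
summary = {}
for totals in results.values():
    assert not (summary.keys() & totals.keys())
    summary.update(totals)
print(summary)
```

If the same merchant could appear in two partitions, the merge step would have to add values together and the tasks would need coordination when updating a shared table.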
In an embodiment of the present disclosure, the step S105 of performing data aggregation based on the second data key in the second target data received from the message platform may include the following steps:
and summarizing a plurality of second data values corresponding to the same second data key received from the same partition of the message platform.
In this optional implementation manner, the second data key in the second target data is a preset aggregation dimension, and second target data corresponding to different second data keys may exist in the same partition of the message platform, so that when summarizing data, for second target data consumed from the same partition by the same second task, data aggregation may be performed based on the same second data key, that is, one or more dimensions of second data values in multiple pieces of second target data with the same second data key are aggregated respectively.
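A minimal Python sketch of this per-key summarization is given below. It assumes, purely for illustration, that merchants 0002 and 0003 were hashed into the same partition, and it aggregates the tran_amt and tran_fee dimensions plus a record count; the function name `summarize` and its `dimensions` parameter are hypothetical.

```python
from collections import defaultdict

# Messages consumed by one second task from a single partition
# (assumed here to hold two different second data keys).
partition_messages = [
    ({"merchant_id": "0002"}, {"id": 3, "tran_amt": 2000.00, "tran_fee": 20.00}),
    ({"merchant_id": "0003"}, {"id": 6, "tran_amt": 5000.00, "tran_fee": 50.00}),
    ({"merchant_id": "0002"}, {"id": 4, "tran_amt": 400.00, "tran_fee": 4.00}),
    ({"merchant_id": "0003"}, {"id": 2, "tran_amt": 100.00, "tran_fee": 1.00}),
]


def summarize(messages, dimensions=("tran_amt", "tran_fee")):
    """Aggregate each requested dimension separately per second data key."""
    summary = defaultdict(
        lambda: {"tran_count": 0, **{d: 0.0 for d in dimensions}}
    )
    for key, value in messages:
        row = summary[key["merchant_id"]]
        row["tran_count"] += 1
        for dimension in dimensions:
            row[dimension] += value[dimension]
    return dict(summary)


merchant_summary = summarize(partition_messages)
print(merchant_summary)
```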
In an embodiment of the present disclosure, the data generation end stores the first target data into the message platform by writing the first target data corresponding to the same first data key into the same partition.
In an embodiment of the present disclosure, the data summarization end stores the second target data to the message platform by writing the second target data corresponding to the same second data key into the same partition.
When the data generation end writes data to the message platform, the first target data can be written into the message platform based on the partition storage mode provided by the message platform, and the first target data corresponding to the same first data key can be written into the same partition for data storage.
When the data summarization end writes data to the message platform, the second target data can be written into the message platform based on the partition storage mode provided by the message platform, and the second target data corresponding to the same second data key can be written into the same partition for data storage.
Fig. 2 shows a flowchart of a data processing method according to another embodiment of the present disclosure, which includes the following steps S201 to S207, as shown in fig. 2:
in step S201, a data generation end generates first target data, and writes the first target data into a message platform; the first target data comprises a first data key and a first data value;
in step S202, the message platform receives and stores the first target data;
in step S203, the data summarization end consumes the first target data from the message platform by using a first task, and generates a second target data based on the first target data; the second target data comprises a second data key and a second data value; the second data key is a preset summarizing dimension in the first data value, and the second data value comprises the first data key and other data except the preset summarizing dimension in the first data value;
in step S204, the data summarization end further stores the second target data into the message platform again;
in step S205, after receiving the second target data, the message platform stores the second target data corresponding to the same second data key in the same partition;
in step S206, the data summarization end consumes the second target data from the message platform by using a second task;
in step S207, the data summarization end performs data summarization based on the second data key in the second target data received from the message platform.
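Steps S201 to S207 can be walked through end to end with the Python sketch below. The `MessagePlatform` class is an in-memory stand-in written for this illustration, not the API of any real message platform; the topic names `raw` and `by_merchant` and the partitioner (numeric key modulo 3) are likewise assumptions.

```python
from collections import defaultdict


class MessagePlatform:
    """In-memory stand-in for a partitioned message platform (hypothetical)."""

    def __init__(self, num_partitions=3):
        self.num_partitions = num_partitions
        self.topics = defaultdict(lambda: [[] for _ in range(num_partitions)])

    def write(self, topic, key, value):
        # Messages with the same key always land in the same partition.
        partition = int(key) % self.num_partitions
        self.topics[topic][partition].append((key, value))

    def consume(self, topic, partition):
        return list(self.topics[topic][partition])


platform = MessagePlatform()
transactions = [
    {"id": 1, "merchant_id": "0001", "tran_amt": 50.00},
    {"id": 2, "merchant_id": "0003", "tran_amt": 100.00},
    {"id": 3, "merchant_id": "0002", "tran_amt": 2000.00},
    {"id": 4, "merchant_id": "0002", "tran_amt": 400.00},
    {"id": 5, "merchant_id": "0001", "tran_amt": 1000.00},
    {"id": 6, "merchant_id": "0003", "tran_amt": 5000.00},
]

# S201/S202: the data generation end writes first target data keyed by id.
for t in transactions:
    platform.write("raw", t["id"], {k: v for k, v in t.items() if k != "id"})

# S203 to S205: the first task consumes, re-keys by merchant_id, and rewrites.
for p in range(platform.num_partitions):
    for first_key, first_value in platform.consume("raw", p):
        second_key = first_value["merchant_id"]
        second_value = {"id": first_key}
        second_value.update(
            {k: v for k, v in first_value.items() if k != "merchant_id"}
        )
        platform.write("by_merchant", second_key, second_value)

# S206/S207: one second task per partition summarizes by the second data key.
summary = {}
for p in range(platform.num_partitions):
    totals = defaultdict(float)
    for second_key, second_value in platform.consume("by_merchant", p):
        totals[second_key] += second_value["tran_amt"]
    summary.update(totals)
print(summary)
```

The second tasks are run one after another here for simplicity; since their key sets do not overlap, they could equally run in parallel.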
In the above, data summarization refers to a process of performing aggregate summary statistics on stored business detail data according to a specified dimension or dimensions.
In the related art, data summarization generally summarizes data within a preset time duration at a time at a certain preset time point. For example, in the financial industry, it is often necessary to perform data summarization for financial data generated on the same day at one time during the end of the day. The data summarization can also perform real-time aggregation summarization statistics according to one or more specified dimensions on the data generated by the data generation end.
However, as the amount of data gradually increases, data summarization becomes more and more time-consuming; and when the execution of data summarization fails, the data summarization must be performed again. In addition, summarizing the real-time data can affect the normal operation of the service system.
Therefore, the embodiment of the disclosure provides a data processing method.
In an embodiment of the present disclosure, the data processing method may be adapted to be executed on a data processing system where the data generating end and the data summarizing end are located.
In an embodiment of the present disclosure, for data analysis, data aggregation is generally performed on different entities of a certain object, where the data may be data generated at a data generation end, and the data generation end pushes the generated data to a database or a message platform in real time for storage. For example, the data generation end may be a transaction system that generates transaction data of users at different merchants, and the data summarization end may perform summary statistics on the transaction data generated by the transaction system, for example, statistics on transaction amount for different merchants. Generally, the transaction system stores the generated transaction data in a database or a message platform, usually in the form of a data key and a data value (i.e. key-value). For example, after a piece of transaction information is generated in the transaction system, the piece of transaction information may include, but is not limited to, a transaction number, a transaction amount, a user, a merchant, and the like; the transaction system may use the transaction number as the data key, and store the other data in the database or the message platform as the data value corresponding to that data key.
The data summarization end may obtain the data stored by the data generation end from the database or the message platform in order to summarize the data of each entity in a certain class of objects.
In an embodiment of the present disclosure, the first task is a task created on the data summarization end to consume the first target data in the message platform. The first target data is the original data on which data statistics are performed, and may be data generated by the data generation end.
In an embodiment of the present disclosure, a storage format of the first target data in the message platform is in a key-value form, and the first target data includes a first data key and a first data value, the first data key is a primary key used when the data generation end writes data into the message platform, and the first data value is other data in a piece of data except the primary key.
In an embodiment of the present disclosure, the first data key in the first target data is different from a primary key used by the data summarization end for data summarization, and the primary key refers to a dimension for data summarization. For example, the first target data is transaction data, the first data key is a transaction ID in the transaction data, and the primary key used by the data summarization end for data summarization is a merchant ID in the transaction data.
In an embodiment of the present disclosure, the second target data has the same data content as the first target data and is also in key-value form. The two differ in that the key of the first target data and the key of the second target data are different dimensions, so that, after the first target data is acquired, the second target data can be generated by changing the primary key of the first target data. If the first target data comprises a first data key and a first data value, and the first data value comprises values of a plurality of dimensions, then the second target data comprises a second data key and a second data value, and the second data value likewise comprises values of a plurality of dimensions. The second data key is different from the first data key: the second data key is a dimension in the first data value, and the first data key is a dimension in the second data value.
The second data key may be predetermined, that is, the data summarization end may set a preset summarization dimension in advance, and then summarize data based on the preset summarization dimension. Therefore, when the second target data is generated, the data value corresponding to the preset summary dimension may be read from the first data value, and set as the second data key, and then the second target data may be obtained after the first data key and the data value corresponding to the other dimension in the first data value are taken as the second data value.
The data summarization end then rewrites the second target data into the message platform. In some embodiments, when writing the second target data to the message platform, the data summarization end uses the partition function of the message platform, so that data with the same second data key is written into the same partition. It will of course be appreciated that data with different second data keys may be written to different partitions or to the same partition.
One or more second tasks can be started at the data summarization end to consume the second target data in the partitions. One second task can be started per partition to consume its data, and the second target data consumed by each second task can be summarized based on the second data key. Because second target data with the same second data key is in the same partition, data competition between second tasks does not occur.
According to the embodiment of the disclosure, in the process of summarizing data generated by a data generation end in real time, the data summarization end uses the first task to consume the first target data that the data generation end wrote into the message platform, and generates second target data based on the first target data, wherein the data keys of the first target data and the second target data are different, and the data key of the second target data is consistent with a preset summarizing dimension. After the second target data is rewritten into the message platform, the data summarization end uses the second task to consume the second target data from the message platform again, and performs data summarization based on the second data key of the second target data. Because the data key of the second target data rewritten into the message platform is the preset summarizing dimension, second target data with the same preset summarizing dimension is written into the same partition. When data summarization is performed based on the preset summarizing dimension, one second task can be started for each partition to consume data; since the second target data corresponding to the same preset summarizing dimension is located in the same partition, data competition does not occur, and data summarization efficiency can be improved.
In an embodiment of the present disclosure, the first target data consumed by the first task from the message platform is data written by the data generating end, and a first data key of the first target data in the message platform is different from the second data key of the second target data in the message platform.
In an embodiment of the present disclosure, after step S203, that is, after the step of consuming, by the data summarization end, the first target data from the message platform by using the first task, the method may further include the following steps:
the data summarization end extracts the first data key in the first target data and the preset summarization dimension in the first data value;
and when the first data key is consistent with the preset summarizing dimension, the data summarizing end summarizes data based on the first data key in the first target data.
In this embodiment, when the data generation end writes data into the message platform, the used data key, that is, the first data key of the first target data, may be the same as the preset aggregation dimension. Therefore, after the data summarization end consumes the first target data from the message platform by using the first task, whether the first data key in the first target data is the same as the preset summarization dimension can be checked, if so, the data summarization can be performed directly based on the first data key in the first target data without regenerating the second target data; this is because, if the primary key used by the data generation end to write data to the message platform is consistent with the preset summarizing dimension, the same first data key in the message platform, that is, the first target data corresponding to the preset summarizing dimension, will be stored in the same partition, so that data competition will not occur even if data is consumed directly from the message platform and summarized, and thus, the data summarizing efficiency can be further improved.
In an embodiment of the present disclosure, before the step S203, that is, the step of consuming the first target data from the message platform by using the first task, the method may further include the following steps:
the data summarization end extracts newly generated first target data from the data generation end in real time;
and the data summarization end sends the extracted first target data to the message platform for storage.
In this optional implementation manner, for a scenario in which the data generation end does not write the generated data into the message platform, the newly generated first target data may be extracted in real time from the data generation end based on the function of extracting data in real time provided by the relational database, and then the first target data is written into the message platform. And then, the first task can be used for consuming the first target data from the message platform, generating second target data, writing the second target data into the message platform, consuming the second target data again and summarizing the data.
In an embodiment of the present disclosure, the step S206, that is, the step of consuming, by the data summarization end, the second target data from the message platform by using a second task, may include the following steps:
and the data summarization end consumes the second target data from the same partition of the message platform by using the same second task.
In this embodiment, at the data summarization end, a second task may be created for the same partition in the message platform, and a second task consumes second target data in the same partition. Therefore, the same second task only consumes the second target data in the same partition, and the second target data corresponding to the same preset summarizing dimension are located in the same partition, so that the second task does not compete when consuming data, the condition that the task is suspended does not exist, and the data summarizing efficiency can be improved.
In an embodiment of the present disclosure, in step S207, that is, the step of the data summarization end performing data summarization based on the second data key in the second target data received from the message platform, may include the following steps:
and the data summarizing end summarizes a plurality of second data values corresponding to the same second data key received from the same partition of the message platform.
In this optional implementation manner, the second data key in the second target data is a preset aggregation dimension, and second target data corresponding to different second data keys may exist in the same partition of the message platform, so that when summarizing data, for second target data consumed from the same partition by the same second task, data aggregation may be performed based on the same second data key, that is, one or more dimensions of second data values in multiple pieces of second target data with the same second data key are aggregated respectively.
In an embodiment of the present disclosure, the data generation end stores the first target data into the message platform by writing the first target data corresponding to the same first data key into the same partition.
In an embodiment of the present disclosure, the data summarization end stores the second target data to the message platform by writing the second target data corresponding to the same second data key into the same partition.
When the data generation end writes data to the message platform, the first target data can be written into the message platform based on the partition storage mode provided by the message platform, and the first target data corresponding to the same first data key can be written into the same partition for data storage.
When the data summarization end writes data to the message platform, the second target data can be written into the message platform based on the partition storage mode provided by the message platform, and the second target data corresponding to the same second data key can be written into the same partition for data storage.
The technical terms and technical features involved in fig. 2 and the related embodiments are the same as or similar to those involved in fig. 1 and the related embodiments. For their explanation and description, reference may be made to the above explanation of fig. 1 and the related embodiments, which is not repeated here.
Fig. 3 is a schematic diagram illustrating an application scenario of a data processing method according to an embodiment of the present disclosure. As shown in fig. 3, the transaction system generates transaction data whose main fields include transaction number, merchant number/merchant_id, statistics date/clear_date, total transaction amount/tran_amt, total transaction count/tran_count, and total commission/tran_fee. The transaction system writes the transaction data into the message platform in real time; the transaction data written into the message platform takes the transaction number as the primary key, and the other fields are the corresponding data values. After the message platform receives the transaction data written by the transaction system, a hash operation is performed on the transaction number, so that transaction data with the same transaction number is written into the same partition. The data summarization end creates a first task that, as a consumer, consumes newly entered transaction data from the message platform in real time. The data summarization end summarizes the transaction data generated by the data generation end, with the merchant number selected as the summarizing dimension. After receiving the transaction data from the message platform, the data summarization end changes the primary key of the transaction data from the transaction number to the merchant number, and then rewrites the data into the message platform. After the message platform receives the rewritten data, a hash operation is performed on the primary key, namely the merchant number, so that transaction data with the same merchant number is written into the same partition.
The data summarization end creates second tasks, which act as consumers of the rewritten transaction data in the message platform. Each second task consumes only one partition of the message platform and summarizes the transaction data obtained from it with the merchant number as the dimension. Because data with the same merchant number is in the same partition, that is, the data of the same merchant number is consumed and summarized by the same second task, other tasks do not compete for the same partition.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
Fig. 4 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 4, the data processing apparatus includes:
a first consuming module 401 configured to consume first target data from a message platform with a first task; the first target data comprises a first data key and a first data value;
a first generating module 402 configured to generate second target data based on the first target data; the second target data comprises a second data key and a second data value; the second data key is a preset summarizing dimension in the first data value, and the second data value comprises the first data key and other data except the preset summarizing dimension in the first data value;
a first storage module 403 configured to store the second target data back into the message platform;
a second consumption module 404 configured to consume the second target data from the message platform using a second task;
a first summarization module 405 configured to perform data summarization based on the second data key in the second target data received from the message platform.
In an embodiment of the present disclosure, the first target data consumed by the first task from the message platform is data written by the data generating end, and a first data key of the first target data in the message platform is different from the second data key of the second target data in the message platform.
In an embodiment of the present disclosure, after the first consuming module, the apparatus further includes:
an extraction module configured to extract the first data key in the first target data and the preset aggregation dimension in the first data value;
a second summarization module configured to summarize data based on the first data key in the first target data when the first data key is consistent with the preset summarization dimension.
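One possible reading of this embodiment is sketched below: when the data is already keyed by the summarization dimension, it is summarized directly and the re-keying round trip is skipped. The function `route_record`, the explicit `key_field` argument (the field name of the first data key), and the choice of `tran_amt` as the summarized quantity are assumptions made for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


def route_record(
    key_field: str,
    key: str,
    value: dict,
    dimension: str,
    totals: DefaultDict[str, int],
    republish: List[Tuple[str, dict]],
) -> str:
    """Summarize directly when the key already is the dimension, else re-key."""
    if key_field == dimension:
        # First data key is consistent with the preset summarization dimension:
        # summarize on the first data key without writing back to the platform.
        totals[key] += value["tran_amt"]
        return "summarized"
    # Otherwise build the second target data and queue it for the message platform.
    new_value = {k: v for k, v in value.items() if k != dimension}
    new_value[key_field] = key
    republish.append((value[dimension], new_value))
    return "rekeyed"


totals: DefaultDict[str, int] = defaultdict(int)
republish: List[Tuple[str, dict]] = []

first = route_record("merchant_id", "M1", {"tran_amt": 5}, "merchant_id", totals, republish)
second = route_record(
    "tran_no", "T1", {"merchant_id": "M1", "tran_amt": 7}, "merchant_id", totals, republish
)
```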
In an embodiment of the present disclosure, before the first consuming module, the apparatus further includes:
an extraction module configured to extract the newly generated first target data from the data generation end in real time;
a sending module configured to send the extracted first target data to the message platform for storage.
In an embodiment of the present disclosure, the second consumption module includes:
a consuming submodule configured to consume the second target data from a same partition of the messaging platform using a same second task.
In an embodiment of the present disclosure, the first summarizing module includes:
a summarizing submodule configured to summarize a plurality of second data values corresponding to the same second data key received from the same partition of the message platform.
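The summarization performed by a single second task over one partition could, for example, look like the sketch below. The function name `summarize_partition`, the summed fields `tran_amt` and `tran_count`, and the sample partition contents are assumptions used only to make the example concrete.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_partition(records: Iterable[Tuple[str, dict]]) -> Dict[str, dict]:
    """Fold all second data values that share a second data key into one summary."""
    summary = defaultdict(lambda: {"tran_amt": 0, "tran_count": 0})
    for merchant_id, value in records:
        summary[merchant_id]["tran_amt"] += value["tran_amt"]
        summary[merchant_id]["tran_count"] += value["tran_count"]
    return dict(summary)


# Contents of one partition after re-keying (made-up sample data).
partition_0 = [
    ("M1", {"tran_no": "T001", "tran_amt": 100, "tran_count": 1}),
    ("M1", {"tran_no": "T003", "tran_amt": 50, "tran_count": 1}),
    ("M3", {"tran_no": "T004", "tran_amt": 20, "tran_count": 1}),
]

result = summarize_partition(partition_0)
```

Since all records for a given second data key arrive through the same partition, the task can keep its running totals locally without coordinating with other tasks.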
In an embodiment of the present disclosure, the data generation end stores the first target data into the message platform by writing the first target data corresponding to the same first data key into the same partition.
In an embodiment of the present disclosure, the data summarization end stores the second target data to the message platform by writing the second target data corresponding to the same second data key into the same partition.
FIG. 5 shows a block diagram of a data processing system according to an embodiment of the present disclosure. As shown in fig. 5, the data processing system includes a data generation end 501, a message platform 502, and a data summarization end 503, wherein:
the data generating end 501 generates first target data and writes the first target data into the message platform 502; the first target data comprises a first data key and a first data value;
the message platform 502 receives and stores the first target data;
the data summarization end 503 consumes the first target data from the message platform 502 with a first task and generates second target data based on the first target data; the second target data comprises a second data key and a second data value; the second data key is a preset summarizing dimension in the first data value, and the second data value comprises the first data key and other data except the preset summarizing dimension in the first data value;
the data summarization end 503 also stores the second target data to the message platform 502 again;
after the message platform 502 receives the second target data, data storage is performed in a manner of storing the second target data corresponding to the same second data key into the same partition;
the data summarization end 503 consumes the second target data from the message platform 502 using a second task;
the data summarization end 503 performs data summarization based on the second data key in the second target data received from the message platform 502.
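The interaction among the three parts of the system can be sketched with an in-memory stand-in for the message platform. This is a simplified illustration under stated assumptions, not the disclosed implementation: the class `MessagePlatform`, the topic names `raw` and `by_dimension`, the CRC32 partitioner, and the sample data are all hypothetical.

```python
import zlib
from collections import defaultdict


class MessagePlatform:
    """In-memory stand-in: each topic is a list of partitions holding (key, value) pairs."""

    def __init__(self, num_partitions=3):
        self.num_partitions = num_partitions
        self.topics = defaultdict(lambda: [[] for _ in range(num_partitions)])

    def write(self, topic, key, value):
        # Records with the same key always go to the same partition.
        index = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        self.topics[topic][index].append((key, value))

    def partitions(self, topic):
        return self.topics[topic]


def first_task(platform, dimension="merchant_id"):
    """Consume first target data, re-key it by the dimension, and store it again."""
    for partition in platform.partitions("raw"):
        for tran_no, value in partition:
            rest = {k: v for k, v in value.items() if k != dimension}
            rest["tran_no"] = tran_no  # keep the first data key in the second data value
            platform.write("by_dimension", value[dimension], rest)


def second_task(partition):
    """Summarize one partition of second target data by its second data key."""
    totals = defaultdict(int)
    for key, value in partition:
        totals[key] += value["tran_amt"]
    return dict(totals)


platform = MessagePlatform()
# Data generation end: writes first target data keyed by transaction number.
for tran_no, merchant_id, amt in [("T1", "M1", 10), ("T2", "M2", 5), ("T3", "M1", 7), ("T4", "M3", 1)]:
    platform.write("raw", tran_no, {"merchant_id": merchant_id, "tran_amt": amt})

first_task(platform)
per_partition = [second_task(p) for p in platform.partitions("by_dimension")]

merged = {}
for totals in per_partition:
    merged.update(totals)  # safe: each key appears in exactly one partition
```

In this sketch the per-partition results can be merged with a plain dictionary update, because no second data key is ever split across two partitions.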
In an embodiment of the present disclosure, the first target data consumed by the first task from the message platform 502 is data written by the data generating end 501, and a first data key of the first target data in the message platform 502 is different from the second data key of the second target data in the message platform 502.
In an embodiment of the present disclosure, after the data summarization end 503 consumes the first target data from the message platform 502 by using the first task, the data summarization end 503 further extracts the first data key in the first target data and the preset summarization dimension in the first data value; when the first data key is consistent with the preset summarizing dimension, the data summarizing end 503 summarizes data based on the first data key in the first target data.
In an embodiment of the present disclosure, before the data summarization end 503 consumes the first target data from the message platform 502 by using the first task, the data summarization end 503 extracts the newly generated first target data from the data generation end 501 in real time; the data summarization end 503 sends the extracted first target data to the message platform 502 for storage.
In an embodiment of the present disclosure, the data summarization end 503 consumes the second target data from the same partition of the message platform 502 using the same second task.
In an embodiment of the present disclosure, the data summarization end 503 summarizes a plurality of second data values corresponding to the same second data key received from the same partition of the message platform 502.
In an embodiment of the present disclosure, the data generating end 501 stores the first target data into the message platform 502 by writing the first target data corresponding to the same first data key into the same partition.
In an embodiment of the present disclosure, the data summarization end 503 stores the second target data to the message platform 502 by writing the second target data corresponding to the same second data key into the same partition.
The technical features of the above apparatus and system embodiments, together with their explanations and descriptions, are the same as, correspond to, or are similar to those of the above method embodiments. Reference may therefore be made to the method embodiments, and the details are not repeated herein.
The embodiment of the present disclosure also discloses an electronic device, which includes a memory and a processor; wherein
the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to perform any of the method steps described above.
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a data processing method according to an embodiment of the present disclosure.
As shown in fig. 6, the computer system 600 includes a processing unit 601 which can execute various processes in the above-described embodiments according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM603, various programs and data necessary for the operation of the computer system 600 are also stored. The processing unit 601, the ROM602, and the RAM603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed in the storage section 608 as necessary. The processing unit 601 may be implemented as a CPU, a GPU, a TPU, an FPGA, an NPU, or other processing units.
In particular, the above-described methods may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the data processing method. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
A computer program product is also disclosed in embodiments of the present disclosure, the computer program product comprising computer programs/instructions which, when executed by a processor, implement any of the above method steps.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the disclosed embodiment also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the embodiments of the present disclosure.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept. For example, a technical solution may be formed by replacing the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (10)

1. A method of data processing, comprising:
consuming first target data from a messaging platform using a first task; the first target data comprises a first data key and a first data value;
generating second target data based on the first target data; the second target data comprises a second data key and a second data value; the second data key is a preset summarizing dimension in the first data value, and the second data value comprises the first data key and other data except the preset summarizing dimension in the first data value;
storing the second target data back into the message platform;
consuming the second target data from the messaging platform with a second task;
performing data summarization based on the second data key in the second target data received from the message platform.
2. The method of claim 1, wherein the first target data consumed by the first task from a message platform is data written by a data producer, and a first data key of the first target data in the message platform is different from the second data key of the second target data in the message platform.
3. The method of claim 1 or 2, wherein after consuming the first target data from the messaging platform with the first task, the method further comprises:
extracting the first data key in the first target data and the preset summarizing dimension in the first data value;
and when the first data key is consistent with the preset summarizing dimension, summarizing data based on the first data key in the first target data.
4. The method of claim 1, wherein prior to consuming the first target data from the messaging platform with the first task, the method further comprises:
extracting newly generated first target data from a data generation end in real time;
and sending the extracted first target data to the message platform for storage.
5. A method of data processing, comprising:
a data generation end generates first target data and writes the first target data into a message platform; the first target data comprises a first data key and a first data value;
the message platform receives and stores the first target data;
the data summarization end consumes the first target data from the message platform by utilizing a first task and generates second target data based on the first target data; the second target data comprises a second data key and a second data value; the second data key is a preset summarizing dimension in the first data value, and the second data value comprises the first data key and other data except the preset summarizing dimension in the first data value;
the data summarization end also stores the second target data into the message platform again;
after the message platform receives the second target data, storing the second target data corresponding to the same second data key into the same partition;
the data summarization end consumes the second target data from the message platform by using a second task;
the data summarization end performs data summarization based on the second data key in the second target data received from the message platform.
6. A data processing apparatus comprising:
a first consumption module configured to consume first target data from a message platform with a first task; the first target data comprises a first data key and a first data value;
a first generation module configured to generate second target data based on the first target data; the second target data comprises a second data key and a second data value; the second data key is a preset summarizing dimension in the first data value, and the second data value comprises the first data key and other data except the preset summarizing dimension in the first data value;
a first storage module configured to store the second target data back into the message platform;
a second consumption module configured to consume the second target data from the message platform with a second task;
a first summarization module configured to perform data summarization based on the second data key in the second target data received from the message platform.
7. A data processing system comprising a data generation end, a message platform and a data summarization end, wherein:
the data generation end generates first target data and writes the first target data into the message platform; the first target data comprises a first data key and a first data value;
the message platform receives and stores the first target data;
the data summarization end consumes the first target data from a message platform by utilizing a first task and generates second target data based on the first target data; the second target data comprises a second data key and a second data value; the second data key is a preset summarizing dimension in the first data value, and the second data value comprises the first data key and other data except the preset summarizing dimension in the first data value;
the data summarization end also stores the second target data into the message platform again;
after the message platform receives the second target data, storing the second target data corresponding to the same second data key into the same partition;
the data summarization end consumes the second target data from the message platform by using a second task;
the data summarization end performs data summarization based on the second data key in the second target data received from the message platform.
8. An electronic device comprising a memory and a processor; wherein
the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the steps of the method of any one of claims 1-5.
9. A computer readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, implement the steps of the method of any one of claims 1-5.
10. A computer program product comprising computer programs/instructions which, when executed by a processor, carry out the steps of the method of any one of claims 1 to 5.
CN202110985220.4A 2021-08-26 2021-08-26 Data processing method, data processing apparatus, electronic device, storage medium, and program product Pending CN113886383A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110985220.4A CN113886383A (en) 2021-08-26 2021-08-26 Data processing method, data processing apparatus, electronic device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110985220.4A CN113886383A (en) 2021-08-26 2021-08-26 Data processing method, data processing apparatus, electronic device, storage medium, and program product

Publications (1)

Publication Number Publication Date
CN113886383A CN113886383A (en) 2022-01-04

Family

ID=79011054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110985220.4A Pending CN113886383A (en) 2021-08-26 2021-08-26 Data processing method, data processing apparatus, electronic device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN113886383A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination