Disclosure of Invention
One or more embodiments of the present specification describe a service data processing method, which can ensure the accuracy of a processing result under the condition of ensuring the timeliness of service data processing.
According to a first aspect, a method for processing service data is provided, the method comprising:
obtaining, by an offline computing platform, a plurality of analysis results from a first middleware, wherein the plurality of analysis results are generated by the first middleware analyzing a service data change log of a database;
performing offline batch processing on the plurality of analysis results to obtain a first set of calculation results;
acquiring a target table generated by a real-time computing platform performing online calculation on each of at least one analysis result obtained from a second middleware, wherein the at least one analysis result is among the analysis results generated by the second middleware analyzing the service data change log of the database;
comparing the first set with the target table;
when the calculation results in the first set and the calculation results in the target table are inconsistent, updating the target table based on the first set.
In some embodiments, the service data is bank deposit data.
In some embodiments, the first middleware is any one of canal, databus, keylet, and otter;
the second middleware is drc;
the link between the offline computing platform and the first middleware is different from the link between the real-time computing platform and the second middleware.
In some embodiments, the plurality of analysis results are results of analyzing, by the first middleware, change logs of the database in a preset period;
the at least one analysis result is among the analysis results generated by the second middleware analyzing the change logs of the database in the preset period.
In some embodiments, the plurality of analysis results are the results of the first middleware analyzing all change logs of the database;
the at least one analysis result is among the analysis results generated by the second middleware analyzing the change logs produced after the database goes online on the real-time computing platform.
In some embodiments, updating the target table based on the first set when the calculation results in the first set and the calculation results in the target table are inconsistent comprises:
when the first set has at least one more calculation result than the target table, writing the at least one calculation result into the target table to update the target table.
In some embodiments, the first set includes a first calculation result corresponding to a first change log, and the target table includes a second calculation result corresponding to the first change log; updating the target table based on the first set when the calculation results in the first set and the calculation results in the target table are inconsistent comprises:
when the first calculation result is inconsistent with the second calculation result, correcting the second calculation result based on the first calculation result so as to update the target table.
In some embodiments, the method further comprises: initiating an alert when the first set and the target table are inconsistent.
According to a second aspect, there is provided a service data processing apparatus, the apparatus comprising:
the first acquisition unit is configured to acquire a plurality of analysis results from a first middleware by an offline computing platform, wherein the analysis results are generated by analyzing a service data change log of a database by the first middleware;
the batch processing unit is configured to perform offline batch processing on the plurality of analysis results to obtain a first set of calculation results;
the second acquisition unit is configured to acquire a target table generated by the real-time computing platform performing online calculation on each of at least one analysis result obtained from the second middleware, wherein the at least one analysis result is among the analysis results generated by the second middleware analyzing the service data change log of the database;
a comparison unit configured to compare the first set and the target table;
an updating unit configured to update the target table based on the first set when the calculation result in the first set and the calculation result in the target table are inconsistent.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing terminal comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.
The method and apparatus provided by the embodiments of this specification perform offline calculation on the offline computing platform and compare the offline calculation results with the real-time calculation results of the real-time computing platform, so that errors in the real-time calculation results can be corrected. The accuracy of the service data processing results is thereby ensured while the timeliness of service data processing is preserved. In addition, comparing the offline calculation results with the real-time calculation results makes it easier to find abnormal data and, in turn, the faults that caused it, which further improves data quality.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
In many technical scenarios, service resources are held in distributed storage across different databases in different parts of the world, which makes unified management of those resources difficult for the service party. Take funds as an example of such service resources. To make an efficient and accurate fund transfer plan, the service party needs to know the amount of funds in the bank accounts of its branches in different regions, and therefore needs to process the bank deposit data of each branch.
Bank deposit data is data on bank deposits, that is, on money deposited at a bank. In one example, the bank deposit data of the branches of the service party may be data on the funds held in the respective bank accounts of the branches of a multinational company.
Each branch of the multinational company can obtain its bank deposit data from its bank through an open standard interface between the branch and the bank, within the scope of the interoperability agreement negotiated with the bank. Specifically, for any branch of the multinational company, the corresponding banking system may send account changes to the business system (e.g., irecon, ireserve, reservation, etc.), so that the business system obtains the bank deposit data and stores it in the database of the business system. For convenience of description, the database of the business system is hereinafter simply referred to as the database; unless otherwise specified, "the database" below means the database of the business system.
According to one scheme, the middleware corresponding to the offline computing platform analyzes the change logs in a log file of the database and inputs the analysis results to the offline computing platform once per specific period (for example, every 24 hours) for offline computing, so as to obtain an offline calculation result.
Under this scheme, offline calculation results are generated in batches once per period, which makes it difficult to meet the high-frequency, highly real-time requirements of real-time bills. In addition, the banks corresponding to the different branches of the multinational company are located in different time zones, so the times at which the end-of-day files of the branches are sent differ greatly. Computing the bank deposit data of all branches offline in the same batch therefore cannot ensure that the end-of-day bills for the different time zones are generated in time.
According to another scheme, when a change log is generated in a log file, middleware corresponding to the real-time computing platform analyzes the change log, inputs the analysis result to the real-time computing platform, and performs real-time computing to obtain a real-time computing result.
Under this scheme, however, the link between the real-time computing platform and its corresponding middleware is unstable and prone to data loss, so the real-time computation may miss data and the real-time calculation result may be incomplete.
The embodiments of the present specification provide a service data processing method, whose application architecture may be as shown in fig. 1. The change logs generated by the database can be analyzed by the first middleware and then batch processed offline by the offline computing platform; they can also be analyzed by the second middleware and then processed online by the real-time computing platform.
The service data processing method provided by the embodiment of the specification performs online calculation on the change log by using the real-time calculation platform, so that related personnel can sense service change in time, and the timeliness of the processing result of the change log is ensured.
The method can also compare the offline calculation result of the offline computing platform with the real-time calculation result of the real-time computing platform, thereby correcting the real-time calculation result and ensuring the accuracy of the service data processing result. In addition, comparing the offline calculation result with the real-time calculation result makes it easier to find abnormal data and, in turn, the faults that caused it, which further improves data quality.
Next, the service data processing method provided in an embodiment of the present specification is described in detail with reference to fig. 2. The method may be performed by any apparatus, device, platform, or device cluster having computing and processing capabilities. As shown in fig. 2, the method comprises the following steps:
step 200, an offline computing platform obtains a plurality of analysis results from a first middleware, wherein the plurality of analysis results are generated by the first middleware analyzing a service data change log of a database;
step 202, performing offline batch processing on the plurality of analysis results to obtain a first set of calculation results;
step 204, acquiring a target table generated by a real-time computing platform performing online calculation on each of at least one analysis result obtained from a second middleware, wherein the at least one analysis result is among the analysis results generated by the second middleware analyzing the service data change log of the database;
step 206, comparing the first set with the target table;
step 208, when the calculation results in the first set are inconsistent with the calculation results in the target table, updating the target table based on the first set.
Next, each step described above will be specifically described with reference to specific examples.
First, in step 200, the offline computing platform obtains a plurality of analysis results from a first middleware, where the plurality of analysis results are generated by the first middleware analyzing a service data change log of a database.
The database may be mysql, oracle, oceanbase, etc. When the service data recorded in the database changes, for example, service data is added, deleted, and modified, a corresponding change log is generated. The change log is recorded in a log file (e.g., binlog). For any database, the corresponding first middleware can analyze the change log to obtain an analysis result.
In some embodiments, the service data is bank deposit data.
For convenience of description, unless otherwise specified, the service data change log is simply referred to as the change log; when the service data is bank deposit data, the change log refers to the bank deposit data change log.
In some embodiments, the first middleware may be any one of canal, databus, keylet, otter, etc.
At the start time (e.g., 00:00:00 every day) of a preset period (e.g., one period every 24 hours), the analysis results of a plurality of first middleware may be imported to the offline computing platform (e.g., OPDS) at the same time.
In some embodiments, the plurality of analysis results are the results of the first middleware analyzing all the change logs of the database; that is, the analysis results of every change log of the database analyzed by the first middleware are imported to the offline computing platform. The calculation results obtained by the subsequent offline batch processing are therefore more accurate.
In some embodiments, the plurality of analysis results are the results of the first middleware analyzing the change logs of the database within a preset period; that is, only the analysis results of the change logs within the preset period are imported to the offline computing platform. This saves computing resources in the subsequent offline batch processing.
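The two import strategies above differ only in which analysis results are handed to the offline computing platform. The following is a minimal Python sketch, assuming each analysis result carries the timestamp of its change log. The `AnalysisResult` fields and the `select_for_period` helper are hypothetical names chosen for illustration and do not appear in the specification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AnalysisResult:
    # All field names are illustrative assumptions.
    primary_key: str      # primary key of the change log
    operation: str        # "insert", "update" or "delete"
    amount_cents: int     # changed amount, in minor currency units
    logged_at: datetime   # time the change log was written


def select_for_period(results, period_start, period=timedelta(hours=24)):
    """Keep only results whose change log falls in [period_start, period_start + period)."""
    period_end = period_start + period
    return [r for r in results if period_start <= r.logged_at < period_end]


results = [
    AnalysisResult("log-1", "insert", 10000, datetime(2024, 1, 1, 8, 0)),
    AnalysisResult("log-2", "update", 25000, datetime(2024, 1, 1, 23, 59)),
    AnalysisResult("log-3", "insert", 7500, datetime(2024, 1, 2, 0, 0)),
]

# Strategy 1: import everything (higher accuracy, more computation).
all_results = list(results)
# Strategy 2: import only the preset period (less computation).
in_period = select_for_period(results, datetime(2024, 1, 1))
```

In this sketch the period is half-open, so a change logged exactly at the next period's start time belongs to the next batch rather than being counted twice.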
Next, in step 202, the multiple analysis results are processed in an offline batch manner, so as to obtain a first set of calculation results.
The offline computing platform can perform offline batch processing on the analysis results, and the computing results corresponding to the analysis results form a first set of computing results.
In some embodiments, each parsing result may correspond to one calculation result each.
In some embodiments, the offline computing platform may cluster a plurality of computing results corresponding to the plurality of parsing results, and use a cluster obtained by clustering as a constituent element of the first set.
In some embodiments, the offline computing platform may cluster the plurality of parsing results, then perform offline batch processing on the clustered clusters, and use the obtained computing results as the constituent elements of the first set.
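The variants above can be sketched as follows. This is an illustration only: the specification does not state what calculation is performed or how clustering is done, so the placeholder formula (`delta_cents * fx_rate`), the choice of grouping by an `account` field as the "clustering", and all function and field names are assumptions.

```python
from collections import defaultdict


def batch_per_result(parsed):
    """One calculation result per analysis result, keyed by change-log primary key."""
    return {p["log_id"]: p["delta_cents"] * p["fx_rate"] for p in parsed}


def batch_per_cluster(parsed):
    """Group analysis results first (here: by account), then compute one result per group."""
    clusters = defaultdict(list)
    for p in parsed:
        clusters[p["account"]].append(p)
    return {
        account: sum(p["delta_cents"] * p["fx_rate"] for p in items)
        for account, items in clusters.items()
    }


parsed = [
    {"log_id": "log-1", "account": "A", "delta_cents": 10000, "fx_rate": 1},
    {"log_id": "log-2", "account": "A", "delta_cents": -4000, "fx_rate": 1},
    {"log_id": "log-3", "account": "B", "delta_cents": 1000, "fx_rate": 7},
]

first_set = batch_per_result(parsed)          # per change log
first_set_clustered = batch_per_cluster(parsed)  # per cluster
```

Keying the per-result variant by the change-log primary key keeps the first set directly comparable with a target table that is keyed the same way.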
In step 204, a target table generated by the real-time computing platform performing online computing on at least one analysis result obtained from the second middleware is obtained; the at least one analysis result is generated by analyzing the service data change log of the database by the second middleware.
The real-time computing platform may be the kepler real-time computing platform.
In some embodiments, the second middleware is drc (Data Replication Center), which may provide the real-time computing platform with the analysis results of real-time incremental change logs. drc may support real-time synchronization of homogeneous or heterogeneous databases, real-time data record change subscription services, and the like.
In step 204, each time the service data in the database changes, a change log is generated and the second middleware corresponding to the database analyzes it. The analysis result is immediately imported to the real-time computing platform for online calculation, and the resulting real-time calculation result is written into the target table.
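The per-change flow of step 204 can be sketched as an event handler that computes and writes one row of the target table for each incoming analysis result. The `RealTimePlatform` class, its method name, the record fields, and the placeholder formula are all hypothetical; they stand in for the real-time computing platform and do not reflect any actual kepler or drc interface.

```python
class RealTimePlatform:
    """Toy stand-in for a real-time computing platform (illustrative only)."""

    def __init__(self):
        # Target table: change-log primary key -> real-time calculation result.
        self.target_table = {}

    def on_analysis_result(self, result):
        # Online calculation for a single change; the formula is a placeholder.
        value = result["delta_cents"] * result["fx_rate"]
        # Write the real-time calculation result into the target table at once.
        self.target_table[result["log_id"]] = value


platform = RealTimePlatform()

# Analysis results as they would arrive, one per change log.
stream = [
    {"log_id": "log-1", "delta_cents": 10000, "fx_rate": 1},
    {"log_id": "log-2", "delta_cents": -4000, "fx_rate": 1},
]
for result in stream:
    platform.on_analysis_result(result)
```

Any analysis result that never reaches `on_analysis_result`, for example because the link dropped it, simply leaves no row in the target table, which is the gap the later comparison is meant to expose.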
The offline computing platform obtains the target table from the real-time computing platform at the start time (e.g., 00:00:00 every day) of a preset period (e.g., one period every 24 hours).
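As an illustration of the timing described above, the start of the next preset period can be computed from the current time. This sketch assumes that periods are aligned to midnight and that the period length divides 24 hours evenly; the `next_period_start` helper is a hypothetical name, not part of the specification.

```python
from datetime import datetime, timedelta


def next_period_start(now, period=timedelta(hours=24)):
    """Return the first period boundary strictly after `now`.

    Assumes boundaries are aligned to midnight of the current day.
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    periods_done = (now - midnight) // period  # whole periods elapsed today
    return midnight + (periods_done + 1) * period


# With the 24-hour example from the text, the next fetch is the next midnight.
daily = next_period_start(datetime(2024, 1, 1, 13, 30))
# A shorter, 6-hour period yields the next 6-hour boundary.
six_hourly = next_period_start(datetime(2024, 1, 1, 13, 30), timedelta(hours=6))
```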
It should be noted that, because the real-time computing platform performs online computing, it requires middleware that can deliver data to it in a timely manner, whereas the offline computing platform performs offline batch processing and places lower demands on its middleware. The second middleware is therefore purpose-built to deliver data to the real-time computing platform promptly and to suit its online computing characteristics, while the first middleware can be a general-purpose, mature and stable component.
In general, the second middleware is less stable than the first middleware and more prone to losing data. The link between the real-time computing platform and the second middleware is also more complex than the link between the offline computing platform and the first middleware, so data is easily lost. In addition, the real-time computing platform obtains from the second middleware only the analysis results of change logs generated after the database goes online on the real-time computing platform, and does not process the analysis results of change logs generated before that.
For the above reasons, not all of the analysis results produced by the second middleware can be imported to the real-time computing platform. The at least one analysis result therefore consists of the analysis results of the second middleware that were successfully imported to the real-time computing platform. These are part of the analysis results that the second middleware produces from the change log and, in most cases, do not cover all of them.
In some embodiments, when the plurality of analysis results are the results of the first middleware analyzing all change logs of the database, the at least one analysis result is among the analysis results generated by the second middleware analyzing the change logs produced after the database goes online on the real-time computing platform.
In some embodiments, when the plurality of analysis results are results of analyzing, by the first middleware, change logs of the database in a preset period, the at least one analysis result belongs to an analysis result generated by analyzing, by the second middleware, the change logs of the database in the preset period.
Next, in step 206, the first set is compared to the target table.
As described above, the elements in the first set are calculation results obtained by performing offline batch processing on the offline calculation platform, and may be referred to as offline calculation results for short. The calculation result in the target table is a calculation result obtained by performing online processing on the real-time calculation platform, and may be referred to as a real-time calculation result for short.
In some embodiments, in step 206, the offline calculation results in the first set may be compared one-to-one with the real-time calculation results in the target table. It is easily understood that every change log has a primary key, and accordingly both the offline calculation result and the real-time calculation result of that change log carry the same primary key. The change log, its offline calculation result, and its real-time calculation result can therefore be associated with one another through the primary key.
In some embodiments, in step 206, the offline computation results and the real-time computation results corresponding to the same change log may be compared for consistency.
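The primary-key comparison of step 206 can be sketched as below, assuming both the first set and the target table are available as mappings from change-log primary key to calculation result. The `compare` function and the sample values are hypothetical and serve only to illustrate the two kinds of inconsistency discussed in this specification.

```python
def compare(first_set, target_table):
    """Compare offline results with real-time results by primary key.

    Returns two sorted lists of primary keys:
    - missing: present in the first set but absent from the target table
    - mismatched: present in both, but with different calculation results
    """
    missing = sorted(k for k in first_set if k not in target_table)
    mismatched = sorted(
        k for k in first_set
        if k in target_table and first_set[k] != target_table[k]
    )
    return missing, mismatched


first_set = {"log-1": 10000, "log-2": -4000, "log-3": 7000}   # offline results
target_table = {"log-1": 10000, "log-2": -4500}               # real-time results

missing, mismatched = compare(first_set, target_table)
```

Amounts are kept as integers in minor currency units here so that equality checks are exact; comparing floating-point values would need a tolerance instead.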
Then, in step 208, when the computation results in the first set and the computation results in the target table are inconsistent, the target table is updated based on the first set.
In some embodiments, after the offline calculation results in the first set are compared one-to-one with the real-time calculation results in the target table, some offline calculation results may have no corresponding real-time calculation result. This indicates that the change logs corresponding to those offline calculation results were not analyzed by the second middleware, or that their analysis results were lost. Those offline calculation results may then be written into the target table to update it.
In the embodiments, by comparing the offline calculation result in the first set with the real-time calculation result in the target table, it can be found whether data omission occurs, and further, a measure for the data omission phenomenon can be taken, so as to improve the subsequent data quality of the real-time calculation platform.
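A minimal sketch of this backfill, under the same assumption that both sides are mappings keyed by change-log primary key. The `backfill_missing` helper is a hypothetical name; it updates the target table in place and returns the keys it added, so that the omissions can also be reported.

```python
def backfill_missing(first_set, target_table):
    """Write offline results that have no real-time counterpart into the target table."""
    added = []
    for key, offline_value in first_set.items():
        if key not in target_table:
            target_table[key] = offline_value
            added.append(key)
    return added


first_set = {"log-1": 10000, "log-2": -4000, "log-3": 7000}
target_table = {"log-1": 10000, "log-2": -4000}   # "log-3" was lost on the real-time link

added = backfill_missing(first_set, target_table)
```

Rows that already exist in the target table are left untouched by this step, so it addresses only omissions and not inaccurate values.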
In some embodiments, if the offline calculation result and the real-time calculation result corresponding to the same change log are not consistent, the real-time calculation result is modified based on the offline calculation result to update the target table.
As described above, the second middleware is unstable compared to the first middleware, which may result in inaccurate analysis results and inaccurate real-time calculation results. Therefore, when the off-line calculation result is inconsistent with the real-time calculation result, the real-time calculation result is corrected by using the off-line calculation result.
In the embodiments, whether the off-line calculation result is consistent with the real-time calculation result or not is compared, so that whether the real-time calculation result is accurate or not can be determined, and further, a measure aiming at the phenomenon that the calculation result of the real-time calculation platform is inaccurate can be taken, so that the accuracy of the calculation result of the real-time calculation platform is improved.
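The correction can be sketched as follows, again treating the first set and the target table as mappings keyed by change-log primary key. The `correct_mismatches` helper is hypothetical; it overwrites each inconsistent real-time calculation result with the offline calculation result and records the old and new values for later inspection.

```python
def correct_mismatches(first_set, target_table):
    """Overwrite real-time results that disagree with the offline results.

    Returns {primary_key: (old_real_time_value, offline_value)} for every correction.
    """
    corrections = {}
    for key, offline_value in first_set.items():
        if key in target_table and target_table[key] != offline_value:
            corrections[key] = (target_table[key], offline_value)
            target_table[key] = offline_value
    return corrections


first_set = {"log-1": 10000, "log-2": -4000}
target_table = {"log-1": 10000, "log-2": -4500}   # "log-2" was computed inaccurately

corrections = correct_mismatches(first_set, target_table)
```

The offline calculation result is treated as authoritative in this sketch, in line with the reasoning above that the first middleware is the more stable of the two.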
In some embodiments, the method further comprises: initiating an alert when the first set and the target table are inconsistent.
In the embodiments, when the calculation result of the offline calculation platform is inconsistent with the calculation result of the real-time calculation platform, an alarm is initiated to remind relevant personnel to take measures such as checking data, modifying a business report generated according to the calculation result of the real-time calculation platform, and the like.
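One way such an alert might be wired up is sketched below. The specification does not say how the alert is delivered, so the `notify` callback (defaulting to a standard-library logger warning), the logger name, and the `alert_if_inconsistent` helper are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("reconciliation")  # hypothetical logger name


def alert_if_inconsistent(first_set, target_table, notify=logger.warning):
    """Send one alert listing every primary key whose results are missing or differ."""
    differing = sorted(
        k for k in first_set
        if k not in target_table or target_table[k] != first_set[k]
    )
    if differing:
        notify("offline and real-time results differ for: " + ", ".join(differing))
    return bool(differing)


# Collect alerts in a list instead of logging them, to show the message content.
messages = []
raised = alert_if_inconsistent(
    {"log-1": 10000, "log-2": -4000, "log-3": 7000},
    {"log-1": 10000, "log-2": -4500},
    notify=messages.append,
)
```

Passing the delivery mechanism as a callback keeps the consistency check independent of whether alerts go to a log, a message queue, or an on-call system.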
The service data processing method provided by the embodiments of this specification performs offline calculation on the offline computing platform and compares the offline calculation results with the real-time calculation results of the real-time computing platform, so that the real-time calculation results can be corrected. The accuracy of the service data processing results is thereby ensured while the timeliness of service data processing is preserved. In addition, comparing the offline calculation results with the real-time calculation results makes it easier to find abnormal data and, in turn, the faults that caused it, which further improves data quality.
In a second aspect, an embodiment of the present specification provides a service data processing apparatus 300. Referring to fig. 3, the apparatus 300 includes:
a first obtaining unit 310, configured to obtain, by an offline computing platform, multiple analysis results from a first middleware, where the multiple analysis results are generated by analyzing, by the first middleware, a service data change log of a database;
the batch processing unit 320 is configured to perform offline batch processing on the multiple analysis results to obtain a first set of calculation results;
a second obtaining unit 330, configured to obtain a target table generated by the real-time computing platform performing online computation on at least one analysis result obtained from the second middleware; the at least one analysis result belongs to an analysis result generated by analyzing the service data change log of the database by the second middleware;
a comparing unit 340 configured to compare the first set and the target table;
an updating unit 350 configured to update the target table based on the first set when the calculation result in the first set and the calculation result in the target table are inconsistent.
In some embodiments, the service data is bank deposit data.
In some embodiments, the first middleware is any one of canal, databus, keylet, and otter;
the second middleware is drc;
the link between the offline computing platform and the first middleware is different from the link between the real-time computing platform and the second middleware.
In some embodiments, the plurality of analysis results are results of analyzing, by the first middleware, change logs of the database in a preset period;
the at least one analysis result belongs to an analysis result generated by analyzing the change log of the database in the preset period by the second middleware.
In some embodiments, the plurality of analysis results are the results of the first middleware analyzing all change logs of the database;
the at least one analysis result is among the analysis results generated by the second middleware analyzing the change logs produced after the database goes online on the real-time computing platform.
In some embodiments, the updating unit 350 is configured to write at least one calculation result to the target table to update the target table when the first set has more calculation results than the target table.
In some embodiments, the first set includes first computation results corresponding to a first change log, and the target table includes second computation results corresponding to the first change log; the updating unit 350 is configured to modify the second calculation result based on the first calculation result to update the target table when the first calculation result and the second calculation result are inconsistent.
In some embodiments, the apparatus further comprises an alert unit (not shown) configured to initiate an alert when the first set and the target table are inconsistent.
In another aspect, embodiments of the present specification provide a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method shown in fig. 2.
In another aspect, embodiments of the present description provide a computing terminal including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method shown in fig. 2.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The embodiments described above further explain the objects, technical solutions, and advantages of the present invention in detail. It should be understood that they are only exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution, improvement, or the like made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.