CN118069455A - Data real-time processing method and system - Google Patents
- Publication number
- CN118069455A (application CN202311509777.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- index
- time
- real
- accumulated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/302—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
- G06F11/3072—Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a data real-time processing method and a data real-time processing system. The method comprises: a data extraction step of extracting transaction flow data directly from a service database; a data preprocessing step of preprocessing the transaction flow data extracted in the data extraction step and outputting the preprocessed transaction flow data; and an index statistics step of dividing and grouping the preprocessed transaction flow data by activity type and time level, computing index statistics for each group, and calculating the index accumulated value of each time level of each activity type through a sliding window. The invention enables real-time index statistics and adaptive repair.
Description
Technical Field
The invention relates to a computer technology for processing transaction flow data, in particular to a data real-time processing method and a data real-time processing system.
Background
The prior art for processing transaction flow data mainly has the following problems:
(1) The prior art is generally suited mainly to checking whether a single transaction flow record is abnormal, and it handles poorly those indexes that require aggregation over a time period. Moreover, because no storage unit is provided, usually only the transaction indexes of a single user within a short time range can be counted; indexes spanning a year or a month after an activity is launched cannot be counted;
(2) In the prior art, service transaction flow data must be transmitted to message middleware such as Kafka, so the real-time performance of the statistics is poor. The flow data must be actively pushed by the service system, and if an index involves several systems or different types of flow data, operations such as transformation and collection are also required, which costs time and labor;
(3) The case where already-accumulated historical statistics become abnormal and must be rolled back is usually not supported. Because of disk space limits, records in Kafka are normally retained for only a few days, so when the statistical criteria turn out to be wrong or are adjusted, the prior art cannot roll back to a designated point in time. The accumulated historical statistics therefore cannot be updated or repaired, and errors appear in the continuously accumulated indexes;
(4) The interference of late-arriving flow data with single-day indexes usually cannot be handled. Transaction systems often fluctuate and, in extreme cases, fail, so that transaction flow data arrives late. When daily transaction data is counted, flow data that belongs to the previous day may then arrive late and be accumulated into the next day, which skews the actual daily transaction indexes.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention aims to provide a data real-time processing method and a data real-time processing system capable of performing real-time index statistics.
Furthermore, the invention also aims to provide a data real-time processing method and a data real-time processing system capable of performing self-adaptive repair.
The data real-time processing method of one aspect of the invention comprises the following steps:
a data extraction step of extracting transaction flow data directly from a service database;
a data preprocessing step of preprocessing the transaction flow data extracted in the data extraction step and outputting the preprocessed transaction flow data; and
an index statistics step of dividing and grouping the preprocessed transaction flow data by activity type and time level, computing index statistics for each group, and calculating the index accumulated value of each time level of each activity type through a sliding window.
Optionally, the method further comprises:
an index storage step of storing, separately for each day, the time-level index accumulated values obtained in the index statistics step.
Optionally, the method further comprises:
a data backup step of backing up the preprocessed transaction flow data output by the data preprocessing step to obtain backup data; and
an index playback step of extracting, from the backup data, the backup data of a designated playback time period and recomputing the time-level index statistics from it to obtain updated time-level index accumulated values.
Optionally, in the index storing step, the updated index accumulated value obtained in the index playback step is stored instead of the time-level index accumulated value obtained in the index counting step.
Optionally, the method further comprises:
an index monitoring step of comparing the index accumulated value or the updated index accumulated value with a preset first threshold value and triggering an alarm when the first threshold value is exceeded.
Optionally, the method further comprises:
an index monitoring step of summing the index accumulated values or the updated index accumulated values over a preset specified time period to obtain a summary index statistic, comparing the summary index statistic with a preset second threshold value, and triggering an alarm when the second threshold value is exceeded.
Optionally, in the data extraction step, the transaction flow data is obtained in real time by incrementally reading a binary log file of the service database.
Optionally, in the data preprocessing step, the preprocessing includes screening the transaction flow data according to operation identifiers, so as to keep the flow data that is in a successful state and falls within the index statistics category.
Optionally, the index statistics step includes:
dividing the preprocessed transaction flow data into groups by activity type and time level;
dividing the grouped flow data into sliding windows; and
when a sliding window is triggered, aggregating the flow data by activity type and time level to obtain the current-window index statistic of each time level of each activity type, and adding the current-window index statistic to the index statistic accumulated before the current window to obtain the index accumulated value of each time level of each activity type.
Optionally, the time level is a day.
Optionally, the preprocessed transaction flow data includes late-arriving flow data that arrives one day late.
The data real-time processing system of one aspect of the invention comprises:
a data extraction unit for extracting transaction flow data directly from a service database;
a data preprocessing unit for preprocessing the transaction flow data extracted by the data extraction unit and outputting the preprocessed transaction flow data; and
an index statistics unit for dividing and grouping the preprocessed transaction flow data by activity type and time level, computing index statistics for each group, and calculating the index accumulated value of each time level of each activity type through a sliding window.
Optionally, the system further comprises:
an index storage unit for storing, separately for each day, the time-level index accumulated values obtained by the index statistics unit.
Optionally, the system further comprises:
a data backup unit for backing up the preprocessed transaction flow data output by the data preprocessing unit to obtain backup data; and
an index playback unit for extracting, from the backup data, the backup data of a designated playback time period and recomputing the time-level index statistics from it to obtain updated time-level index accumulated values.
Optionally, the index storage unit stores the updated index accumulated values obtained by the index playback unit in place of the time-level index accumulated values obtained by the index statistics unit.
Optionally, the system further comprises:
an index monitoring unit for comparing the index accumulated value or the updated index accumulated value with a preset first threshold value and triggering an alarm when the first threshold value is exceeded.
Optionally, the system further comprises:
an index monitoring unit for summing the index accumulated values or the updated index accumulated values over a preset specified time period to obtain a summary index statistic, comparing the summary index statistic with a preset second threshold value, and triggering an alarm when the second threshold value is exceeded.
Optionally, in the data extraction unit, the transaction flow data is obtained in real time by incrementally reading a binary log file of the service database.
Optionally, in the data preprocessing unit, the preprocessing includes screening the transaction flow data according to operation identifiers, so as to keep the flow data that is in a successful state and falls within the index statistics category.
Optionally, the index statistics unit performs the following actions:
dividing the preprocessed transaction flow data into groups by activity type and time level;
dividing the grouped flow data into sliding windows; and
when a sliding window is triggered, aggregating the flow data by activity type and time level to obtain the current-window index statistic of each time level of each activity type, and adding the current-window index statistic to the index statistic accumulated before the current window to obtain the index accumulated value of each time level of each activity type.
Optionally, the time level is a day.
Optionally, the preprocessed transaction flow data includes late-arriving flow data that arrives one day late.
A computer readable medium of an aspect of the invention has stored thereon a computer program which, when executed by a processor, implements the data real-time processing method described above.
A computer device of an aspect of the invention comprises a storage module, a processor, and a computer program that is stored on the storage module and can run on the processor, wherein the processor implements the data real-time processing method described above when executing the computer program.
Drawings
These and other objects and advantages of the present application will become more fully apparent from the following detailed description taken in conjunction with the accompanying drawings, in which like or similar elements are designated by like reference numerals.
Fig. 1 is a flow chart illustrating a method for processing data in real time according to an embodiment of the present invention.
Fig. 2 is a block diagram showing the structure of a data real-time processing system according to a specific example of the present invention.
Fig. 3 is a process flow diagram showing one example of the index statistics unit 300.
Fig. 4 is a process flow diagram showing one example of the index playback unit 500.
Detailed Description
The following presents a simplified summary of the invention in order to provide a basic understanding of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention.
For the purposes of brevity and explanation, the principles of the present invention are described herein primarily with reference to exemplary embodiments thereof. Those skilled in the art will readily recognize that the same principles are equally applicable to all types of data real-time processing methods and data real-time processing systems, and that these same principles may be implemented therein, and that any such variations do not depart from the true spirit and scope of the present patent application.
Also, in the following description, reference is made to the accompanying drawings that illustrate specific exemplary embodiments. Electrical, mechanical, logical and structural changes may be made to these embodiments without departing from the spirit and scope of the present invention. Furthermore, while a feature of the invention may have been disclosed with respect to only one of several implementations/embodiments, such feature may be combined with one or more other features of the other implementations/embodiments, as may be desired and/or advantageous for any given or identifiable function. The following description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims and their equivalents.
Terms such as "comprising" and "including" mean that the technical solution of the present invention does not exclude the presence of other elements (modules) and steps than those directly and explicitly described in the description and claims.
To address the above problems, the invention provides a data real-time processing method and a data real-time processing system capable of adaptive repair. In this real-time index statistics and verification method and system, which are based on a distributed streaming data processing engine, changes to data source records are captured within a real-time job by directly reading the binary log file that records those changes, so that message middleware such as Kafka is not needed.
Moreover, by preprocessing the flow data acquired from multiple data sources (e.g., multiple service databases), flow data that carries update states can be accommodated.
Furthermore, an index time-level segmentation scheme is provided: the data accumulated since an activity was launched is divided, according to the transaction time of each flow record, into transaction indexes per day or per custom time period (without being limited to these), and the accurate index value since the start of the activity is obtained by summing the segmented index data. Thus, when the index statistics for a certain day are wrong or the transaction flow data of a certain time period is erroneous, only the task for the designated time period needs to be rerun, which removes the limitation that a real-time task cannot replay historical data when the statistical criteria need to be adjusted.
Finally, through the multi-source joint processing of the distributed engine (configuration data is obtained from a configuration database), index statistics and monitoring/alarming are combined into one, which simplifies the task flow and improves the timeliness of alarms.
Fig. 1 is a flow chart illustrating a method for processing data in real time according to an embodiment of the present invention.
As shown in fig. 1, the method for processing data in real time according to an embodiment of the present invention includes the following steps:
Data extraction step S100: transaction flow data is extracted directly from a service database, for example by incrementally reading the binary log file of the service database that stores the flow data, so as to obtain real-time flow data; there may be one or more service databases, and they record the transaction flow data of various transactions;
Data preprocessing step S200: the flow data extracted in the data extraction step S100 is preprocessed and the preprocessed flow data is output; as one example of preprocessing, the flow data may carry different operation identifiers, and the flow data that is in a successful state and falls within the index statistics category is screened out according to those identifiers;
Index statistics step S300: the preprocessed flow data is divided and grouped by activity type and time level, index statistics are computed for each group, the index accumulated value of each time level of each activity type is calculated through a sliding window, and the flow proceeds to the index storage step S600;
Data backup step S400: in parallel, after the data preprocessing step S200, the preprocessed flow data output by the data preprocessing step S200 is backed up to obtain backup data;
Index playback step S500: when the service statistical criteria are wrong or the flow data is abnormal, the backup data of a designated playback time period is extracted from the backup data obtained in the data backup step S400, and the time-level index statistics are recomputed from it to obtain updated time-level index accumulated values;
Index storage step S600: the index accumulated value of each time level of each activity type obtained in the index statistics step S300 is stored; alternatively, when the service statistical criteria are wrong or the flow data is abnormal, the updated time-level index accumulated value obtained in the index playback step S500 is stored in place of the time-level index accumulated value obtained in the index statistics step S300;
Index monitoring step S700: the stored index accumulated value of each time level of each activity type obtained in the index statistics step S300, or the updated time-level index accumulated value obtained in the index playback step S500, is compared with a pre-stored threshold value, and an alarm is triggered if the threshold value is exceeded.
In the above technical solution, the data extraction step S100, the data preprocessing step S200, and the index statistics step S300 together implement the real-time index statistics of the present invention; adding the data backup step S400, the index playback step S500, and the index storage step S600 implements the adaptive repair of the present invention; and adding the index monitoring step S700 implements timely alarms.
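The playback and storage steps S500 and S600 can be pictured with a minimal Python sketch. It assumes, purely for illustration, that the backup is a list of records with hypothetical fields `activity_id`, `date` (a `YYYYMMDD` string), and `value`, and that the index store is a dictionary keyed by `(activity_id, date)`; none of these names come from the patent. The optional `value_of` argument stands in for adjusted statistical criteria.

```python
from collections import defaultdict
from typing import Callable, Iterable

Key = tuple[str, str]  # (activity_id, transaction date as "YYYYMMDD"); hypothetical layout


def replay_period(
    backup: Iterable[dict],
    store: dict[Key, float],
    start: str,
    end: str,
    value_of: Callable[[dict], float] = lambda rec: rec["value"],
) -> dict[Key, float]:
    """Recompute the daily values for dates in [start, end] from the backup
    and store them in place of the previously stored values for that period."""
    recomputed: dict[Key, float] = defaultdict(float)
    for rec in backup:
        # "YYYYMMDD" strings compare correctly as plain strings.
        if start <= rec["date"] <= end:
            recomputed[(rec["activity_id"], rec["date"])] += value_of(rec)

    # Drop the stale daily values inside the playback period ...
    for key in [k for k in store if start <= k[1] <= end]:
        del store[key]
    # ... and replace them with the recomputed ones; other days are untouched.
    store.update(recomputed)
    return store
```

Because the stored values are segmented per day, only the designated period is recomputed; values for other days stay as they were.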
In the above technical solution, as a preferable example, the time level is a day, and the following description will also take "day" as an example of the time level.
In the above technical solution, as a preferable example, the index statistics step S300 includes (not illustrated):
dividing the preprocessed transaction flow data into groups by activity type and time level, wherein the preprocessed flow data may further include late-arriving flow data, that is, flow data that arrives one day late;
dividing the grouped flow data into sliding windows;
when a sliding window is triggered, aggregating the flow data by activity type and time level to obtain the current-window index statistic of each time level of each activity type, and adding the current-window index statistic to the index statistic accumulated before the current window to obtain the index accumulated value of each time level of each activity type.
In the above technical solution, as a preferable example, the comparison in the index monitoring step may be performed per day or in a cumulative manner, wherein the cumulative manner sums the index accumulated values over a predetermined time period to obtain a summary index statistic and compares the summary index statistic with a preset threshold value.
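The two comparison modes can be sketched as follows. This is an illustrative Python sketch under assumptions: the store layout (a dictionary keyed by `(activity_id, date)`), the function names, and the use of a strict "greater than" test for "exceeded" are all hypothetical choices, not details given in the patent.

```python
def daily_alarm(store: dict, activity_id: str, day: str, first_threshold: float) -> bool:
    """Per-day mode: compare one day's index accumulated value with the first threshold."""
    return store.get((activity_id, day), 0.0) > first_threshold


def cumulative_alarm(store: dict, activity_id: str, days: list, second_threshold: float) -> bool:
    """Cumulative mode: sum the daily values over the given period, then compare
    the summary statistic with the second threshold."""
    total = sum(store.get((activity_id, d), 0.0) for d in days)
    return total > second_threshold
```

A day that stays under the per-day threshold can still contribute to a cumulative alarm, which is why the two thresholds are kept separate.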
According to the above technical solution, service indexes of any dimension that are obtained by accumulating a large number of flow records can be counted. The solution is not limited to counting abnormal transactions, such as service payment orders whose processing timed out or scalper transactions; it also applies to indexes of normal transactions, such as the accumulated transaction amount and the accumulated discount amount of each activity, so the index coverage is wide and the applicability is strong.
According to the above technical solution, no message middleware such as Kafka is needed, which removes an intermediate transmission hop and gives the statistics high real-time performance; statistics from multiple data sources can be aggregated, and customized processing is supported.
According to the above technical solution, the case where accumulated historical index statistics are abnormal and must be rolled back is supported: when the statistical criteria are wrong or the transaction flow data of a certain time period is abnormal, the accumulated historical statistics for a designated time period can be updated and repaired transparently.
According to the above technical solution, business-side delays such as late-arriving flow data are supported; in particular, when daily transaction indexes are counted, flow data that arrives late across a day boundary is handled.
The data real-time processing method according to an embodiment of the present invention has been described above, and a data real-time processing system according to a specific example of the present invention will be described below.
Fig. 2 is a block diagram showing the structure of a data real-time processing system according to a specific example of the present invention.
As shown in fig. 2, the real-time index statistics and verification system with adaptive repair according to a specific example of the present invention mainly includes: a data extraction unit 100, a data preprocessing unit 200, an index statistics unit 300, a data backup unit 400, an index playback unit 500, an index storage unit 600, and an index monitoring unit 700.
The input to the data extraction unit 100 is a multi-source database comprising the one or more service data sources involved in each transaction; only one service database 800 is illustrated in fig. 2 as an example. The service database 800 mainly provides the transaction flow data recorded when each service system generates various transactions. The data extraction unit 100 obtains real-time flow data by incrementally reading the binary log file of the service database 800.
The input of the data preprocessing unit 200 is the flow data extracted by the data extraction unit 100 in the previous step; the data preprocessing unit 200 preprocesses this flow data and outputs the preprocessed flow data.
The flow data may carry different operation identifiers, specifically:
on update, two records are inserted: one is the record before the update, with operation identifier OP = -U, and the other is the record after the update, with operation identifier OP = +U;
on insert, OP = I;
on delete, OP = D.
During preprocessing, the data preprocessing unit 200 filters out delete operations. For update records, the record with OP = -U is associated with the record with OP = +U and converted into a newly added field "Before" in the latter; further filtering as needed can then be performed by comparing fields such as the transaction status before and after the update.
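The preprocessing described above can be sketched in Python as follows. The record layout is an assumption made for illustration: each change record is a dictionary with hypothetical keys `op`, `id`, and `status`, and `"SUCCESS"` is a placeholder for the successful transaction state. The rule that skips an update whose pre-update image was already successful is one possible reading of "comparing the transaction status before and after the update", not something the text specifies.

```python
from typing import Iterable


def preprocess(records: Iterable[dict]) -> list:
    """Drop deletes, fold each -U record into its +U record as a "before"
    field, and keep only records that newly reach the successful state."""
    out = []
    pending_before = {}  # record id -> pre-update image (OP = -U)
    for rec in records:
        op = rec["op"]
        if op == "D":          # delete operations are filtered out
            continue
        if op == "-U":         # remember the pre-update image for the next +U
            pending_before[rec["id"]] = rec
            continue
        merged = dict(rec)
        if op == "+U":
            merged["before"] = pending_before.pop(rec["id"], None)
        if merged["status"] != "SUCCESS":
            continue           # only successful transactions are counted
        before = merged.get("before")
        if before is not None and before["status"] == "SUCCESS":
            continue           # assumed rule: it was already counted earlier
        out.append(merged)
    return out
```

Screening by index statistics category would be a further predicate of the same shape and is omitted here.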
The index statistics unit 300 performs index statistics on the transaction flow data that the data preprocessing unit 200 has screened as being in a successful state and falling within the index statistics category.
Fig. 3 is a process flow diagram showing one example of the index statistics unit 300.
In fig. 3, ACTIVITYID is the activity ID, which indicates the activity type; Date is the transaction date; and Value is the value of the flow record.
As shown in fig. 3, the index statistics unit 300 performs the following processing flow:
(1) Using pre-written query statistics SQL, the incoming service flow data is first classified by activity ID. Specifically, in fig. 3, the service flow data with activity ID = 001 contains flow 1 (value A) and flow 5 (value E); the service flow data with activity ID = 002 contains flow 2 (value B) and flow 4 (value D); and the service flow data with activity ID = 003 contains flow 3 (value C);
(2) The flow data is then grouped by transaction date (grouping may be by day or by hour, that is, by time level; here the division is by transaction date, i.e., by "day"). Specifically, in fig. 3, the service flow data with activity ID = 001 includes flow 1 with transaction date 20230101 and flow 5 with transaction date 20230102; the service flow data with activity ID = 002 includes flow 2 and flow 4, both with transaction date 20230102; and the service flow data with activity ID = 003 includes flow 3 with transaction date 20230102;
(3) Within each group of flow data, sliding windows are divided according to the system processing time. The window length can be set for each service according to its transaction scale, e.g., 3 min or 5 min. When a window is triggered, the flow data of each activity (i.e., per activity ID) is aggregated by date to obtain the index accumulated value $M_{\text{window}}$ of the current window for the transaction date. The newly computed $M_{\text{window}}$ is added to the index value $M_{\text{date-org}}$ already stored for that date in the index storage unit 600, and the result is accumulated into the index storage unit 600 per day, giving the updated index value of the transaction date: $M_{\text{date-upd}} = M_{\text{window}} + M_{\text{date-org}}$.
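The per-window update in step (3) can be illustrated with a short Python sketch. It assumes the index storage unit is modeled as a dictionary keyed by `(activity_id, date)` and that each flow record has hypothetical fields `activity_id`, `date`, and `value`; window triggering itself (by processing time) is left to the streaming engine and is not modeled.

```python
from collections import defaultdict


def apply_window(store: dict, window_records: list) -> dict:
    """Fold one triggered window into the per-day store:
    M_date_upd = M_window + M_date_org for every (activity, date) key."""
    m_window = defaultdict(float)
    for rec in window_records:
        # Group by activity ID and by *transaction* date, not by arrival time.
        m_window[(rec["activity_id"], rec["date"])] += rec["value"]

    for key, value in m_window.items():
        m_date_org = store.get(key, 0.0)   # value stored before this window
        store[key] = m_date_org + value    # updated value written back per day
    return store
```

Because records are keyed by transaction date, a record dated 20230101 that arrives in a window on 20230102 is added to the value for 20230101 rather than to the current day.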
Here, the method of obtaining the index accumulated value over the activity period, used when the index monitoring unit 700 performs the comparison in the accumulated manner, will be described.
Assuming that the start date of an activity is start_time, the calculation formula of the index accumulated value of the activity during the activity is:
Where n represents the day of the start of the activity.
Specifically, with further reference to fig. 3, X in fig. 3 is the sum of the index values of the days preceding the real-time date. For example, if the start date of the activity is 20221230 and the designated date is 20230102, then $X = M_{20221230} + M_{20221231} + M_{20230101}$, i.e. the sum of the index values of 20221230, 20221231 and 20230101; this sum corresponds to the terms of the above formula for the days before the designated date.
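The summation referred to here can be written out as follows. This is an assumed reconstruction from the surrounding definitions, not the original notation: $M_d$ denotes the index accumulated value stored for day $d$, and $n$ is read as the number of days elapsed since start_time.

$$M_{sum} = \sum_{i=0}^{n} M_{start\_time + i}$$

For the fig. 3 example (start date 20221230, designated date 20230102, so $n = 3$ under this reading), the sum expands to

$$M_{sum} = \underbrace{M_{20221230} + M_{20221231} + M_{20230101}}_{X} + M_{20230102}$$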
In fig. 3, E is the index accumulated value in the current statistics window on 20230102, corresponding to $M_{window}$ above, and E1 is the index value already counted earlier on the current day 20230102, corresponding to $M_{date\text{-}org}$ above. A and A1 are the late-arriving flow data shown in the boxes: a statistics window on the current day 20230102 received flow data belonging to 20230101 that arrived late, and the value obtained by counting this late flow data is A+A1.
Thus, in fig. 3, the summary index statistic (M-SUM) obtained for the flow data with activity ID 001 is as follows (the summation may be performed in the index monitoring unit 700):
Summary index statistic (M-SUM) = index accumulated value of the flow data accumulated from the start date of the activity up to the designated date (X) + index accumulated value of the flow data in the current window on the designated date (E) + index accumulated value of the flow data already accumulated on the designated date (E1) + index accumulated value of the late flow data in the current window on the designated date (A) + index accumulated value of the late flow data already accumulated on the designated date (A1);
the summary index statistic (M-SUM) obtained for the flow data with activity ID 002 is as follows:
Summary index statistic (M-SUM) = index accumulated value of the flow data accumulated from the start date of the activity up to the designated date (Y) + index accumulated value of the flow data in the current window on the designated date (B+D) + index accumulated value of the flow data already accumulated on the designated date (B1);
the summary index statistic (M-SUM) obtained for the flow data with activity ID 003 is as follows:
Summary index statistic (M-SUM) = index accumulated value of the flow data accumulated from the start date of the activity up to the designated date (Z) + index accumulated value of the flow data in the current window on the designated date (C) + index accumulated value of the flow data already accumulated on the designated date (C1).
As described above, the index statistics unit records the daily index accumulated value for each activity ID in the index storage unit 600, and at output time the index monitoring unit 700, as one example, adds these index accumulated values together and compares the sum with a preset rule. As shown in fig. 3, the flow data may also include late-arriving flow data; even when late flow data arrives across days, it is still classified into the correct statistics group according to its transaction date.
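The handling of late flow data and the summation into M-SUM can be illustrated with a small sketch. The store layout, the helper names `route_record` and `summary_index`, and all numeric values are assumptions made for the example; dates are kept as `YYYYMMDD` strings so that ordinary string comparison orders them correctly.

```python
def route_record(store, rec):
    """Accumulate a flow record under its own transaction date, even if it arrives late."""
    key = (rec["activity_id"], rec["txn_date"])
    store[key] = store.get(key, 0.0) + rec["amount"]

def summary_index(store, activity_id, start_date, designated_date):
    """M-SUM: sum the daily accumulated values of one activity over the date range."""
    return sum(
        value
        for (act, day), value in store.items()
        if act == activity_id and start_date <= day <= designated_date
    )

# Daily values already stored for activity 001 (made-up numbers), plus another activity.
store = {
    ("001", "20221230"): 1.0,
    ("001", "20221231"): 2.0,
    ("001", "20230101"): 3.0,
    ("001", "20230102"): 4.0,
    ("002", "20230102"): 8.0,
}

# A record dated 20230101 that is only processed on 20230102.
late = {"activity_id": "001", "txn_date": "20230101", "amount": 0.5}
route_record(store, late)

m_sum_001 = summary_index(store, "001", "20221230", "20230102")
```

The late record changes the entry for 20230101 only, so the total for the activity stays correct without any special cross-day handling.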
The index playback unit 600 takes as input the data records acquired by the data extraction unit 100, which are provided after being backed up by the data backup unit 500. For example, the data backup unit 500 may back up the previous day's flow data of the service database after the early morning. The daily backup is generally scheduled after the maximum delay time of the flow data has elapsed, so that delayed flow data is also contained in the backup and the integrity of the data is ensured to the greatest extent. Files backed up in the data backup unit 500 are named by transaction date, e.g. Binlog-date1, and may be stored together in a storage medium such as a disk.
Fig. 4 is a process flow diagram showing one example of the index playback unit 600.
The index playback unit 600 is used to reprocess the business flow data when the statistical caliber (that is, the statistical definition) is wrong, or when the caliber is correct but the business flow data in a certain time interval is abnormal. Its input is the flow data backed up in the data backup unit 500. The index playback unit 600 extracts the backup data of a specified playback time period from the backup data obtained by the data backup unit 500 and computes the time-level index statistics of that backup data, to obtain updated time-level accumulated index statistics.
Specifically, when the business statistical caliber is wrong and playback is needed, the original statistical caliber is first corrected; when only the flow data is abnormal, the caliber does not need to be modified. A time period to be played back is then specified, for example dateM to dateN, and a playback task is started. The compressed Binlog files (Binlog being the binary log) with backup dates dateM to dateN are taken out of the data backup unit 400, decompressed and parsed to read the flow data, and accumulated by day using the corresponding statistical caliber, giving the updated index values $M_{dateM\text{-}new}, \dots, M_{dateN\text{-}new}$ for the time period. The real-time index accumulated value $M_{sum}$ after the re-run is calculated as:
$$M_{sum} = M_{date1} + M_{date2} + M_{dateM\text{-}new} + \dots + M_{dateN\text{-}new} + M_{date\text{-}now}$$
where $M_{dateM\text{-}new}, \dots, M_{dateN\text{-}new}$ represent the updated index accumulated values for each day over the period dateM to dateN, and $M_{date\text{-}now}$ represents the index accumulated value for the current date.
Specifically, in fig. 4, "Binlog1-date1" to "Binlog1-date5" are taken out as dateM to dateN, and the case where only the data of one day, date3, is erroneous (that is, "Binlog1-date3" is erroneous) is taken as the example. After the backup data "Binlog1-date1" to "Binlog1-date5" is decompressed and parsed and the flow data is read, it is accumulated by day using the corresponding statistical caliber. In fig. 4, M-dateNow represents the index accumulated value of the current date recorded in the index storage unit 600 by the index statistics unit; M-date1, M-date2, M-date4 and M-date5 represent the index accumulated values of the days date1, date2, date4 and date5, which are not updated; and M-date3 represents the index accumulated value of the updated date3 (its value is updated from C to C1).
Mapping the above formula onto fig. 4: M-date1 and M-date2 in fig. 4 correspond to $M_{date1}$ and $M_{date2}$ in the formula, M-date4 and M-date5 in fig. 4 correspond to the ellipsis in the formula, and M-date3, whose value is updated by the playback, corresponds to $M_{date3\text{-}new}$. Fig. 4 shows the case where only the data of one day, date3, is wrong; if several days are wrong, for example date4 as well, then $M_{date4\text{-}new}$ appears in the formula in the same way after the repair.
In this way, if the index values for date1 to date5 after the update by the index playback unit 600 are A, B, C1, D and E respectively, and are recorded in the index storage unit 600, the index monitoring unit 700 may sum them to obtain A+B+C1+D+E and then add the current index accumulated value X, giving the real-time sum of the index accumulated values, namely X+A+B+C1+D+E.
As fig. 4 and the above formula show, the real-time data is accumulated from day-level indexes. When the index value in a certain time period is wrong, only the data of that period needs to be updated; the real-time statistics task does not need to be stopped for a long time, and the real-time index data in the index database does not need to be deleted, so the erroneous data can be repaired imperceptibly.
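The playback described above can be sketched as follows. This is an illustration under assumptions rather than the actual playback task: `backups` is a plain dictionary standing in for the decompressed and parsed Binlog backup files, `corrected_caliber` is a hypothetical corrected statistical rule, and the values of A, B, C, D, E and the current-day value are invented.

```python
def replay(backups, store, activity_id, days, caliber):
    """Recompute the daily value of each listed day from backup data and overwrite it."""
    for day in days:
        records = backups[day]  # stands in for decompressing and parsing that day's backup file
        store[(activity_id, day)] = sum(
            caliber(rec) for rec in records if rec["activity_id"] == activity_id
        )

def real_time_sum(store, activity_id):
    """Sum every stored daily value of the activity, including the current day."""
    return sum(value for (act, _), value in store.items() if act == activity_id)

# Daily values A, B, C, D, E and the current-day value, as in fig. 4 (made-up numbers).
store = {
    ("001", "date1"): 1.0,
    ("001", "date2"): 2.0,
    ("001", "date3"): 3.0,   # C, computed with a wrong caliber
    ("001", "date4"): 4.0,
    ("001", "date5"): 5.0,
    ("001", "dateNow"): 10.0,
}
backups = {
    "date3": [
        {"activity_id": "001", "amount": 3.0, "fee": 1.0},
        {"activity_id": "001", "amount": 2.0, "fee": 1.0},
    ]
}

def corrected_caliber(rec):
    # Hypothetical corrected rule: the index counts the amount plus the fee.
    return rec["amount"] + rec["fee"]

replay(backups, store, "001", ["date3"], corrected_caliber)
m_sum = real_time_sum(store, "001")
```

Only the entry for date3 is rewritten (C becomes C1); the other days and the current-day value are untouched, which is why the real-time task can keep running during the repair.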
The input of the index storage unit 600 is the index accumulated value counted by the index statistics unit 300 and the updated day-level index accumulated value counted and updated by the index playback unit 600. The index storage unit 600 may typically be, but is not limited to, a MySQL database, which persists the statistical index data by day level.
The configuration data source 900 primarily provides activity configuration information, such as activity funds and various types of preferential rules.
The inputs of the index monitoring unit 700 are the day-level index accumulated value counted by the index statistics unit 300 and recorded in the index storage unit 600, the updated day-level index accumulated value obtained after statistics and updating by the index playback unit 600, and the activity configuration data (e.g., activity rules) provided by the configuration data source 900. The index monitoring unit 700 triggers the monitoring action after the index corresponding to an activity is updated.
The monitoring action here may include: (1) monitoring the index accumulated value of a single day: the day-level index accumulated value recorded in the index storage unit 600, or the updated day-level index accumulated value obtained through statistics and updating by the index playback unit 600, is compared with a first threshold preset in the activity configuration data 900, and an alarm is triggered if the first threshold is exceeded; (2) monitoring the sum of the daily index accumulated values over a specified time period: the day-level index accumulated values recorded in the index storage unit 600 for the specified time period, or the updated day-level index accumulated values obtained through statistics and updating by the index playback unit 600, are summed to obtain a summary index accumulated value, which is compared with a second threshold preset in the activity configuration data 900, and an alarm is triggered if the second threshold is exceeded.
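The two monitoring actions can be sketched as follows. The function names, threshold values and daily values are assumptions chosen for the example; in the described system the thresholds would come from the activity configuration data.

```python
def single_day_alarm(daily_value, first_threshold):
    """Action (1): alarm when one day's accumulated value exceeds the first threshold."""
    return daily_value > first_threshold

def period_alarm(daily_values, second_threshold):
    """Action (2): sum the daily values of the period and compare with the second threshold."""
    total = sum(daily_values)
    return total, total > second_threshold

# Made-up day-level accumulated values of one activity.
daily = {"20230101": 120.0, "20230102": 80.0, "20230103": 95.0}
FIRST_THRESHOLD = 100.0   # hypothetical per-day limit
SECOND_THRESHOLD = 300.0  # hypothetical limit for the whole period

days_in_alarm = [day for day, value in daily.items()
                 if single_day_alarm(value, FIRST_THRESHOLD)]
period_total, period_in_alarm = period_alarm(daily.values(), SECOND_THRESHOLD)
```

Here a single day can raise an alarm while the period total stays under its limit, which shows why the two checks are kept separate.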
As described above, the data real-time processing method and the data real-time processing system of the present invention introduce time-level slicing of indexes into real-time index statistics. The index data is divided, according to the transaction timestamp of each flow record, into transaction indexes by time level, including but not limited to "days" or "custom time periods". The statistical method for accumulated indexes is thereby changed from continuous accumulation since the start of the activity to accumulation within each time period, which also accommodates situations such as late-arriving flow data.
Further, considering that data may need to be re-run under abnormal conditions, an index playback unit and an index playback step are added to the real-time index statistics, so that the time-level index values in the erroneous time period can be restored imperceptibly and adaptive repair is achieved. This removes the limitation that a real-time statistics task cannot play back historical data when the statistical caliber needs to be adjusted or the service data in a certain time period is wrong.
The data real-time processing method and the data real-time processing system of the present invention can be applied to the following scenarios, for example:
(1) Large banks and credit card centers: the data real-time processing method and the data real-time processing system of the invention can be applied to the statistics and analysis of marketing data in large banks and credit card centers, where the real-time statistics and verification technology of the invention is used to monitor and evaluate marketing activities in real time and to make corresponding decisions and optimizations;
(2) Digital marketing companies: the data real-time processing method and the data real-time processing system can make better use of the flow data and provide more accurate marketing strategies based on it;
(3) Data analysis and business intelligence teams: the data real-time processing method and the data real-time processing system can be used to improve data analysis and business intelligence capabilities;
(4) Marketing solution provider: the data real-time processing method and the data real-time processing system can be utilized to provide more powerful and real-time data statistics and analysis functions for clients.
The above is merely an embodiment of the present application, but the scope of the present application is not limited thereto. Other possible variations or substitutions will occur to those skilled in the art from the teachings disclosed herein and are intended to be within the scope of the present application. The embodiments of the present application and features in the embodiments may also be combined with each other without conflict. The protection scope of the present application is subject to the claims.
Claims (24)
1. A method for real-time processing of data, comprising:
a data extraction step of directly extracting flow data from a service database;
a data preprocessing step of preprocessing the flow data extracted in the data extraction step and outputting the preprocessed flow data; and
an index statistics step of dividing and grouping the preprocessed flow data according to activity type and time level, performing index value statistics on each group of the divided flow data, and calculating an index accumulated value of the time level for each activity type through a sliding window.
2. The method for processing data in real time according to claim 1, further comprising:
an index storage step of storing, separately for different days, the time-level index accumulated values obtained in the index statistics step.
3. The method for processing data in real time according to claim 2, further comprising:
a data backup step of backing up the preprocessed flow data output by the data preprocessing step to obtain backup data; and
an index playback step of extracting the backup data of a specified playback time period from the backup data, and computing time-level index statistics of the backup data of the specified playback time period to obtain updated time-level index accumulated values.
4. The method for real-time processing of data according to claim 3, wherein,
in the index storage step, the updated index accumulated value obtained in the index playback step is stored in place of the time-level index accumulated value obtained in the index statistics step.
5. The method for processing data in real time according to claim 4, further comprising:
and an index monitoring step of comparing the index accumulated value or the updated index accumulated value with a preset first threshold value and triggering an alarm when the first threshold value is exceeded.
6. The method for processing data in real time according to claim 4, further comprising:
and an index monitoring step of accumulating the index accumulated value or the updated index accumulated value in a preset specified time period to obtain an accumulated index statistical value, comparing the accumulated index statistical value with a preset second threshold value, and triggering an alarm when the second threshold value is exceeded.
7. The method for real-time processing of data according to claim 1, wherein,
in the data extraction step, the flow data is obtained in real time by incrementally reading a binary log file from the service database.
8. The method for real-time processing of data according to claim 1, wherein,
in the data preprocessing step, the preprocessing comprises screening the flow data according to operation identifiers, so as to select the flow data that is in a successful state and falls within the index statistics category.
9. The method for real-time processing of data according to claim 1, wherein the index statistics step includes:
dividing and grouping the preprocessed flow data by time level according to activity type;
dividing sliding windows over the divided and grouped flow data; and
when a sliding window is triggered, counting the flow data by activity type and by time level to obtain the index statistics value of the current window for the time level of each activity type, and adding the index statistics value of the current window to the index statistics value accumulated before the current window to obtain the index accumulated value of the time level of each activity type.
10. A method for real-time processing of data according to any one of claims 1 to 8, wherein,
the time level is days.
11. The method for real-time processing of data according to claim 9, wherein the preprocessed flow data includes delayed flow data that arrives one day late.
12. A data real-time processing system, comprising:
a data extraction unit for directly extracting flow data from a service database;
a data preprocessing unit for preprocessing the flow data extracted by the data extraction unit and outputting the preprocessed flow data; and
an index statistics unit for dividing and grouping the preprocessed flow data according to activity type and time level, performing index value statistics on each group of the divided flow data, and calculating an index accumulated value of the time level for each activity type through a sliding window.
13. The data real-time processing system of claim 12, further comprising:
an index storage unit for storing, separately for different days, the time-level index accumulated values obtained by the index statistics unit.
14. The data real-time processing system of claim 13, further comprising:
a data backup unit for backing up the preprocessed flow data output by the data preprocessing unit to obtain backup data; and
an index playback unit for extracting the backup data of a specified playback time period from the backup data, and computing time-level index statistics of the backup data of the specified playback time period to obtain updated time-level index accumulated values.
15. The data real-time processing system according to claim 14, wherein,
in the index storage unit, the updated index accumulated value obtained by the index playback unit is stored in place of the time-level index accumulated value obtained by the index statistics unit.
16. The data real-time processing system of claim 15, further comprising:
an index monitoring unit for comparing the index accumulated value or the updated index accumulated value with a preset first threshold value and triggering an alarm when the first threshold value is exceeded.
17. The data real-time processing system of claim 15, further comprising:
an index monitoring unit for accumulating the index accumulated value or the updated index accumulated value over a preset specified time period to obtain an accumulated index statistical value, comparing the accumulated index statistical value with a preset second threshold value, and triggering an alarm when the second threshold value is exceeded.
18. The data real-time processing system according to claim 12, wherein,
in the data extraction unit, the flow data is obtained in real time by incrementally reading a binary log file from the service database.
19. The data real-time processing system according to claim 12, wherein,
in the data preprocessing unit, the preprocessing comprises screening the flow data according to operation identifiers, so as to select the flow data that is in a successful state and falls within the index statistics category.
20. The data real-time processing system according to claim 12, wherein the index statistics unit performs the following actions:
dividing and grouping the preprocessed flow data by time level according to activity type;
dividing sliding windows over the divided and grouped flow data; and
when a sliding window is triggered, counting the flow data by activity type and by time level to obtain the index statistics value of the current window for the time level of each activity type, and adding the index statistics value of the current window to the index statistics value accumulated before the current window to obtain the index accumulated value of the time level of each activity type.
21. A data real-time processing system according to any one of claims 12 to 20, wherein,
the time level is days.
22. The data real-time processing system of claim 21, wherein the preprocessed flow data includes delayed flow data that arrives one day late.
23. A computer readable medium having a computer program stored thereon, characterized in that,
the computer program, when executed by a processor, implements the method for real-time processing of data according to any one of claims 1 to 11.
24. A computer device comprising a memory module, a processor and a computer program stored on the memory module and executable on the processor, characterized in that,
the processor, when executing the computer program, implements the method for real-time processing of data according to any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311509777.6A CN118069455A (en) | 2023-11-13 | 2023-11-13 | Data real-time processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118069455A (en) | 2024-05-24 |
Family
ID=91097999
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311509777.6A Pending CN118069455A (en) | 2023-11-13 | 2023-11-13 | Data real-time processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118069455A (en) |
2023-11-13: CN application CN202311509777.6A (publication CN118069455A), active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||