WO2023231281A1 - Data processing method, device and electronic equipment - Google Patents

Info

Publication number
WO2023231281A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
time
window
processed
target
Prior art date
Application number
PCT/CN2022/127575
Other languages
English (en)
French (fr)
Inventor
夏柱昌
苗青利
刘建波
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2023231281A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems

Definitions

  • The present application relates to the field of big data technology, and in particular to a data processing method, device and electronic equipment.
  • Massive data can be processed in batches through a big data processing framework to implement related businesses.
  • When batch processing massive amounts of data, the data to be processed for different businesses is first collected, and the collected data is then processed in batches once it reaches the set number of collection days or the set quantity.
  • This processing method has a long delay, which reduces the real-time nature of data processing. It can only be applied to businesses with low real-time requirements and cannot be applied to businesses with high real-time requirements. This narrows the range of business scenarios to which massive data processing can be applied and, in turn, affects the user's application experience.
  • The purpose of this application is to provide a data processing method, device and electronic equipment that improve the real-time nature of data processing.
  • This application discloses a data processing method, including:
  • dividing the data to be processed into different time windows according to the data generation time, and obtaining window data corresponding to the different time windows according to the division results, wherein the window data of each time window includes a boundary time, and each boundary time includes a start time and an end time;
  • the start time is the earliest generation time among the data to be processed allocated to the time window;
  • the end time is the latest generation time among the data to be processed allocated to the time window;
  • implementing the target service according to the window data corresponding to the different time windows.
  • Dividing the data to be processed into different time windows according to the data generation time, and obtaining window data corresponding to the different time windows according to the division results, includes:
  • obtaining historical window data, wherein the historical window data includes a first end time, the historical window data is the data corresponding to at least one time window whose window status is the executed state, and the first end time is the latest end time among the boundary times included in the historical window data;
  • dividing the target data to be processed into a number of time windows corresponding to the window threshold, including:
  • if the end time of a target time window corresponds to multiple pieces of target data to be processed with the same data generation time, allocating all of those pieces of target data to be processed to the target time window.
  • the realization of the target service based on the window data corresponding to the different time windows includes:
  • the earliest start time included in the window data, and the second end time is the latest end time included in the at least two first target window data;
  • the newly extracted data to be processed is executed synchronously, and the at least two first target window data are updated according to the execution result of the newly extracted data to be processed.
  • Optionally, the method also includes:
  • executing the abnormal data to be processed asynchronously, and updating the window data corresponding to the abnormal data to be processed according to the processing result of the abnormal data to be processed.
  • Obtaining the abnormal data to be processed that meets the preset conditions includes:
  • the second target window data corresponds to a time window in which processing failed, that is, a time window whose number of synchronous processing attempts within the first preset time period is greater than the first preset count threshold and whose number of asynchronous processing attempts is less than the second preset count threshold;
  • obtaining the abnormal data to be processed that meets the preset conditions includes:
  • obtaining the exception pending data that meets the preset conditions includes:
  • the end time of the delay period is the earliest time of the time window corresponding to the pending state, the processing state, and the processing failure state.
  • obtaining the exception pending data that meets the preset conditions includes:
  • the start time, the fourth end time is the latest end time included in the abnormal window data.
  • this application discloses a data processing device, including:
  • the acquisition module is used to acquire the data to be processed corresponding to the target business in real time, where the data to be processed includes the data generation time;
  • a processing module, configured to divide the data to be processed into different time windows according to the data generation time, and obtain window data corresponding to the different time windows according to the division results, wherein the window data of each time window includes a boundary time, each boundary time includes a start time and an end time, the start time is the earliest generation time among the data to be processed allocated to the time window, and the end time is the latest generation time among the data to be processed allocated to the time window;
  • the processing module is also used to implement the target service according to the window data corresponding to the different time windows.
  • this application discloses an electronic device, including: a processor, and a memory communicatively connected to the processor;
  • the memory stores computer-executable instructions;
  • the processor executes the computer-executable instructions stored in the memory to implement the data processing method described in the first aspect and various possible designs of the first aspect.
  • The present application discloses a computer-readable storage medium in which computer-executable instructions are stored. When a processor executes the computer-executable instructions, the data processing method described in the first aspect and various possible designs of the first aspect is implemented.
  • the present application discloses a computer program product, including a computer program, which, when executed by a processor, implements the data processing method described in the first aspect and various possible designs of the first aspect.
  • Embodiments of the present application provide a data processing method, device and electronic equipment.
  • With this solution, the data to be processed corresponding to the target business, which includes the data generation time, can be obtained in real time and then divided into different time windows according to the data generation time, yielding window data corresponding to the different time windows.
  • The window data of each time window includes a boundary time, and each boundary time includes a start time and an end time.
  • The start time is the earliest generation time among the data to be processed allocated to the time window, and the end time is the latest generation time among the data to be processed allocated to the time window. The target business is then implemented based on the window data corresponding to the different time windows.
  • Because the data obtained in real time is divided into time windows by data generation time and then processed window by window, the acquired data to be processed is stream-processed.
  • It is no longer necessary to wait for the data to be processed to reach the set number of collection days or the set quantity before processing it in batches. This improves the real-time nature of data processing, so the method can be applied to businesses with high real-time requirements.
  • This widens the range of business scenarios to which massive data processing can be applied, thereby ensuring the user's application experience.
  • Figure 1 is an application diagram of the existing big data processing framework
  • Figure 2 is a schematic diagram of the architecture of the application system of the data processing method provided by the embodiment of the present application;
  • Figure 3 is a schematic flow chart of the data processing method provided by the embodiment of the present application.
  • Figure 4 is a schematic diagram of the principle of the data processing method provided by the embodiment of the present application.
  • Figure 5 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
  • Traditional big data processing architectures include stream processing architectures (such as Spark Streaming and Flink), batch processing architectures (such as Hive) and stream-batch integrated processing architectures (such as Flink). They involve many components, are incompatible with commonly used relational databases (for example, MySQL), and cannot effectively support OLAP (On-Line Analytical Processing) and OLTP (On-Line Transaction Processing) businesses at the same time, especially financial technology businesses.
  • FIG. 1 is an application diagram of the existing big data processing framework.
  • The existing batch processing method generally takes a T+1 day form: the big data processing framework first waits for day T; after all the data to be processed on day T has been imported into the framework, the import of new data to be processed is stopped, and all of the newly imported data is then batch processed on day T+1.
  • This processing method has a long delay, which reduces the real-time nature of data processing. It can only be applied to businesses with low real-time requirements and cannot be applied to businesses with high real-time requirements. This narrows the range of business scenarios to which massive data processing can be applied and, in turn, affects the user's application experience.
  • The existing technology also offers hour-level tasks, which process the data imported in the previous hour. This improves real-time performance, but the delay in data processing is still high.
  • This application divides the data to be processed, obtained in real time, into different time windows according to its data generation time, and then processes the data window by window to achieve the target business. This realizes streaming processing of the data obtained in real time, so there is no need to wait for the data to be processed to reach the set number of collection days or the set quantity before processing it in batches.
  • This improves the real-time nature of data processing, so the method can be applied to businesses with high real-time requirements.
  • It also widens the range of business scenarios to which massive data processing can be applied (that is, it can be applied both to scenarios that do not require high real-time performance and to scenarios that do), thereby achieving the technical effect of ensuring the user's application experience.
  • Figure 2 is a schematic architectural diagram of the application system of the data processing method provided by the embodiment of the present application.
  • The application system can include a distributed database, a big data processing framework and a terminal device.
  • The distributed database can synchronize the data to be processed to the big data processing framework in real time, and the terminal device can stream-process the data to be processed in the big data processing framework, thereby improving the real-time nature of data processing.
  • The distributed database can be an existing database, and the data to be processed in the database can be data corresponding to different businesses, for example data corresponding to financial services.
  • The big data processing framework can be TiDB.
  • TiDB is an open-source converged distributed database product that supports both online transaction processing and online analytical processing. It offers horizontal scale-out and scale-in, real-time HTAP (that is, a system that handles mixed OLTP and OLAP workloads at the same time) and other functions, and has the characteristics of both relational and non-relational databases.
  • DM Data Migrator
  • the terminal device can be a smartphone, personal computer, tablet, server or server cluster and other devices.
  • the big data processing framework can be deployed in independent devices or in terminal devices.
  • Figure 3 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • the method of this embodiment can be executed by a terminal device. As shown in Figure 3, the method in this embodiment may include:
  • S301 Obtain the data to be processed corresponding to the target business in real time, where the data to be processed includes the data generation time.
  • the to-be-processed data related to the target service may be obtained first.
  • Streaming business data processing can be used, that is, the data to be processed corresponding to the target business is obtained in real time and processed in real time.
  • The data to be processed can include basic information related to the target business, the data generation time, and a data status that identifies the current status of the data to be processed.
  • The data status may be a pending status, a processing status, a processing failure status, a processing success status, etc.
  • The default data status is the pending status (for example, it can be represented by 0), and the status of the data to be processed can be updated later as processing proceeds.
  • When the data to be processed corresponding to the target business is obtained in real time, it can be obtained from the distributed database corresponding to the target business. That is, after each business system generates the data to be processed corresponding to the target business, it can store that data in a distributed database; the different distributed databases can then synchronize the data to the big data processing framework for storage, and the terminal device can obtain the data from the big data processing framework in order of the data generation time it contains and execute it to achieve the target business.
  • S302 Divide the data to be processed into different time windows according to the data generation time, and obtain window data corresponding to different time windows according to the division results.
  • The window data of each time window includes a boundary time, and each boundary time includes a start time and an end time.
  • The start time is the earliest generation time among the data to be processed allocated to the time window.
  • The end time is the latest generation time among the data to be processed allocated to the time window.
  • the data to be processed can be divided into different time windows according to the data generation time, and window data corresponding to different time windows can be obtained. Among them, the number of data to be processed corresponding to each time window can be customized and set according to the actual application scenario.
  • dividing the data to be processed into different time windows according to the data generation time, and obtaining window data corresponding to different time windows according to the division results may specifically include:
  • Historical window data is obtained, wherein the historical window data includes a first end time, the historical window data is the data corresponding to at least one time window whose window status is the executed state, and the first end time is the latest end time among the boundary times included in the historical window data.
  • the data to be processed is extracted according to the first end time and the target time to obtain initial target data to be processed, wherein the target time is determined based on the current time and the first preset delay length.
  • Target data to be processed corresponding to the number of targets is extracted from the initial target data to be processed in sequence according to data generation time, where the number of targets is the product of a preset number threshold and a preset time window threshold.
  • The historical window data is the data corresponding to at least one time window whose window status is the executed status.
  • Each time window in the executed status corresponds to a boundary time.
  • Each boundary time may include a start time (Timestamp_From) and an end time (Timestamp_Till). The start time is the data generation time of the first piece of data in the window, and the end time is the data generation time of the last piece of data in the window.
  • The first end time is the latest end time among the boundary times included in the historical window data, that is, the data generation time of the last piece of data. After the data partition permission lock is obtained, the first end time can be read and then set as the start time of the new time window.
  • If the windows are being divided for the first time, the start time of the first time window can be set to 0. The data to be processed can then be extracted according to the first end time and the target time to obtain the initial target data to be processed, where the target time is the difference between the current time and the first preset delay length.
  • The first preset delay length can be customized according to the actual application scenario. Optionally, it can be determined based on the amount of initial target data to be processed in the current period.
  • If the amount of initial target data to be processed in the current period is small, the first preset delay length can be set smaller, or set to 0; if the amount is large, the first preset delay length can be set larger, so that the time period covered by the extracted data is shorter. The amount of data processed each time is thereby reduced appropriately, which relieves the processing pressure on the terminal device.
  • All the initial target data to be processed whose generation time falls within the period from the first end time to the target time can be divided into different time windows. Alternatively, part of that data can be selected according to the actual processing capability of the terminal device. When only part is selected, it is obtained and processed in chronological order of data generation time, to avoid data processing errors caused by a confused processing order.
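  • The extraction step described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name, the record layout (`id`, `generated_at`), the use of `datetime.min` as a stand-in for the initial start time 0, and the choice of a half-open range (after the first end time, up to and including the target time) are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def extract_initial_target_data(records, first_end_time, now, delay):
    """Return records generated after first_end_time and no later than
    (now - delay), ordered by generation time (oldest first)."""
    # Target time = current time minus the first preset delay length.
    target_time = now - delay
    # Assumption: the lower bound is exclusive, the upper bound inclusive.
    selected = [r for r in records
                if first_end_time < r["generated_at"] <= target_time]
    return sorted(selected, key=lambda r: r["generated_at"])

# Hypothetical sample data.
now = datetime(2021, 12, 26, 10, 9, 11)
records = [
    {"id": 3, "generated_at": datetime(2021, 12, 26, 10, 9, 10, 900000)},
    {"id": 1, "generated_at": datetime(2021, 12, 26, 10, 9, 10, 561000)},
    {"id": 2, "generated_at": datetime(2021, 12, 26, 10, 9, 10, 716000)},
]
# First division: no historical window yet, so start from the minimum time.
batch = extract_initial_target_data(
    records, datetime.min, now, timedelta(milliseconds=200))
```

  • A larger delay moves the target time earlier and so shrinks the extracted batch, which matches the load-relief behaviour described above.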
  • The preset number threshold and time window threshold can be obtained first. The target data to be processed is then extracted from the initial target data to be processed based on the number threshold and the time window threshold, different time windows are constructed based on the time window threshold (for example, if the time window threshold is 3, three time windows can be constructed), and the extracted target data to be processed is divided among those time windows.
  • The number threshold can be the maximum length of the encoding blocks contained in each window (also called chunk_size), and the time window threshold is the maximum number of time windows that the terminal device can provide (also called window_divide_max).
  • The target data to be processed, up to the target number, can be obtained from the initial target data to be processed in order of data generation time.
  • The number of target data to be processed may be the same as the number of initial target data to be processed, that is, all of the initial target data to be processed is obtained for processing.
  • window data corresponding to different time windows can also be obtained according to the division results, that is, window data corresponding to a new window can be obtained.
  • The window data can include the window status (initially the unprocessed status, later updated to the processing status, executed status, etc. as processing proceeds) and the boundary time of the time window (the start time and end time of each time window).
  • The target data to be processed in each time window is arranged in chronological order: the first piece has the earliest data generation time and the last piece has the latest.
  • The start time of each time window is the data generation time of the first piece of target data to be processed assigned to it, and the end time is the data generation time of the last piece assigned to it.
  • For example, the initial target data to be processed that meets the conditions can first be filtered out of the data to be processed based on the first end time and the target time. Then 3*2 pieces of target data to be processed are selected from it and divided at 2 pieces per time window, generating 3 time windows.
  • The start time and end time of each time window can then be determined from the data generation times of the data in that window, which in turn determines the window data of the time window.
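  • The 3*2 example can be sketched as below. This is only an illustration under stated assumptions: `divide_into_windows` and the dictionary layout are hypothetical, generation times are simplified to integer millisecond offsets, and the boundary times follow the definition in the text (earliest and latest generation time inside each window). The parameter names `chunk_size` and `window_divide_max` are the ones given in the description.

```python
def divide_into_windows(generation_times, chunk_size, window_divide_max):
    """Split generation times into at most window_divide_max windows of
    chunk_size items each and record each window's boundary time."""
    # Target number = number threshold * time window threshold.
    target_count = chunk_size * window_divide_max
    ordered = sorted(generation_times)[:target_count]
    windows = []
    for i in range(0, len(ordered), chunk_size):
        chunk = ordered[i:i + chunk_size]
        windows.append({
            "window": len(windows),
            "status": "unprocessed",   # initial window status
            "start": chunk[0],         # earliest generation time in window
            "end": chunk[-1],          # latest generation time in window
            "data": chunk,
        })
    return windows

# Hypothetical generation times (millisecond offsets); 900 exceeds the
# target number of 3 * 2 = 6 items and is left for the next division.
times = [561, 600, 650, 716, 718, 720, 900]
windows = divide_into_windows(times, chunk_size=2, window_divide_max=3)
```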
  • Table 1 is an information table corresponding to the initial target data to be processed.
  • Table 1 contains 6 pieces of initial target data to be processed.
  • Each piece of initial target data to be processed includes the data generation time, basic information related to the target business, and so on.
  • Table 2 is the information table corresponding to the target data to be processed.
  • In Table 2 there are three time windows, and each time window contains two rows of target data to be processed.
  • Table 3 is the window data table corresponding to the target data to be processed.
  • In Table 3 there are three time windows, and each time window corresponds to a start time and an end time.
  • Table 3 Window data table corresponding to target data to be processed

    Time window number   Start time               End time
    0                    0                        20211226 10:09:10.561
    1                    20211226 10:09:10.561    20211226 10:09:10.716
    2                    20211226 10:09:10.716    20211226 10:09:10.720
  • the boundary time of the time window is determined based on the generation time of the data to be processed assigned to the time window, which improves the flexibility of the boundary time determination method.
  • Dividing the target data to be processed into a number of time windows corresponding to the window threshold may specifically include:
  • if the end time of a target time window corresponds to multiple pieces of target data to be processed with the same data generation time, allocating all of those pieces of target data to be processed to the target time window.
  • The target data to be processed is divided into windows according to its data generation time, and several pieces of data may share the same generation time. Target data to be processed with the same generation time (also called the same timestamp) should not be split across different time windows; otherwise, data that a business requires to be processed in a specified order could be processed in the wrong order and fail to be processed normally. Therefore, if the end time of a time window corresponds to multiple pieces of target data to be processed, all of them are included in that time window rather than carried into the next one, which improves the accuracy of data processing.
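  • One way to keep equal timestamps together is to let a window grow past the number threshold while the next item shares the window's end time. The sketch below assumes this strategy; the function name and the integer millisecond timestamps are hypothetical, and the source does not say how the overflow is implemented.

```python
def divide_keeping_ties(generation_times, chunk_size, window_divide_max):
    """Divide sorted generation times into windows of chunk_size items,
    extending a window so equal end timestamps are never split."""
    ordered = sorted(generation_times)
    windows = []
    i = 0
    while i < len(ordered) and len(windows) < window_divide_max:
        j = min(i + chunk_size, len(ordered))
        # Pull in every following item that shares the window's end time.
        while j < len(ordered) and ordered[j] == ordered[j - 1]:
            j += 1
        windows.append(ordered[i:j])
        i = j
    return windows

# Three items share timestamp 716, so the first window holds four items
# even though chunk_size is 2.
tied = divide_keeping_ties([561, 716, 716, 716, 720],
                           chunk_size=2, window_divide_max=3)
```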
  • S303 Implement target services based on window data corresponding to different time windows.
  • The window data corresponding to the different time windows can be obtained according to the division results, and the target service can then be implemented based on that window data.
  • The window data can include a boundary time, which can include a start time and an end time.
  • The start time is the data generation time of the first piece of data to be processed contained in the window corresponding to the window data.
  • The end time is the data generation time of the last piece of data to be processed contained in the window corresponding to the window data.
  • With this solution, the data to be processed corresponding to the target business, which includes the data generation time, is obtained in real time and divided into different time windows according to the data generation time to obtain window data corresponding to the different time windows, and the target business is implemented based on that window data.
  • Dividing the data obtained in real time into time windows by data generation time and then processing it window by window realizes streaming processing of the data to be processed.
  • There is no need to wait for the data to be processed to reach the set number of collection days or the set quantity before processing it in batches. This improves the real-time nature of data processing, allows the method to be applied to businesses with higher real-time requirements, and widens the range of business scenarios to which massive data processing can be applied, thus ensuring the user's application experience.
  • realizing the target service based on the window data corresponding to the different time windows includes:
  • the earliest start time included in the window data, and the second end time is the latest end time included in the at least two first target window data.
  • the newly extracted data to be processed is executed synchronously, and the at least two first target window data are updated according to the execution result of the newly extracted data to be processed.
  • The existing synchronization processing permission lock is a coarse-grained lock. For example, an entire window of data is allocated to one permission lock, which ensures that each task can obtain only one window for subsequent processing.
  • Fine-grained locks can be used to increase concurrency: a permission lock can be assigned to part of the window data (such as a row of data), so that one window's data can be divided into multiple lock granularities according to actual needs.
  • A fine-grained lock can also correspond to at least two time windows. After a synchronization processing permission lock (i.e., a fine-grained lock) is obtained, the data required from multiple time windows can be read at once, thereby reducing the number of data reads.
  • The maximum lock number is the maximum number of synchronization processing permission locks minus 1. After reaching the maximum value, the lock number returns to 0 and starts again.
  • Table 4 is the window data table after locking. Continuing to use Table 3 as an example, a synchronization processing permission lock can be assigned to every two time windows, and a lock number can be assigned to each synchronization processing permission lock.
  • Time window number   Lock number   Start time               End time
    0                    0             0                        20211226 10:09:10.561
    1                    0             20211226 10:09:10.561    20211226 10:09:10.716
    2                    1             20211226 10:09:10.716    20211226 10:09:10.720
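  • The lock numbering shown in Table 4 can be reproduced with simple modular arithmetic. The sketch below is an assumption about how such numbering could be computed: `assign_lock_numbers`, `next_lock_number`, `windows_per_lock` and `lock_count` are hypothetical names, and window numbers are assumed to be consecutive integers starting at 0.

```python
def assign_lock_numbers(window_numbers, windows_per_lock, lock_count):
    """Map each time window number to a lock number, grouping
    windows_per_lock consecutive windows under one lock and wrapping
    back to 0 after lock number lock_count - 1."""
    return {w: (w // windows_per_lock) % lock_count for w in window_numbers}

def next_lock_number(current, lock_count):
    """Advance a lock counter, returning to 0 after the maximum
    lock number (lock_count - 1)."""
    return (current + 1) % lock_count

# Two windows per lock, as in the Table 4 example; lock_count=4 is an
# arbitrary illustrative value.
locks = assign_lock_numbers([0, 1, 2], windows_per_lock=2, lock_count=4)
```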
  • the corresponding data to be processed can be obtained according to the assigned synchronous processing permission lock, and the obtained data to be processed and the corresponding window data are encapsulated into data blocks ( It can also be called chunk), and submits the encapsulated data block to the data processing thread to achieve the relevant target business.
  • the window data corresponding to synchronization processing permission lock No. 0 can be obtained from all time windows (the obtained window data is data corresponding to at least two time windows), and then extract data from the data to be processed based on the second start time and the second end time contained in the at least two window data, and implement relevant target services based on the extracted data.
  • The data to be processed in each window is arranged in order of data generation time, and each window data contains a boundary time.
  • The boundary time consists of a start time and an end time.
  • The start time is the data generation time of the first data to be processed in the window corresponding to the window data, and the end time is the data generation time of the last data to be processed in that window.
  • The second start time is the earliest start time among the boundary times of the windows corresponding to the at least two first target window data, and the second end time is the latest end time among those boundary times.
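The one-pass read for a lock that covers several windows can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Window` class and `read_for_lock` function are hypothetical names, timestamps are shortened to their millisecond part, and the half-open range (start exclusive, end inclusive) is an assumption read off Table 4, where each window starts at the previous window's end time.

```python
from dataclasses import dataclass


@dataclass
class Window:
    number: int
    lock_number: int
    start: int  # exclusive lower bound (the previous window's end time)
    end: int    # inclusive upper bound (generation time of the window's last row)


def read_for_lock(windows, generation_times, lock_number):
    """Read, in one pass, the rows covered by every window that shares a lock."""
    owned = [w for w in windows if w.lock_number == lock_number]
    if not owned:
        return None
    second_start = min(w.start for w in owned)  # earliest start among the windows
    second_end = max(w.end for w in owned)      # latest end among the windows
    rows = sorted(t for t in generation_times if second_start < t <= second_end)
    return second_start, second_end, rows


# Table 4, with timestamps reduced to their millisecond part for readability.
windows = [Window(0, 0, 0, 561), Window(1, 0, 561, 716), Window(2, 1, 716, 720)]
generation_times = [332, 561, 714, 716, 718, 720]
print(read_for_lock(windows, generation_times, 0))
```

With lock No. 0 covering windows 0 and 1, a single range query replaces two separate window reads.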
  • The data processing modes in this application can be divided into two types: a synchronous processing mode (also called MainRoad) and an asynchronous processing mode (also called SideTrack).
  • The synchronous processing mode can be used when implementing the target business based on the window data corresponding to different time windows, and the asynchronous processing mode can be used when handling various abnormal scenarios.
  • The abnormal scenarios can include re-pulling windows whose processing failed, re-pulling business data whose processing failed, processing data that arrived late, and re-pulling windows that lost their heartbeat.
  • the data corresponding to the synchronous processing mode and the asynchronous processing mode can be resource isolated.
  • The data processing thread can be DataProcessPoolService. The obtainable lock number is determined from a preset lock counter, the synchronous processing permission lock corresponding to that lock number is acquired, and the acquired lock is used to obtain the processable time window data and the corresponding data to be processed, which are encapsulated into a data block.
  • The status of the window corresponding to the time window data can then be set to the processing status, and the synchronous processing permission lock is released.
  • The lock counter is incremented automatically, and the encapsulated data blocks can then be submitted to DataProcessPoolService for processing to implement the target business.
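The wrap-around behaviour of the lock counter (lock numbers run from 0 to the number of locks minus 1, then restart at 0) can be sketched as below. The `LockCounter` class is a hypothetical name introduced for illustration; the text does not say how the counter is stored, so an in-memory counter guarded by a mutex is assumed here.

```python
import threading


class LockCounter:
    """Hands out lock numbers 0 .. lock_count - 1 in round-robin order."""

    def __init__(self, lock_count):
        if lock_count < 1:
            raise ValueError("lock_count must be at least 1")
        self._lock_count = lock_count
        self._value = 0
        self._guard = threading.Lock()  # protects the counter between threads

    def next_lock_number(self):
        with self._guard:
            number = self._value
            # After the maximum lock number, wrap back to 0.
            self._value = (self._value + 1) % self._lock_count
            return number


counter = LockCounter(3)
sequence = [counter.next_lock_number() for _ in range(7)]
print(sequence)
```

Each worker asks the counter which lock to try next, so consecutive workers target different locks instead of contending for the same one.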
  • The data block can be used as an independent data processing unit. It includes the window data, the data to be processed corresponding to the time window, and the type of the data block (such as synchronous processing, asynchronous processing, business failure re-pull, delayed arrival data re-pull, and window lost-heartbeat re-pull, corresponding to the various kinds of processing in MainRoad mode and SideTrack mode respectively).
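A data block of this kind could be modelled as a small record. The sketch below is illustrative only: the `Chunk` and `ChunkType` names, the field layout, and the rule that every type other than synchronous processing belongs to SideTrack mode are assumptions, not details given in the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class ChunkType(Enum):
    SYNC = "synchronous processing"
    ASYNC = "asynchronous processing"
    BUSINESS_FAILURE_REPULL = "business failure re-pull"
    DELAYED_DATA_REPULL = "delayed arrival data re-pull"
    LOST_HEARTBEAT_REPULL = "window lost-heartbeat re-pull"


@dataclass
class Chunk:
    """Independent processing unit: window data plus the rows it covers."""
    chunk_type: ChunkType
    window: dict                      # window data (boundary time, status, ...)
    rows: list = field(default_factory=list)  # data to be processed

    def is_side_track(self):
        # Assumption: only SYNC chunks run in MainRoad mode.
        return self.chunk_type is not ChunkType.SYNC


main = Chunk(ChunkType.SYNC, {"number": 0}, [{"id": 1}, {"id": 2}])
late = Chunk(ChunkType.DELAYED_DATA_REPULL, {"number": 0}, [{"id": 3}])
print(main.is_side_track(), late.is_side_track())
```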
  • the method may further include: after obtaining the asynchronous processing permission lock, obtaining the exception pending data that meets the preset conditions.
  • the abnormal data to be processed is executed asynchronously, and the window data corresponding to the abnormal data to be processed is updated according to the processing result of the abnormal data to be processed.
  • Exceptions may occur while the data to be processed is being processed, and the resulting abnormal data to be processed can be handled through exception handling.
  • obtaining the exception pending data that meets the preset conditions may include:
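  • obtaining second target window data corresponding to a time window in which processing failed, where the time window in which processing failed is a time window whose number of synchronous processing attempts within the first preset time period is greater than the first preset count threshold and whose number of asynchronous processing attempts is less than the second preset count threshold;
  • extracting the abnormal data to be processed from the to-be-processed data table according to the third start time and the third end time contained in the second target window data, where the third start time is the earliest start time contained in the second target window data and the third end time is the latest end time contained in the second target window data.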
  • the encapsulated data block may fail to be processed, and the abnormal data block can be reprocessed (the maximum number of retries can be customized).
  • the status of the time window corresponding to the data block can be set to the processing failure status.
  • the second target window data corresponding to the time window whose status is the processing failure status can be obtained.
  • the range of window data obtained can be limited by time. For example, the time window in which processing fails within a first preset time period (which can be any value from 3 to 5 days) can be obtained.
  • The window status of the time window corresponding to the obtained second target window data can then be updated to the processing status, and the lock is released.
  • The abnormal data to be processed is extracted from the to-be-processed data table according to the third start time and the third end time contained in the obtained second target window data, and is encapsulated into a window exception data block (also called a chunk, which can include the second target window data and the abnormal data to be processed corresponding to the second target window data).
  • The encapsulated window exception data block is passed to a processing task, and the task is submitted to the task processing thread (such as DataProcessPoolService) to wait for execution.
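Selecting the failed windows that qualify for an asynchronous retry can be sketched as a simple filter. The `WindowRow` fields, the `"FAILED"` status label, and the default thresholds (3-day lookback, more than 0 synchronous attempts, fewer than 3 asynchronous attempts) are hypothetical values chosen for illustration; the text only says the thresholds are preset.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class WindowRow:
    number: int
    status: str
    end: datetime
    sync_count: int   # synchronous processing attempts so far
    async_count: int  # asynchronous processing attempts so far


def pick_failed_windows(rows, now, lookback=timedelta(days=3),
                        sync_threshold=0, async_threshold=3):
    """Return failed windows inside the lookback period that may be retried."""
    cutoff = now - lookback
    return [
        r for r in rows
        if r.status == "FAILED"
        and r.end >= cutoff
        and r.sync_count > sync_threshold
        and r.async_count < async_threshold
    ]


now = datetime(2021, 12, 26, 12, 0)
rows = [
    WindowRow(1, "FAILED", datetime(2021, 12, 25, 9, 0), 1, 0),      # retried
    WindowRow(2, "FAILED", datetime(2021, 12, 20, 9, 0), 1, 0),      # too old
    WindowRow(3, "FAILED", datetime(2021, 12, 26, 9, 0), 1, 3),      # retries used up
    WindowRow(4, "PROCESSING", datetime(2021, 12, 26, 9, 0), 1, 0),  # not failed
]
picked = pick_failed_windows(rows, now)
print([r.number for r in picked])
```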
  • The data to be processed contains a data status.
  • Obtaining the abnormal data to be processed that meets the preset conditions may include: obtaining, from the data to be processed, the data whose data generation time is within the second preset time period and whose data status is the processing failure status, to obtain the abnormal data to be processed.
  • If the data is processed successfully, the preset interface (such as the batchUpdate interface in TiDB) is called to update the data status of all the data to be processed to the processing success status (for example, the value of the corresponding field can be updated to 9);
  • If some data fails to be processed, the preset interface (such as the batchUpdate interface in TiDB) is called to update the processing status of that data to the processing failure status (for example, the value of the corresponding field can be incremented by 1 each time), and at the same time the status of the unreturned data within the time window is updated to the processing success status (that is, the value of the corresponding field is updated to 9).
  • The processing status of the time window is updated to the processing success status (the value is S). Even if business data processing fails, the status of the time window can be updated to the processing success status. In that case the processing status of the failed data to be processed is a non-success status (9 means success; a failure is a non-zero value less than 9), and subsequent processing can be done in SideTrack mode (reprocessing after a business data processing failure).
  • A specific lock can be obtained first (for example, a dedicated lock for business data processing failure).
  • The abnormal data to be processed whose processing failed can then be extracted. For example, each data to be processed has a processing status, where 0 is the pending status, 1 to 8 is the number of retries after failure, and 9 is the success status (the values are configurable). On this premise, the data that failed to be processed only needs to be filtered by processing status, that is, data whose status is any number from 1 to 8.
  • The query scope can also be limited by time.
  • For example, only data whose data generation time is within the second preset time period is obtained. The data can then be encapsulated into a window exception data block (also called a chunk, which can include the data that failed to be processed). The encapsulated window exception data block is then passed to a processing task, and the task is submitted to the task processing thread (such as DataProcessPoolService) to wait for execution.
  • the status of the data increases by 1 each time until the processing is successful. If it still fails after reaching the maximum number of retries, the status can be modified after manual confirmation by the operation and maintenance personnel, and the program will automatically execute the aforementioned process.
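The status convention described above (0 pending, 1 to 8 failed attempts, 9 success) can be sketched as two small helpers. The function names are hypothetical, and capping the status at 8 after the last allowed retry is an assumption; the text only says that operations staff intervene once the maximum is reached.

```python
PENDING = 0
SUCCESS = 9
MAX_FAILED = 8  # statuses 1..8 count failed attempts


def failed_rows(rows):
    """Filter rows whose status marks a failed attempt (any value from 1 to 8)."""
    return [r for r in rows if 1 <= r["status"] <= MAX_FAILED]


def next_status(status, succeeded):
    """Return the status to store after one processing attempt."""
    if succeeded:
        return SUCCESS
    # Each failure adds 1; the cap at MAX_FAILED is an assumption of this sketch.
    return min(status + 1, MAX_FAILED)


rows = [{"id": i, "status": s} for i, s in enumerate([0, 1, 5, 8, 9])]
print([r["id"] for r in failed_rows(rows)])
print(next_status(PENDING, succeeded=False), next_status(3, succeeded=True))
```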
  • the window data contains the window status, then obtaining the exception pending data that meets the preset conditions may specifically include:
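  • Data whose data generation time falls within a delay period is obtained from the data to be processed to obtain the abnormal data to be processed, where the start time of the delay period is determined from the current time and a second preset delay duration, and the end time of the delay period is the earliest time of the time windows whose window status is the pending status, the processing status, or the processing failure status.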
  • Because this application is a quasi-real-time data processing method that supports streaming processing, it needs to support processing abnormal data to be processed that arrives late.
  • A specific lock can be acquired first (for example, a dedicated lock for delayed-arrival data).
  • The delay of data to be processed will generally not exceed 3 days, so the start time of the delay period can be set to 3 days before the current time. If the status of a time window is pending or processing, the time window has not been processed yet or its processing has not finished, so the data to be processed within that time window cannot be regarded as delayed arrival; it does not need to be processed separately and can be handled uniformly by the synchronous task. If the status of a time window is the processing failure status, the time window can also be pulled up again by the asynchronous task for processing. Therefore, the start time of the earliest time window among the time windows whose window status is the processing failure status, the processing status, or the pending status can be used as the end time of the delay period.
  • The obtained abnormal data to be processed can then be encapsulated into a chunk (mainly delayed-arrival data), the encapsulated chunk is passed to a processing task, and the task is submitted to the task processing thread (such as DataProcessPoolService) to wait for execution. The lock is released after execution ends.
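The delay period described above can be computed as in the following sketch. The function names, status labels, and dictionary keys are hypothetical. Two points are assumptions rather than statements from the text: rows that arrived late are taken to be those still in the pending status 0, and when no unfinished window exists the period is closed at the current time.

```python
from datetime import datetime, timedelta

UNFINISHED = {"PENDING", "PROCESSING", "FAILED"}


def delay_period(now, windows, max_delay=timedelta(days=3)):
    """windows: iterable of (status, start_time) pairs."""
    start = now - max_delay
    unfinished_starts = [s for status, s in windows if status in UNFINISHED]
    # End at the earliest unfinished window; later data is handled by that window.
    end = min(unfinished_starts) if unfinished_starts else now
    return start, end


def late_rows(rows, period):
    start, end = period
    # Assumption: a late row is still in the pending status (0).
    return [r for r in rows if start <= r["generated_at"] < end and r["status"] == 0]


now = datetime(2021, 12, 26, 12, 0)
windows = [
    ("DONE", datetime(2021, 12, 26, 10, 0)),
    ("FAILED", datetime(2021, 12, 26, 11, 0)),
    ("PENDING", datetime(2021, 12, 26, 11, 30)),
]
rows = [
    {"id": 1, "generated_at": datetime(2021, 12, 26, 10, 30), "status": 0},
    {"id": 2, "generated_at": datetime(2021, 12, 26, 11, 15), "status": 0},
    {"id": 3, "generated_at": datetime(2021, 12, 26, 10, 40), "status": 9},
    {"id": 4, "generated_at": datetime(2021, 12, 20, 10, 0), "status": 0},
]
period = delay_period(now, windows)
print([r["id"] for r in late_rows(rows, period)])
```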
  • Abnormal scenarios may also include the case of window processing timeout.
  • If the window data contains the window status, obtaining the abnormal data to be processed that meets the preset conditions may specifically include:
  • determining an abnormal time window whose window status is the processing status and whose time in the processing status exceeds a preset duration threshold;
  • updating the number of synchronous processing attempts and the number of asynchronous processing attempts corresponding to the abnormal time window to zero, where both counts are stored in the abnormal window data;
  • extracting the abnormal data to be processed from the data to be processed according to a fourth start time and a fourth end time contained in the abnormal window data, where the fourth start time is the earliest start time contained in the abnormal window data and the fourth end time is the latest end time contained in the abnormal window data.
  • A specific lock can be obtained first (for example, a dedicated lock for lost heartbeats); if the lock cannot be obtained, an exception prompt can be generated. The abnormal time windows that lost their heartbeat can then be extracted, that is, windows that have been in the processing status for longer than the preset duration threshold. For example, a window that started processing 10 minutes ago is still in the processing status; this duration can be configured based on experience.
  • The abnormal data to be processed can also be obtained from the data to be processed according to the boundary time (i.e., the fourth start time and the fourth end time) in the abnormal window data corresponding to the abnormal time window, and encapsulated into a chunk (which may include the window data and the data to be processed in the window). The encapsulated chunk is passed to a processing task, the task is submitted to the task processing thread (such as DataProcessPoolService) to wait for execution, and the lock is released after execution completes.
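Detecting windows that lost their heartbeat and resetting their attempt counters can be sketched as below. The `ProcessingWindow` class, its field names, and the `"PROCESSING"` label are hypothetical; the 10-minute timeout is the example value mentioned in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ProcessingWindow:
    number: int
    status: str
    processing_since: datetime
    sync_count: int
    async_count: int


def reclaim_stalled(windows, now, timeout=timedelta(minutes=10)):
    """Find windows stuck in PROCESSING beyond the timeout and reset their counters."""
    stalled = []
    for w in windows:
        if w.status == "PROCESSING" and now - w.processing_since > timeout:
            w.sync_count = 0   # reset so the window can be picked up again
            w.async_count = 0
            stalled.append(w)
    return stalled


now = datetime(2021, 12, 26, 12, 0)
windows = [
    ProcessingWindow(0, "PROCESSING", now - timedelta(minutes=30), 1, 2),
    ProcessingWindow(1, "PROCESSING", now - timedelta(minutes=2), 1, 0),
    ProcessingWindow(2, "DONE", now - timedelta(hours=1), 1, 0),
]
stalled = reclaim_stalled(windows, now)
print([w.number for w in stalled])
```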
  • For the various abnormal scenarios, asynchronous tasks can automatically pull up and retry the abnormal data to be processed. That is, normal window data is executed in the synchronous task; when some data fails during execution, it is marked as failed, and an asynchronous task then automatically retries it. This improves data processing efficiency and preserves good quasi-real-time performance. Only the data that failed to be processed is reprocessed; successfully processed data does not need to be reprocessed, which reduces the data processing volume and further improves processing efficiency.
  • FIG 4 is a schematic diagram of the principle of the data processing method provided by the embodiment of the present application.
  • The database can be a TDSQL database.
  • The TDSQL database contains the data to be processed, and the data to be processed can be synchronized to TiDB. The window division thread in the terminal device (for example, WindowDivisionThread) can first acquire the dedicated window division lock (that is, the data division permission lock), then read the first end time from the historical window division data, divide the data to be processed into windows based on the first end time to obtain the new window data, and then release the dedicated window division lock.
  • The window data processing thread (for example, WindowedProcessorThread) can first acquire the window extraction lock (also known as the synchronous processing permission lock), determine the window data corresponding to the target time window to be processed, and release the window extraction lock. It then determines the target data to be processed based on the window data corresponding to the target time window, encapsulates that window data and the target data to be processed into data blocks (also called chunks), and sends the encapsulated data blocks to the data processing thread pool (also known as DataProcessPoolService).
  • The data processing thread pool DataProcessPoolService schedules the data processing task StreamedProcessorRunner.
  • The data processing task StreamedProcessorRunner calls the corresponding business logic (the specific processing logic implemented by the business developer) to process the data to be processed within the time window, and performs the related follow-up operations based on the processing results, such as task failure retry and data processing status update.
  • This application provides a streaming processing architecture based on TiDB, which can support quasi-real-time processing of financial-grade data.
  • Business developers only need to implement the business processing interface StreamedProcessor; all other functions, such as connecting to data sources, task scheduling, and reliability guarantees, are provided by the streaming processing architecture layer, which reduces the workload of business developers.
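The division of labour between the business interface and the framework can be sketched as follows. Only the names StreamedProcessor and StreamedProcessorRunner come from the text; the method signature (`process` returning the ids of rows that failed), the result labels, and the example `RejectNegativeAmounts` processor are hypothetical.

```python
from abc import ABC, abstractmethod


class StreamedProcessor(ABC):
    """The only piece a business developer implements (signature is assumed)."""

    @abstractmethod
    def process(self, rows):
        """Process one chunk of rows and return the ids of rows that failed."""


class StreamedProcessorRunner:
    """Framework-side task: calls the business logic and records the outcome."""

    def __init__(self, processor):
        self.processor = processor

    def run(self, rows):
        try:
            failed = set(self.processor.process(rows))
        except Exception:
            # If the business logic raises, treat the whole chunk as failed.
            failed = {r["id"] for r in rows}
        return {r["id"]: "FAILED" if r["id"] in failed else "SUCCESS" for r in rows}


class RejectNegativeAmounts(StreamedProcessor):
    def process(self, rows):
        return [r["id"] for r in rows if r["amount"] < 0]


runner = StreamedProcessorRunner(RejectNegativeAmounts())
result = runner.run([{"id": 1, "amount": 10}, {"id": 2, "amount": -5}])
print(result)
```

Keeping status bookkeeping in the runner means a failing processor cannot leave rows without a recorded outcome, which is what lets the SideTrack tasks find them later.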
  • Figure 5 is a schematic structural diagram of the data processing device provided by the embodiment of the present application. As shown in Figure 5, the device provided by this embodiment may include:
  • the acquisition module 501 is used to acquire the data to be processed corresponding to the target business in real time, where the data to be processed includes the data generation time.
  • The processing module 502 is configured to divide the data to be processed into different time windows according to the data generation time, and to obtain window data corresponding to the different time windows according to the division results, where the window data of each time window includes a boundary time.
  • Each boundary time includes a start time and an end time.
  • The start time is the earliest generation time among the data to be processed allocated to the time window.
  • The end time is the latest generation time among the data to be processed allocated to the time window.
  • The processing module 502 is also used to:
  • obtain historical window data when the data division permission lock is acquired, where the historical window data includes the first end time, the historical window data is the data corresponding to at least one time window whose window status is the executed status, and the first end time is the latest end time among the boundary times included in the historical window data;
  • extract from the data to be processed according to the first end time and the target time to obtain initial target data to be processed, where the target time is determined based on the current time and the first preset delay duration;
  • extract, in order of data generation time, target data to be processed corresponding to the target count from the initial target data to be processed, where the target count is the product of a preset count threshold and a preset time window threshold;
  • construct as many time windows as the time window threshold, divide the target data to be processed into those time windows based on its data generation time and the count threshold, and obtain window data corresponding to the different time windows according to the division result.
  • The processing module 502 is also used to:
  • for any target time window, if the end time of the target time window corresponds to multiple target data to be processed with the same data generation time, allocate the multiple target data to be processed with the same data generation time to the target time window.
  • the processing module 502 is also used to implement the target service according to the window data corresponding to the different time windows.
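  • The processing module 502 is also used to:
  • after a target synchronous processing permission lock is obtained, obtain at least two first target window data corresponding to the target synchronous processing permission lock;
  • extract data from the data to be processed according to the second start time and the second end time contained in the at least two first target window data, where the second start time is the earliest start time included in the at least two first target window data, and the second end time is the latest end time included in the at least two first target window data;
  • synchronously execute the newly extracted data to be processed, and update the at least two first target window data according to the execution result of the newly extracted data to be processed.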
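  • The processing module 502 is also used to:
  • after the asynchronous processing permission lock is obtained, obtain the abnormal data to be processed that meets the preset conditions;
  • asynchronously execute the abnormal data to be processed, and update the window data corresponding to the abnormal data to be processed according to the processing result of the abnormal data to be processed.
  • The processing module 502 is also used to:
  • obtain second target window data corresponding to a time window in which processing failed, where the time window in which processing failed is a time window whose number of synchronous processing attempts within the first preset time period is greater than the first preset count threshold and whose number of asynchronous processing attempts is less than the second preset count threshold;
  • extract the abnormal data to be processed from the to-be-processed data table according to the third start time and the third end time contained in the second target window data, where the third start time is the earliest start time contained in the second target window data and the third end time is the latest end time contained in the second target window data.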
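  • The data to be processed includes a data status, and the processing module 502 is also used to:
  • obtain, from the data to be processed, the data whose data generation time is within the second preset time period and whose data status is the processing failure status, to obtain the abnormal data to be processed.
  • The window data includes a window status, and the processing module 502 is also used to:
  • obtain, from the data to be processed, the data whose data generation time falls within a delay period, to obtain the abnormal data to be processed, where the start time of the delay period is determined from the current time and a second preset delay duration, and the end time of the delay period is the earliest time of the time windows whose window status is the pending status, the processing status, or the processing failure status.
  • The window data includes a window status, and the processing module 502 is also used to:
  • determine an abnormal time window whose window status is the processing status and whose time in the processing status exceeds a preset duration threshold;
  • update the number of synchronous processing attempts and the number of asynchronous processing attempts corresponding to the abnormal time window to zero, where both counts are stored in the abnormal window data;
  • extract the abnormal data to be processed from the data to be processed according to a fourth start time and a fourth end time contained in the abnormal window data, where the fourth start time is the earliest start time contained in the abnormal window data and the fourth end time is the latest end time contained in the abnormal window data.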
  • the device provided by the embodiment of the present application can implement the method of the embodiment shown in Figure 2.
  • the implementation principles and technical effects are similar and will not be described again here.
  • FIG. 6 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
  • The device 600 provided by this embodiment includes a processor 601 and a memory 602 communicatively connected to the processor. The processor 601 and the memory 602 are connected through a bus 603.
  • the processor 601 executes the computer execution instructions stored in the memory 602, so that the processor 601 executes the method in the above method embodiment.
  • the processor may be a central processing unit (English: Central Processing Unit, referred to as: CPU), or other general-purpose processors, digital signal processors (English: Digital Signal Processor, referred to as: DSP), application specific integrated circuit (English: Application Specific Integrated Circuit, referred to as: ASIC), etc.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc. The steps of the method disclosed in conjunction with the invention can be directly embodied and executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • the memory may include high-speed RAM memory and may also include non-volatile storage NVM, such as at least one disk memory.
  • The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus, etc.
  • the bus in the drawings of this application is not limited to only one bus or one type of bus.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • Computer-executable instructions are stored in the computer-readable storage medium.
  • When the processor executes the computer-executable instructions, the data processing method of the above method embodiment is implemented.
  • An embodiment of the present application also provides a computer program product, which includes a computer program.
  • When the computer program is executed by a processor, it implements the data processing method described above.
  • The above-mentioned computer-readable storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), or read-only memory (ROM).
  • Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
  • An exemplary readable storage medium is coupled to the processor such that the processor can read information from the readable storage medium and write information to the readable storage medium.
  • the readable storage medium may also be an integral part of the processor.
  • The processor and the readable storage medium may be located in an application specific integrated circuit (ASIC).
  • the processor and the readable storage medium may also exist as discrete components in the device.
  • The aforementioned program can be stored in a computer-readable storage medium.
  • When the program is executed, the steps of the above method embodiments are performed; the aforementioned storage media include ROM, RAM, magnetic disks, optical disks, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

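A data processing method, device, and electronic device are provided. The data processing method includes: acquiring, in real time, data to be processed corresponding to a target business, where the data to be processed contains a data generation time (S301); dividing the data to be processed into different time windows according to the data generation time, and obtaining window data corresponding to the different time windows according to the division result, where the window data of each time window contains a boundary time, each boundary time contains a start time and an end time, the start time is the earliest generation time among the data to be processed allocated to the time window, and the end time is the latest generation time among the data to be processed allocated to the time window (S302); and implementing the target business according to the window data corresponding to the different time windows (S303). The application improves the real-time performance of data processing and broadens the range of business scenarios to which massive data can be applied, thereby ensuring the user's application experience.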

Description

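Data processing method, device, and electronic device
This application claims priority to Chinese patent application No. 202210603396.3, filed with the China Patent Office on May 30, 2022 and entitled "Data processing method, device, and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of big data technology, and in particular to a data processing method, device, and electronic device.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually transforming into financial technology (Fintech). Big data technology is no exception. However, the security and real-time requirements of the financial industry also place higher demands on big data technology. To meet the growing needs of various financial businesses, big data processing frameworks are being used more and more widely.
In the prior art, a big data processing framework can be used to process massive data in batches to implement the related businesses. When massive data is processed in batches, the data to be processed for different businesses is first collected; once the collected data reaches a given number of days or a given count, it is processed in a batch.
However, this processing approach has a long latency, which reduces the real-time performance of data processing. It can only be applied to businesses with low real-time requirements and cannot be applied to businesses with high real-time requirements, which narrows the range of business scenarios to which massive data processing can be applied and in turn affects the user's application experience.
Summary of the Invention
The purpose of this application is to provide a data processing method, device, and electronic device to improve the real-time performance of data processing.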
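In a first aspect, this application discloses a data processing method, including:
acquiring, in real time, data to be processed corresponding to a target business, where the data to be processed contains a data generation time;
dividing the data to be processed into different time windows according to the data generation time, and obtaining window data corresponding to the different time windows according to the division result, where the window data of each time window contains a boundary time, each boundary time contains a start time and an end time, the start time is the earliest generation time among the data to be processed allocated to the time window, and the end time is the latest generation time among the data to be processed allocated to the time window;
implementing the target business according to the window data corresponding to the different time windows.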
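Optionally, dividing the data to be processed into different time windows according to the data generation time, and obtaining window data corresponding to the different time windows according to the division result, includes:
when a data division permission lock is acquired, obtaining historical window data, where the historical window data contains a first end time, the historical window data is data corresponding to at least one time window whose window status is the executed status, and the first end time is the latest end time among the boundary times contained in the historical window data;
extracting from the data to be processed according to the first end time and a target time to obtain initial target data to be processed, where the target time is determined according to the current time and a first preset delay duration;
extracting, in order of data generation time, target data to be processed corresponding to a target count from the initial target data to be processed, where the target count is the product of a preset count threshold and a preset time window threshold;
constructing as many time windows as the time window threshold, dividing the target data to be processed into those time windows based on the data generation time of the target data to be processed and the count threshold, and obtaining window data corresponding to the different time windows according to the division result.
Optionally, dividing the target data to be processed, based on its data generation time and the preset count threshold, into as many time windows as the window threshold, includes:
for any target time window, if the end time of the target time window corresponds to multiple target data to be processed with the same data generation time, allocating the multiple target data to be processed with the same data generation time to the target time window.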
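Optionally, each synchronous processing permission lock corresponds to at least two window data, and implementing the target business according to the window data corresponding to the different time windows includes:
after a target synchronous processing permission lock is obtained, obtaining at least two first target window data corresponding to the target synchronous processing permission lock;
extracting data from the data to be processed according to a second start time and a second end time contained in the at least two first target window data, where the second start time is the earliest start time contained in the at least two first target window data, and the second end time is the latest end time contained in the at least two first target window data;
synchronously executing the newly extracted data to be processed, and updating the at least two first target window data according to the execution result of the newly extracted data to be processed.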
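Optionally, the method further includes:
after an asynchronous processing permission lock is obtained, obtaining abnormal data to be processed that meets preset conditions;
asynchronously executing the abnormal data to be processed, and updating the window data corresponding to the abnormal data to be processed according to the processing result of the abnormal data to be processed.
Optionally, obtaining the abnormal data to be processed that meets the preset conditions includes:
obtaining second target window data corresponding to a time window in which processing failed, where the time window in which processing failed is a time window whose number of synchronous processing attempts within a first preset time period is greater than a first preset count threshold and whose number of asynchronous processing attempts is less than a second preset count threshold;
extracting the abnormal data to be processed from the to-be-processed data table according to a third start time and a third end time contained in the second target window data, where the third start time is the earliest start time contained in the second target window data, and the third end time is the latest end time contained in the second target window data.
Optionally, the data to be processed contains a data status, and obtaining the abnormal data to be processed that meets the preset conditions includes:
obtaining, from the data to be processed, the data whose data generation time is within a second preset time period and whose data status is the processing failure status, to obtain the abnormal data to be processed.
Optionally, the window data contains a window status, and obtaining the abnormal data to be processed that meets the preset conditions includes:
obtaining, from the data to be processed, the data whose data generation time falls within a delay period, to obtain the abnormal data to be processed, where the start time of the delay period is determined from the current time and a second preset delay duration, and the end time of the delay period is the earliest time of the time windows whose window status is the pending status, the processing status, or the processing failure status.
Optionally, the window data contains a window status, and obtaining the abnormal data to be processed that meets the preset conditions includes:
determining an abnormal time window whose window status is the processing status and whose time in the processing status exceeds a preset duration threshold;
updating the number of synchronous processing attempts and the number of asynchronous processing attempts corresponding to the abnormal time window to zero, where both counts are stored in the abnormal window data;
extracting the abnormal data to be processed from the data to be processed according to a fourth start time and a fourth end time contained in the abnormal window data, where the fourth start time is the earliest start time contained in the abnormal window data, and the fourth end time is the latest end time contained in the abnormal window data.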
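In a second aspect, this application discloses a data processing device, including:
an acquisition module, configured to acquire, in real time, data to be processed corresponding to a target business, where the data to be processed contains a data generation time;
a processing module, configured to divide the data to be processed into different time windows according to the data generation time, and to obtain window data corresponding to the different time windows according to the division result, where the window data of each time window contains a boundary time, each boundary time contains a start time and an end time, the start time is the earliest generation time among the data to be processed allocated to the time window, and the end time is the latest generation time among the data to be processed allocated to the time window;
the processing module is further configured to implement the target business according to the window data corresponding to the different time windows.
In a third aspect, this application discloses an electronic device, including: a processor, and a memory communicatively connected to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the data processing method described in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, this application discloses a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the data processing method described in the first aspect and the various possible designs of the first aspect.
In a fifth aspect, this application discloses a computer program product including a computer program which, when executed by a processor, implements the data processing method described in the first aspect and the various possible designs of the first aspect.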
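The embodiments of this application provide a data processing method, device, and electronic device. With the above solution, data to be processed that corresponds to the target business and contains a data generation time is first acquired in real time. The data to be processed is then divided into different time windows according to the data generation time to obtain the window data corresponding to the different time windows, where the window data of each time window contains a boundary time, each boundary time contains a start time and an end time, the start time is the earliest generation time among the data to be processed allocated to the time window, and the end time is the latest generation time among the data to be processed allocated to the time window. The target business is then implemented according to the window data corresponding to the different time windows. By dividing the data acquired in real time into time windows according to its data generation time and then processing it window by window, the data can be processed as a stream, with no need to wait until it reaches a given number of days or a given count before batch processing. This improves the real-time performance of data processing, makes the method applicable to businesses with high real-time requirements, broadens the range of business scenarios to which massive data can be applied, and thereby ensures the user's application experience.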
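Brief Description of the Drawings
Figure 1 is a schematic diagram of the application of an existing big data processing framework;
Figure 2 is a schematic architecture diagram of the application system of the data processing method provided by an embodiment of this application;
Figure 3 is a schematic flowchart of the data processing method provided by an embodiment of this application;
Figure 4 is a schematic diagram of the principle of the data processing method provided by an embodiment of this application;
Figure 5 is a schematic structural diagram of the data processing device provided by an embodiment of this application;
Figure 6 is a schematic diagram of the hardware structure of the electronic device provided by an embodiment of this application.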
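Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
The terms "first", "second", "third", "fourth", etc. (if present) in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments described herein can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to those expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.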
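Traditional big data processing architectures can include streaming processing architectures (e.g., Spark Streaming, Flink), batch processing architectures (e.g., Hive), and unified stream-batch processing architectures (e.g., Flink). They involve many components, are not compatible with commonly used relational databases (e.g., MySQL), and cannot effectively support OLAP (On-Line Analytical Processing) and OLTP (On-Line Transaction Processing) businesses at the same time, especially fintech businesses. In addition, relational databases such as MySQL cannot support the processing of massive data.
Against this background, new big data processing frameworks (e.g., TiDB) have emerged. At present, such frameworks can provide batch processing of massive data for fintech businesses and can also provide services such as data query after the massive data has been processed. When massive data is processed in batches, because no stream data processing platform is deployed in the current big data processing framework, existing streaming processing techniques cannot be used, i.e., quasi-real-time processing of data cannot be achieved. Usually, the data to be processed for different businesses has to be collected first, and only after it reaches a given number of days or a given count is the collected data processed in a batch. For example, Figure 1 is a schematic diagram of the application of an existing big data processing framework. As shown in Figure 1, existing batch processing generally takes the T+1 day form: the big data processing framework first waits through day T; after all the data to be processed of day T has been imported into the framework, the import of new data is stopped, and on day T+1 all the newly acquired data to be processed is processed in a batch. However, this processing approach has a long latency, which reduces the real-time performance of data processing. It can only be applied to businesses with low real-time requirements and cannot be applied to businesses with high real-time requirements, which narrows the range of business scenarios to which massive data processing can be applied and in turn affects the user's application experience.
In addition, the prior art also includes hour-level tasks, i.e., processing the data imported during some preceding hour, to improve real-time performance, but the data processing latency is still high.
Based on the above technical problems, this application divides the data to be processed acquired in real time into different time windows according to its data generation time, and then processes the data by time window to implement the target business. In this way, data acquired in real time can be processed as a stream, with no need to wait until it reaches a given number of days or a given count before batch processing. This achieves the technical effects of improving the real-time performance of data processing, being applicable to businesses with high real-time requirements, broadening the range of business scenarios to which massive data can be applied (i.e., both scenarios with low real-time requirements and scenarios with high real-time requirements), and thereby ensuring the user's application experience.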
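Figure 2 is a schematic architecture diagram of the application system of the data processing method provided by an embodiment of this application. As shown in Figure 2, the application system may include a distributed database, a big data processing framework, and a terminal device. The distributed database can synchronize data to be processed to the big data processing framework in real time, and the terminal device can stream-process the data to be processed in the big data processing framework, thereby improving the real-time performance of processing.
The distributed database can be an existing database, and the data to be processed in it can be data corresponding to different businesses, for example financial businesses. The big data processing framework can be TiDB. TiDB is a converged, open-source distributed database product that supports both online transaction processing and online analytical processing. It offers horizontal scale-out and scale-in, real-time HTAP (i.e., a system that handles mixed OLTP and OLAP workloads at the same time), and other capabilities, and has characteristics of both relational and non-relational databases. When the distributed database synchronizes data to be processed to the big data processing framework in real time, this can be done through DM (Data Migrator), the data synchronization tool provided by TiDB. The terminal device can be a smartphone, personal computer, tablet, server, server cluster, or similar device.
In addition, the big data processing framework can be deployed on a standalone device or on the terminal device.
The technical solution of this application is described in detail below with specific embodiments. The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.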
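Figure 3 is a schematic flowchart of the data processing method provided by an embodiment of this application. The method of this embodiment can be executed by a terminal device. As shown in Figure 3, the method of this embodiment may include:
S301: Acquire, in real time, the data to be processed corresponding to the target business, where the data to be processed contains a data generation time.
In this embodiment, when the target business is implemented, the data to be processed related to the target business can be acquired first. To improve the real-time performance of business data processing, business data can be processed as a stream, i.e., the data to be processed corresponding to the target business is acquired in real time and processed in real time. The data to be processed can contain basic information related to the target business and the data generation time, and can also contain a data status that identifies the current status of the data. For example, the data status can be the pending status, the processing status, the processing failure status, or the processing success status. The default data status is the pending status (which can be represented by 0, for example), and the status can subsequently be updated according to the specific processing.
In addition, there can be one or more pieces of data to be processed. Optionally, the data to be processed corresponding to the target business can be acquired in real time from the distributed database corresponding to the target business. That is, after each business system generates the data to be processed for the target business, it can store the data in a distributed database; the different distributed databases can then synchronize the data to the big data processing framework for storage; and the terminal device can fetch the data to be processed from the big data processing framework in order of the data generation time it contains and execute it to implement the target business.
S302: Divide the data to be processed into different time windows according to the data generation time, and obtain window data corresponding to the different time windows according to the division result, where the window data of each time window contains a boundary time, each boundary time contains a start time and an end time, the start time is the earliest generation time among the data to be processed allocated to the time window, and the end time is the latest generation time among the data to be processed allocated to the time window.
In this embodiment, after the data to be processed is acquired in real time, because it comes from different distributed databases and its volume is large, processing it directly is prone to problems such as missed data or a wrong processing order. Therefore, the data to be processed can be divided into different time windows according to the data generation time to obtain the window data corresponding to the different time windows. The amount of data to be processed in each time window can be set according to the actual application scenario.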
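Further, dividing the data to be processed into different time windows according to the data generation time, and obtaining window data corresponding to the different time windows according to the division result, may specifically include:
When a data division permission lock is acquired, obtaining historical window data, where the historical window data contains a first end time, the historical window data is data corresponding to at least one time window whose window status is the executed status, and the first end time is the latest end time among the boundary times contained in the historical window data.
Extracting from the data to be processed according to the first end time and a target time to obtain initial target data to be processed, where the target time is determined according to the current time and a first preset delay duration.
Extracting, in order of data generation time, target data to be processed corresponding to a target count from the initial target data to be processed, where the target count is the product of a preset count threshold and a preset time window threshold.
Constructing as many time windows as the time window threshold, dividing the target data to be processed into those time windows based on the data generation time of the target data to be processed and the count threshold, and obtaining window data corresponding to the different time windows according to the division result.
Specifically, before the data to be processed is divided into different time windows, the data division permission lock must be acquired; the division can only be performed once the lock is held. After one thread has acquired the data division permission lock, other threads cannot acquire it, which ensures that each piece of data to be processed belongs to exactly one window and guarantees the correctness of the division.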
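In addition, the historical window data is data corresponding to at least one time window whose window status is the executed status. Each time window in the executed status has a boundary time, and each boundary time can include a start time (Timestamp_From) and an end time (Timestamp_Till). The start time is the data generation time of the first data in the window, and the end time is the data generation time of the last data in the window. The first end time is the latest end time among the boundary times contained in the historical window data, i.e., the data generation time of the last data. After the data division permission lock is acquired, the first end time can be obtained and set as the start time of the new time window; if this is the first division, the start time of the first time window can be set to 0. The data to be processed can then be extracted according to the first end time and the subsequently determined target time to obtain the initial target data to be processed, where the target time can be determined by subtracting the first preset delay duration from the current time. The first preset delay duration can be set according to the actual application scenario; optionally, it can be determined according to the amount of initial target data to be processed in the current period. For example, if the amount of initial target data to be processed in the current period is small, the first preset delay duration can be set small, or set to 0; if the amount is large, the first preset delay duration can be set larger, so that the time span of the data fetched is shorter. This appropriately reduces the amount of data fetched each time and thereby reduces the processing pressure on the terminal device.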
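Optionally, after the initial target to-be-processed data is obtained according to the first end time and the target time, all initial target to-be-processed data whose generation time falls within the period from the first end time to the target time may be divided into different time windows. Alternatively, according to the actual processing capacity of the terminal device, only part of the qualifying initial target to-be-processed data may be selected for processing. In that case, the data may be fetched and processed in chronological order of data generation time, which avoids processing errors caused by a disordered processing sequence.
In addition, when the obtained initial target to-be-processed data is divided into different time windows, in order to match the actual computing capacity of the terminal device, the preset number threshold and time window threshold may be obtained first. Target to-be-processed data is then extracted in order from the initial target to-be-processed data according to the number threshold and the time window threshold, time windows are constructed according to the time window threshold (for example, if the time window threshold is 3, three time windows may be constructed), and the extracted target to-be-processed data is divided into these time windows. The number threshold may be the maximum chunk length of each window (also referred to as chunk_size), and the time window threshold is the maximum number of time windows that the terminal device can provide (also referred to as window_divide_max).
Optionally, after the time window threshold and the number threshold are obtained, they may be multiplied to obtain the target number, and the target to-be-processed data corresponding to the target number is then obtained from the to-be-processed data. To ensure that the target service works correctly, the target to-be-processed data may be taken from the initial target to-be-processed data in chronological order of data generation time. In addition, when the amount of initial target to-be-processed data is small, that is, when few services are executing concurrently, the number of target to-be-processed data items may equal the number of initial target to-be-processed data items; in other words, all the to-be-processed data is taken.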
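As a small worked illustration of the target number, the sketch below takes at most `chunk_size * window_divide_max` records in order of generation time. The function name and the `generated_at` field are hypothetical; `chunk_size` and `window_divide_max` are the names given in the description.

```python
def take_target_records(candidates, chunk_size, window_divide_max):
    """Take the oldest chunk_size * window_divide_max records, oldest first."""
    target_count = chunk_size * window_divide_max
    ordered = sorted(candidates, key=lambda r: r["generated_at"])
    # When fewer candidates exist than target_count, all of them are returned.
    return ordered[:target_count]
```

With `chunk_size=2` and `window_divide_max=3`, at most six records are taken per round.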
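In addition, after the target to-be-processed data is divided into the time windows, window data corresponding to the different time windows, that is, the window data of the new windows, may be obtained according to the division result. The window data may include the window status (initially the unprocessed status, later updated to the processing status, the executed status, and so on according to the actual operation) and the boundary time of the time window (such as the start time and end time of each time window). The target to-be-processed data in each time window is arranged in chronological order: the first item has the earliest data generation time and the last item has the latest. The start time of each time window is the data generation time of the first target to-be-processed data item assigned to it, and the end time is the data generation time of the last target to-be-processed data item assigned to it.
Exemplarily, if the time window threshold is 3 and the number threshold is 2, the initial target to-be-processed data meeting the conditions may first be filtered from the to-be-processed data according to the first end time and the target time; 3*2 pieces of target to-be-processed data are then selected from it and divided so that each time window holds 2 pieces, generating 3 time windows. The start time and end time of each time window may further be determined from the data generation time of the data in that window, which in turn determines the window data of the time window.
Exemplarily, Table 1 is the information table for the initial target to-be-processed data. Table 1 contains 6 pieces of initial target to-be-processed data, each of which includes a data generation time and basic information related to the target service.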
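Table 1: Information table for the initial target to-be-processed data

| No. | Data generation time | Basic information |
| --- | --- | --- |
| 1 | 20211226 10:09:10.332 | XXXXXXXXXX |
| 2 | 20211226 10:09:10.561 | XXXXXXXXXX |
| 3 | 20211226 10:09:10.714 | XXXXXXXXXX |
| 4 | 20211226 10:09:10.716 | XXXXXXXXXX |
| 5 | 20211226 10:09:10.718 | XXXXXXXXXX |
| 6 | 20211226 10:09:10.720 | XXXXXXXXXX |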
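Table 2 is the information table for the target to-be-processed data. Table 2 has three time windows, each containing two rows of target to-be-processed data.
Table 2: Information table for the target to-be-processed data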
Figure PCTCN2022127575-appb-000001
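Table 3 is the window data table for the target to-be-processed data. Table 3 has three time windows, each with a start time and an end time.
Table 3: Window data table for the target to-be-processed data

| Time window No. | Start time | End time |
| --- | --- | --- |
| 0 | 0 | 20211226 10:09:10.561 |
| 1 | 20211226 10:09:10.561 | 20211226 10:09:10.716 |
| 2 | 20211226 10:09:10.716 | 20211226 10:09:10.720 |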
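The division that produces Table 3 from the six rows of Table 1 can be sketched as below. The function and field names and the `"UNPROCESSED"` status label are hypothetical. Following Table 3, each new window's start time is taken to be the previous window's end time (0 for the very first window); this is an interpretation of the tables, not a statement of the actual implementation.

```python
def divide_into_windows(records, chunk_size, first_end_time=0):
    """Split time-ordered records into windows of chunk_size and derive boundary times."""
    ordered = sorted(records, key=lambda r: r["generated_at"])
    windows = []
    start = first_end_time  # 0 on the very first division
    for i in range(0, len(ordered), chunk_size):
        chunk = ordered[i:i + chunk_size]
        end = chunk[-1]["generated_at"]  # latest generation time in this window
        windows.append({
            "window_no": len(windows),
            "status": "UNPROCESSED",
            "start": start,
            "end": end,
        })
        start = end  # as in Table 3, the next window starts where this one ends
    return windows
```

Timestamps in the fixed-width `YYYYMMDD HH:MM:SS.mmm` form used in the tables sort correctly as plain strings, which is why no date parsing is needed in this sketch.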
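In summary, by first filtering the to-be-processed data to obtain the preset number of target to-be-processed data items, and then dividing the filtered data into different time windows, the processing efficiency is improved and the workload matches the actual computing capacity of the terminal device. This avoids terminal device failures caused by, for example, an excessive computing load, and thus ensures that the service runs normally. Moreover, the boundary time of a time window is determined from the generation time of the data assigned to that window, which makes the way boundary times are determined more flexible.
In addition, dividing the target to-be-processed data into the time windows whose number corresponds to the time window threshold, based on the data generation time of the target to-be-processed data and the preset number threshold, may specifically include:
For any target time window, if the end time of the target time window corresponds to multiple pieces of target to-be-processed data with the same data generation time, assigning those pieces of target to-be-processed data to the target time window.
Specifically, the target to-be-processed data is divided into windows according to its data generation time, and multiple pieces of data may share the same generation time. Target to-be-processed data with the same generation time (also referred to as the same timestamp) must not be split across different time windows; otherwise, data that should be processed in an order prescribed by the service may be processed in the wrong order and thus fail to be processed normally. Therefore, if the end time of a time window corresponds to multiple pieces of to-be-processed data, all of them may be placed in that time window rather than in the next one, which improves the accuracy of data processing.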
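The tie rule above can be sketched as follows. This is one possible way to honour the rule, under the assumption that a window may exceed the number threshold when several records share its end timestamp; the function name is hypothetical.

```python
def divide_keeping_ties(timestamps, chunk_size):
    """Group sorted timestamps into windows of about chunk_size items.

    A window is extended while the next item has the same timestamp as the
    window's current last item, so equal timestamps never straddle two windows.
    """
    ordered = sorted(timestamps)
    windows = []
    i = 0
    while i < len(ordered):
        j = min(i + chunk_size, len(ordered))
        while j < len(ordered) and ordered[j] == ordered[j - 1]:
            j += 1  # same timestamp as the window's end: keep it in this window
        windows.append(ordered[i:j])
        i = j
    return windows
```

For example, with a number threshold of 2 and timestamps `[1, 2, 2, 2, 3, 4]`, the first window absorbs all three records stamped `2`.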
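S303: Implement the target service according to the window data corresponding to the different time windows.
In this embodiment, after the to-be-processed data is divided into different time windows, window data corresponding to the different time windows may be obtained according to the division result, and the target service may then be implemented according to that window data.
Optionally, the window data may include a boundary time comprising a start time and an end time, where the start time is the data generation time of the first piece of to-be-processed data in the window corresponding to the window data, and the end time is the data generation time of the last piece. Data whose generation time falls within the period defined by the start time and the end time may then be extracted from the to-be-processed data, and the target service is implemented according to the extracted data.
With the above solution, the to-be-processed data corresponding to the target service, which includes a data generation time, is first obtained in real time; the data is then divided into different time windows according to the data generation time to obtain the corresponding window data; and the target service is implemented according to that window data. Because the data obtained in real time is divided into time windows by generation time and then processed window by window, it can be processed as a stream, without waiting until a collection period or item count is reached before batch processing. This improves the real-time performance of data processing, makes the solution applicable to services with high real-time requirements, broadens the range of service scenarios to which massive data can be applied, and thereby safeguards the user experience.
Based on the method of Figure 2, the embodiments of this specification further provide some specific implementations of the method, which are described below.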
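In another embodiment, each synchronous processing permission lock corresponds to at least two pieces of window data, and implementing the target service according to the window data corresponding to the different time windows includes:
After a target synchronous processing permission lock is obtained, obtaining at least two pieces of first target window data corresponding to the target synchronous processing permission lock.
Extracting data from the to-be-processed data according to a second start time and a second end time included in the at least two pieces of first target window data, where the second start time is the earliest start time included in the at least two pieces of first target window data, and the second end time is the latest end time included in the at least two pieces of first target window data.
Synchronously executing the newly extracted to-be-processed data, and updating the at least two pieces of first target window data according to the execution result of the newly extracted to-be-processed data.
In this embodiment, existing synchronous processing permission locks are coarse-grained: for example, one permission lock is assigned to an entire piece of window data, so that each task can obtain only one window for subsequent processing. However, because stream processing handles a large volume of data and must remain near-real-time, such a conventional scheme is inadequate. Fine-grained locks may therefore be used to increase concurrency: a permission lock may be assigned to part of the window data (such as one row), that is, one piece of window data may be split into multiple lock granularities as needed. Each fine-grained lock may also correspond to at least two time windows, so that after one synchronous processing permission lock (that is, a fine-grained lock) is acquired, the data needed for multiple time windows can be read in one pass, which reduces the number of data reads.
Optionally, the preset maximum number of windows per synchronous processing permission lock (which may be called window_query_max) and the maximum number of synchronous processing permission locks (which may be called window_mutex_num) may be obtained first. Then, starting from number 0, the lock number is incremented by 1 every window_query_max time windows; the largest lock number is window_mutex_num - 1, and after it is reached the numbering wraps around to 0.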
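Table 4 is the window data table after locks are assigned. Continuing with Table 3 as an example, one synchronous processing permission lock may be assigned to every two time windows, and each synchronous processing permission lock is given a lock number.
Table 4: Window data table after locks are assigned

| Time window No. | Lock No. | Start time | End time |
| --- | --- | --- | --- |
| 0 | 0 | 0 | 20211226 10:09:10.561 |
| 1 | 0 | 20211226 10:09:10.561 | 20211226 10:09:10.716 |
| 2 | 1 | 20211226 10:09:10.716 | 20211226 10:09:10.720 |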
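The lock numbering rule (increment every `window_query_max` windows, wrap around after `window_mutex_num - 1`) reduces to one line of integer arithmetic. The function name is hypothetical, and the sketch assumes window numbers start at 0 as in Table 4.

```python
def lock_number(window_no, window_query_max, window_mutex_num):
    """Lock number for a window: one lock per window_query_max consecutive windows,
    wrapping back to 0 after window_mutex_num locks have been used."""
    return (window_no // window_query_max) % window_mutex_num
```

With `window_query_max=2`, windows 0, 1, and 2 receive lock numbers 0, 0, and 1, matching Table 4.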
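By assigning at least two consecutive time windows to one synchronous processing permission lock, multiple pieces of window data can be obtained at once after the lock is acquired. Using the minimum and maximum times across these time windows (that is, the earliest start time and the latest end time), as much to-be-processed data as possible is fetched in a single operation, which not only preserves the temporal continuity of the data but also reduces interaction with the database and improves processing efficiency.
In addition, after synchronous processing permission locks have been assigned to the time windows, the corresponding to-be-processed data may be obtained according to the assigned lock, the obtained data and the corresponding window data may be encapsulated into a data block (also referred to as a chunk), and the encapsulated data block may be submitted to a data processing thread to implement the target service.
For example, if the current thread has acquired synchronous processing permission lock 0 (that is, the target synchronous processing permission lock is lock 0), the window data corresponding to lock 0 may be obtained from all time windows (the obtained window data corresponds to at least two time windows). Data is then extracted from the to-be-processed data according to the second start time and the second end time included in the at least two pieces of window data, and the target service is implemented according to the extracted data. The to-be-processed data of each window is arranged in order of data generation time, and each piece of window data includes a boundary time with a start time and an end time: the start time is the data generation time of the first piece of to-be-processed data in the window, and the end time is that of the last piece. The second start time is the earliest start time among the boundary times of the windows corresponding to the at least two pieces of window data, and the second end time is the latest end time among those boundary times.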
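A minimal sketch of the single range read follows. The names are hypothetical, and the choice of an interval that excludes the merged start time and includes the merged end time is an assumption made so that a record sitting exactly on a shared boundary is read once.

```python
def merged_range(windows):
    """Return (earliest start, latest end) across the windows held under one lock."""
    return min(w["start"] for w in windows), max(w["end"] for w in windows)


def fetch_for_lock(records, windows):
    """Read all records covered by the lock's windows in one pass."""
    start, end = merged_range(windows)
    return [r for r in records if start < r["generated_at"] <= end]
```

In a database-backed system this would be one range query instead of one query per window.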
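In addition, data processing in this application may be divided into two modes: a synchronous processing mode (also referred to as MainRoad) and an asynchronous processing mode (also referred to as SideTrack). To improve processing efficiency, the synchronous processing mode may be used when implementing the target service according to the window data of the different time windows, while the asynchronous processing mode may be used for the various abnormal scenarios. Exemplarily, abnormal scenarios include re-pulling windows that failed processing, re-pulling service data that failed processing, processing late-arriving data, and re-pulling windows that lost their heartbeat. Furthermore, to prevent the two modes from affecting each other, the resources for the data of the synchronous and asynchronous processing modes may be isolated.
Exemplarily, in the synchronous processing mode, the data processing thread may be DataProcessPoolService. The obtainable lock number may be determined from a preset lock counter, and the synchronous processing permission lock corresponding to that lock number is acquired. Using the acquired lock, the processable time window data and the corresponding to-be-processed data are obtained and encapsulated into a data block; meanwhile, the status of the windows corresponding to the time window data may be set to the processing status, the synchronous processing permission lock is released, and the lock counter is incremented. The encapsulated data block may then be submitted to DataProcessPoolService for processing to implement the target service. A data block serves as an independent data processing unit and includes the window data, the to-be-processed data of the time windows, and the type of the data block (such as synchronous processing, asynchronous processing, service failure re-pull, late-arriving data re-pull, and lost-heartbeat window re-pull, corresponding to the various kinds of processing in MainRoad mode and SideTrack mode). Encapsulation into data blocks not only provides complete data for subsequent processing (the to-be-processed data, the window data, and so on) but also allows different handling by type (for example, the different logic of synchronous and asynchronous processing), improving both the efficiency and the safety of data processing.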
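One way to picture the data block and the lock counter is the sketch below. The class names, enum members, and field names are all hypothetical; the application names only the concepts (chunk, chunk type, lock counter), not their layout.

```python
from dataclasses import dataclass, field
from enum import Enum


class ChunkType(Enum):
    # Hypothetical labels for the chunk types listed in the description.
    SYNC = "MainRoad"
    WINDOW_FAILED_REPULL = "SideTrack/window-failed"
    DATA_FAILED_REPULL = "SideTrack/data-failed"
    LATE_DATA_REPULL = "SideTrack/late-data"
    HEARTBEAT_LOST_REPULL = "SideTrack/heartbeat-lost"


@dataclass
class Chunk:
    """Independent processing unit: window data, its records, and a type tag."""
    chunk_type: ChunkType
    windows: list = field(default_factory=list)
    records: list = field(default_factory=list)


class LockCounter:
    """Cycles through lock numbers 0 .. window_mutex_num - 1."""

    def __init__(self, window_mutex_num):
        self._n = window_mutex_num
        self._value = 0

    def next_lock_number(self):
        number = self._value % self._n
        self._value += 1  # the counter is incremented after each use
        return number
```

Tagging each chunk with a type lets a single worker pool dispatch synchronous and asynchronous work to different handling logic.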
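In another embodiment, the method may further include: after an asynchronous processing permission lock is obtained, obtaining abnormal to-be-processed data that meets a preset condition.
Asynchronously executing the abnormal to-be-processed data, and updating the window data corresponding to the abnormal to-be-processed data according to the processing result.
In this embodiment, when the newly extracted to-be-processed data is executed to implement the target service, processing of some data may fail. To keep other data processing normally and the service running, abnormal situations may be handled by processing the abnormal data asynchronously.
Further, there may be several kinds of abnormal situations, and different handling may be used for each.
Optionally, processing of a time window may fail. In this case, obtaining the abnormal to-be-processed data that meets the preset condition may include:
Obtaining second target window data corresponding to a time window that failed processing, where the time window that failed processing is a time window whose number of synchronous processing attempts within a first preset duration is greater than a first preset count threshold and whose number of asynchronous processing attempts is less than a second preset count threshold.
Extracting abnormal to-be-processed data from the to-be-processed data table according to a third start time and a third end time included in the second target window data, where the third start time is the earliest start time included in the second target window data, and the third end time is the latest end time included in the second target window data.
Specifically, during synchronous processing, an encapsulated data block may fail. The failed data block may then be reprocessed (the maximum number of retries is configurable), and when the maximum number of retries is reached, the status of the time window corresponding to the data block may be set to the processing-failed status. After a specific lock (for example, a lock dedicated to window processing failures) is acquired, the second target window data corresponding to time windows in the processing-failed status may be obtained. The range of window data fetched may be limited by time; for example, time windows that failed within the first preset duration (which may be any value from 3 to 5 days) may be fetched. It must also be ensured that the window's synchronous processing count has reached the maximum (the first preset count threshold) and that its asynchronous processing count is below the maximum (the second preset count threshold). The window status of the time windows corresponding to the fetched second target window data may then be updated to the processing status, and the lock is released. Next, abnormal to-be-processed data is extracted from the to-be-processed data table according to the third start time and third end time included in the second target window data and is encapsulated into a window exception data block (also referred to as a chunk, which may include the second target window data and its corresponding abnormal to-be-processed data). The encapsulated block is passed to a processing task, and the task is submitted to the task processing thread (such as DataProcessPoolService) to await execution.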
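The selection of failed windows can be sketched as a filter. All names here (`status`, `updated_at`, `sync_count`, `async_count`, the `"FAILED"` label) are hypothetical. The sketch treats "has reached the maximum" as `>=`, which is an assumption: the description speaks of the count reaching the maximum, while the claim wording says "greater than" the threshold.

```python
from datetime import datetime, timedelta


def select_failed_windows(windows, now, lookback, max_sync, max_async):
    """Pick failed windows that exhausted synchronous retries but still
    have asynchronous retries left, limited to a recent time range."""
    return [
        w for w in windows
        if w["status"] == "FAILED"
        and now - w["updated_at"] <= lookback   # first preset duration, e.g. 3-5 days
        and w["sync_count"] >= max_sync         # synchronous retries used up
        and w["async_count"] < max_async        # asynchronous retries remain
    ]
```

Windows that have also exhausted their asynchronous retries are left out, so they are not retried indefinitely.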
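Optionally, processing may also fail for only part of the to-be-processed data in a time window. In this case the to-be-processed data includes a data status, and obtaining the abnormal to-be-processed data that meets the preset condition may include:
Obtaining, from the to-be-processed data, the to-be-processed data whose data generation time is within a second preset duration and whose data status is the processing-failed status, to obtain the abnormal to-be-processed data.
Specifically, handling may also depend on the result returned by the service-layer interface (such as StreamedProcessor.process):
If nothing is returned, all source data in the task window was processed successfully, and a preset interface (such as the batchUpdate interface in TiDB) is called to update the data status of all the to-be-processed data to the processing-succeeded status (exemplarily, the corresponding field may be set to 9);
If a result is returned, all the data in the returned result failed processing, and the preset interface (such as the batchUpdate interface in TiDB) is called to update the processing status of that data to the processing-failed status (exemplarily, the corresponding field may be incremented by 1 each time), while the data in the time window that was not returned is updated to the processing-succeeded status (that is, the corresponding field is set to 9).
Regardless of the returned result, the processing status of the time window is updated to the processing-succeeded status (value S). Even when some service data failed, the window status may be updated to processing succeeded; the failed to-be-processed data then has a non-success status (9 means success, and a failure is a non-zero value less than 9) and can later be handled by a SideTrack-mode task (reprocessing of service data that failed processing).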
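The status update driven by the returned result can be sketched as follows. The function name is hypothetical, and capping the failure counter at 8 is an assumption added here so that repeated failures can never be mistaken for the success value 9; the description itself only says the field is incremented by 1 on each failure.

```python
SUCCESS = 9      # success status value given in the description
MAX_FAILED = 8   # assumed cap: statuses 1-8 count failed attempts


def apply_process_result(statuses, failed_ids):
    """Return new statuses for one window.

    statuses:   dict mapping record id -> current status (0 pending, 1-8 failed, 9 success)
    failed_ids: ids returned by the service-layer call, or None/empty if all succeeded
    """
    failed = set(failed_ids or ())
    return {
        rec_id: min(status + 1, MAX_FAILED) if rec_id in failed else SUCCESS
        for rec_id, status in statuses.items()
    }
```

Records that were not returned are marked successful, so a later SideTrack pass only needs to look at statuses 1 through 8.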
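When abnormal to-be-processed data that failed processing is reprocessed in SideTrack mode, only the failed part of the data in a time window may be reprocessed. Correspondingly, a specific lock (exemplarily, a lock dedicated to service data processing failures) may be acquired, after which the abnormal to-be-processed data that failed processing may be extracted. Exemplarily, every piece of to-be-processed data has a processing status: 0 is the pending status, 1-8 is the number of retries after failure, and 9 is the success status (configurable). Given this, the failed data can be selected simply by processing status, that is, data whose status is any value from 1 to 8. The query range may also be limited by time; for example, only data whose data generation time is within the second preset duration may be fetched. The data may then be encapsulated into a window exception data block (also referred to as a chunk, which may include the failed data). The encapsulated block is passed to a processing task, and the task is submitted to the task processing thread (such as DataProcessPoolService) to await execution.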
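Selecting the failed records by status and time can be sketched as a simple filter. The function and field names are hypothetical; the status range 1 to 8 is the example encoding given in the description.

```python
def select_failed_records(records, now, lookback):
    """Pick records whose status marks a failed attempt (1-8) and whose
    generation time lies within the lookback range (second preset duration)."""
    return [
        r for r in records
        if 1 <= r["status"] <= 8
        and now - r["generated_at"] <= lookback
    ]
```

Pending records (status 0) and successful records (status 9) are left to the synchronous path or ignored.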
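In addition, after the task submitted to the task processing thread (such as DataProcessPoolService) has finished executing, only the processing status of the abnormal to-be-processed data may be updated: on success the status is set to 9; on failure the status is incremented by 1 each time until processing succeeds. If processing still fails after the maximum number of retries, the status may be modified after manual confirmation by operations personnel, and the program will then automatically perform the foregoing process again.
Optionally, there may also be abnormal to-be-processed data that arrives late. In this case the window data includes a window status, and obtaining the abnormal to-be-processed data that meets the preset condition may specifically include:
Obtaining, from the to-be-processed data, the to-be-processed data whose data generation time falls within a delay period, to obtain the abnormal to-be-processed data, where the start time of the delay period is determined from the current time and a second preset delay duration, and the end time of the delay period is the earliest time of the time windows whose window status is the pending status, the processing status, or the processing-failed status.
Specifically, because this application is a near-real-time data processing approach that supports stream processing, it needs to support late-arriving abnormal to-be-processed data. Correspondingly, a specific lock (exemplarily, a lock dedicated to late-arriving data) may be acquired first, and the late-arriving abnormal to-be-processed data is then obtained. To do so, the delay period may be determined first, and the abnormal to-be-processed data falling within it is then fetched. In general, the delay of to-be-processed data does not exceed 3 days, so the start time of the delay period may be set to 3 days before the current time. If a time window is in the pending or processing status, the window has not been processed or has not finished processing, so the to-be-processed data within its range does not count as late and need not be handled separately; the synchronous task handles it uniformly. If a time window is in the processing-failed status, an asynchronous task can also pull the window up again for processing. Therefore, among the time windows whose status is processing failed, processing, or pending, the start time of the earliest time window may be taken as the end time of the delay period. The fetched abnormal to-be-processed data may then be encapsulated into a chunk (mainly the late-arriving data), the chunk is passed to a processing task, the task is submitted to the task processing thread (such as DataProcessPoolService) to await execution, and the lock is released after execution ends.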
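The delay period can be computed as in the sketch below. The status labels and field names are hypothetical. The description does not say what the end time should be when no window is pending, processing, or failed, so the sketch returns `None` for the end time in that case rather than guessing.

```python
from datetime import datetime, timedelta

# Hypothetical labels for the pending, processing, and processing-failed statuses.
OPEN_STATUSES = {"PENDING", "PROCESSING", "FAILED"}


def delay_period(windows, now, max_delay=timedelta(days=3)):
    """Return (start, end) of the range in which late-arriving data is searched.

    start: now minus the second preset delay duration (3 days in the example).
    end:   start time of the earliest window still pending, processing, or failed,
           or None if there is no such window (case not covered by the description).
    """
    start = now - max_delay
    open_starts = [w["start"] for w in windows if w["status"] in OPEN_STATUSES]
    end = min(open_starts) if open_starts else None
    return start, end
```

Data inside windows that are still open is excluded because the synchronous task or the window-level retry will pick it up anyway.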
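In addition, after the task processing thread finishes execution, only the processing status of the abnormal to-be-processed data may be updated: on success the status is set to 9; on failure the status is incremented by 1 each time until processing succeeds. If processing still fails after the maximum number of retries, the status may be modified after manual confirmation by operations personnel, and the program will then automatically perform the foregoing process again.
Optionally, window processing may also time out. In this case the window data includes a window status, and obtaining the abnormal to-be-processed data that meets the preset condition may specifically include:
Determining an abnormal time window whose window status is the processing status and which has been in the processing status for longer than a preset duration threshold.
Updating the synchronous processing count and the asynchronous processing count of the abnormal time window to zero, where the synchronous processing count and the asynchronous processing count are stored in the abnormal window data.
Extracting abnormal to-be-processed data from the to-be-processed data according to a fourth start time and a fourth end time included in the abnormal window data, where the fourth start time is the earliest start time included in the abnormal window data, and the fourth end time is the latest end time included in the abnormal window data.
Specifically, for any time window, if the task is interrupted during processing, the window remains in the processing status indefinitely, so time windows that have lost their heartbeat need to be reprocessed. Correspondingly, a specific lock (exemplarily, a lock dedicated to lost heartbeats) may be acquired first; if the lock cannot be acquired, an exception notice may be generated. The abnormal time windows that have lost their heartbeat may then be fetched, that is, windows that have been in the processing status for longer than the preset duration threshold; for example, a window that started processing 10 minutes ago and is still in the processing status. This duration may be configured based on experience. The synchronous and asynchronous processing counts of the abnormal time window may then be reset to 0 so that the window can be reprocessed, and the lock is released. Abnormal to-be-processed data may also be fetched from the to-be-processed data according to the boundary time in the abnormal window data (that is, the fourth start time and fourth end time) and encapsulated into a chunk (which may include the window data and the to-be-processed data in the window); the chunk is passed to a processing task, the task is submitted to the task processing thread (such as DataProcessPoolService) to await execution, and the lock is released after execution ends.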
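Detecting lost-heartbeat windows and resetting their counters can be sketched as below. The field names and the `"PROCESSING"` label are hypothetical, and the 10-minute default mirrors the example in the description rather than a prescribed value.

```python
from datetime import datetime, timedelta


def reset_stalled_windows(windows, now, timeout=timedelta(minutes=10)):
    """Find windows stuck in the processing status longer than timeout,
    reset their retry counters in place, and return them for reprocessing."""
    stalled = []
    for w in windows:
        if w["status"] == "PROCESSING" and now - w["processing_since"] > timeout:
            w["sync_count"] = 0   # reset so the window is eligible for reprocessing
            w["async_count"] = 0
            stalled.append(w)
    return stalled
```

The returned windows supply the boundary times used to re-read their records.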
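In summary, for every abnormal scenario, asynchronous tasks can automatically pull up and retry the affected part of the data. Normal window data is executed in synchronous tasks; when part of the data fails, it is marked as failed, and asynchronous tasks then retry it automatically. This improves processing efficiency and keeps processing near-real-time. Because only the failed part of the data is reprocessed and successfully processed data is left alone, the processing volume is reduced, which further improves efficiency.
Figure 4 is a schematic diagram of the principle of the data processing method provided by an embodiment of this application. As shown in Figure 4, in this embodiment the database may be a TDSQL database containing the to-be-processed data, which may be synchronized to TiDB through DM. A window division thread in the terminal device (exemplarily, WindowDivisionThread) may first acquire the division lock (that is, the data division permission lock), read the first end time from the historical window division data, divide the to-be-processed data into windows according to the first end time to obtain new window data, and then release the division lock. A window data processing thread (exemplarily, WindowedProcessorThread) may first acquire the window extraction lock (also referred to as the synchronous processing permission lock), determine the window data of the target time windows to be processed, and release the window extraction lock. It then determines the target to-be-processed data according to that window data, encapsulates the window data and the target to-be-processed data into data blocks (also referred to as Chunks), and sends them to the data processing thread pool (also referred to as DataProcessPoolService). DataProcessPoolService schedules the data processing task StreamedProcessorRunner, which calls the corresponding service logic (the concrete processing logic implemented by service developers) to process the to-be-processed data in the time windows and performs follow-up operations according to the result, such as retrying failed tasks and updating data processing statuses.
In summary, the TiDB-based stream processing architecture of this application supports near-real-time processing of financial-grade data. Service developers only need to implement the service processing interface StreamedProcessor; everything else, including connecting to data sources, task scheduling, and reliability guarantees, is provided by the stream processing architecture layer, which reduces the developers' workload.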
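Based on the same idea, the embodiments of this specification further provide an apparatus corresponding to the above method. Figure 5 is a schematic structural diagram of the data processing apparatus provided by an embodiment of this application. As shown in Figure 5, the apparatus provided by this embodiment may include:
An obtaining module 501, configured to obtain, in real time, to-be-processed data corresponding to a target service, where the to-be-processed data includes a data generation time.
A processing module 502, configured to divide the to-be-processed data into different time windows according to the data generation time, and obtain window data corresponding to the different time windows according to the division result, where the window data of each time window includes a boundary time, each boundary time includes a start time and an end time, the start time is the earliest generation time among the to-be-processed data assigned to the time window, and the end time is the latest generation time among the to-be-processed data assigned to the time window.
In this embodiment, the processing module 502 is further configured to:
When a data division permission lock is acquired, obtain historical window data, where the historical window data includes a first end time, the historical window data is data corresponding to at least one time window whose window status is the executed status, and the first end time is the latest end time among the boundary times included in the historical window data.
Extract from the to-be-processed data according to the first end time and a target time to obtain initial target to-be-processed data, where the target time is determined according to the current time and a first preset delay duration.
Extract, in order of data generation time, target to-be-processed data corresponding to a target number from the initial target to-be-processed data, where the target number is the product of a preset number threshold and a preset time window threshold.
Construct time windows whose number corresponds to the time window threshold, divide the target to-be-processed data into these time windows based on the data generation time of the target to-be-processed data and the number threshold, and obtain window data corresponding to the different time windows according to the division result.
Further, the processing module 502 is further configured to:
For any target time window, if the end time of the target time window corresponds to multiple pieces of target to-be-processed data with the same data generation time, assign those pieces of target to-be-processed data to the target time window.
The processing module 502 is further configured to implement the target service according to the window data corresponding to the different time windows.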
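In addition, in another embodiment, each synchronous processing permission lock corresponds to at least two pieces of window data, and the processing module 502 is further configured to:
After a target synchronous processing permission lock is obtained, obtain at least two pieces of first target window data corresponding to the target synchronous processing permission lock.
Extract data from the to-be-processed data according to a second start time and a second end time included in the at least two pieces of first target window data, where the second start time is the earliest start time included in the at least two pieces of first target window data, and the second end time is the latest end time included in the at least two pieces of first target window data.
Synchronously execute the newly extracted to-be-processed data, and update the at least two pieces of first target window data according to the execution result of the newly extracted to-be-processed data.
In addition, in another embodiment, the processing module 502 is further configured to:
After an asynchronous processing permission lock is obtained, obtain abnormal to-be-processed data that meets a preset condition.
Asynchronously execute the abnormal to-be-processed data, and update the window data corresponding to the abnormal to-be-processed data according to the processing result of the abnormal to-be-processed data.
In this embodiment, the processing module 502 is further configured to:
Obtain second target window data corresponding to a time window that failed processing, where the time window that failed processing is a time window whose number of synchronous processing attempts within a first preset duration is greater than a first preset count threshold and whose number of asynchronous processing attempts is less than a second preset count threshold.
Extract abnormal to-be-processed data from the to-be-processed data table according to a third start time and a third end time included in the second target window data, where the third start time is the earliest start time included in the second target window data, and the third end time is the latest end time included in the second target window data.
In this embodiment, the to-be-processed data includes a data status, and the processing module 502 is further configured to:
Obtain, from the to-be-processed data, the to-be-processed data whose data generation time is within a second preset duration and whose data status is the processing-failed status, to obtain the abnormal to-be-processed data.
In this embodiment, the window data includes a window status, and the processing module 502 is further configured to:
Obtain, from the to-be-processed data, the to-be-processed data whose data generation time falls within a delay period, to obtain the abnormal to-be-processed data, where the start time of the delay period is determined from the current time and a second preset delay duration, and the end time of the delay period is the earliest time of the time windows whose window status is the pending status, the processing status, or the processing-failed status.
In this embodiment, the window data includes a window status, and the processing module 502 is further configured to:
Determine an abnormal time window whose window status is the processing status and which has been in the processing status for longer than a preset duration threshold.
Update the synchronous processing count and the asynchronous processing count of the abnormal time window to zero, where the synchronous processing count and the asynchronous processing count are stored in the abnormal window data.
Extract abnormal to-be-processed data from the to-be-processed data according to a fourth start time and a fourth end time included in the abnormal window data, where the fourth start time is the earliest start time included in the abnormal window data, and the fourth end time is the latest end time included in the abnormal window data.
The apparatus provided by the embodiments of this application can implement the method of the embodiment shown in Figure 2 above; its implementation principle and technical effects are similar and are not repeated here.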
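Figure 6 is a schematic diagram of the hardware structure of the electronic device provided by an embodiment of this application. As shown in Figure 6, the device 600 provided by this embodiment includes a processor 601 and a memory 602 communicatively connected to the processor. The processor 601 and the memory 602 are connected through a bus 603.
In a specific implementation, the processor 601 executes computer-executable instructions stored in the memory 602, causing the processor 601 to perform the method in the above method embodiments.
For the specific implementation process of the processor 601, refer to the above method embodiments; the implementation principle and technical effects are similar and are not repeated here.
In the embodiment shown in Figure 6, it should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory may include high-speed RAM and may also include non-volatile memory (NVM), such as at least one disk memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be classified as address buses, data buses, control buses, and so on. For ease of illustration, the bus in the drawings of this application is not limited to a single bus or a single type of bus.
An embodiment of this application further provides a computer-readable storage medium storing computer-executable instructions; when a processor executes the computer-executable instructions, the data processing method of the above method embodiments is implemented.
An embodiment of this application further provides a computer program product including a computer program; when the computer program is executed by a processor, the data processing method described above is implemented.
The above computer-readable storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The readable storage medium may be any available medium accessible by a general-purpose or special-purpose computer.
An exemplary readable storage medium is coupled to the processor so that the processor can read information from, and write information to, the readable storage medium. The readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC), or they may exist as discrete components in the device.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.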

Claims (13)

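  1. A data processing method, characterized by comprising:
    obtaining, in real time, to-be-processed data corresponding to a target service, wherein the to-be-processed data includes a data generation time;
    dividing the to-be-processed data into different time windows according to the data generation time, and obtaining window data corresponding to the different time windows according to the division result, wherein the window data of each time window includes a boundary time, each boundary time includes a start time and an end time, the start time is the earliest generation time among the to-be-processed data assigned to the time window, and the end time is the latest generation time among the to-be-processed data assigned to the time window;
    implementing the target service according to the window data corresponding to the different time windows.
  2. The method according to claim 1, characterized in that dividing the to-be-processed data into different time windows according to the data generation time, and obtaining window data corresponding to the different time windows according to the division result, comprises:
    when a data division permission lock is acquired, obtaining historical window data, wherein the historical window data includes a first end time, the historical window data is data corresponding to at least one time window whose window status is the executed status, and the first end time is the latest end time among the boundary times included in the historical window data;
    extracting from the to-be-processed data according to the first end time and a target time to obtain initial target to-be-processed data, wherein the target time is determined according to the current time and a first preset delay duration;
    extracting, in order of data generation time, target to-be-processed data corresponding to a target number from the initial target to-be-processed data, wherein the target number is the product of a preset number threshold and a preset time window threshold;
    constructing time windows whose number corresponds to the time window threshold, dividing the target to-be-processed data into these time windows based on the data generation time of the target to-be-processed data and the number threshold, and obtaining window data corresponding to the different time windows according to the division result.
  3. The method according to claim 2, characterized in that dividing the target to-be-processed data into the time windows whose number corresponds to the time window threshold, based on the data generation time of the target to-be-processed data and the preset number threshold, comprises:
    for any target time window, if the end time of the target time window corresponds to multiple pieces of target to-be-processed data with the same data generation time, assigning those pieces of target to-be-processed data to the target time window.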
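  4. The method according to any one of claims 1-3, characterized in that each synchronous processing permission lock corresponds to at least two pieces of window data, and implementing the target service according to the window data corresponding to the different time windows comprises:
    after a target synchronous processing permission lock is obtained, obtaining at least two pieces of first target window data corresponding to the target synchronous processing permission lock;
    extracting data from the to-be-processed data according to a second start time and a second end time included in the at least two pieces of first target window data, wherein the second start time is the earliest start time included in the at least two pieces of first target window data, and the second end time is the latest end time included in the at least two pieces of first target window data;
    synchronously executing the newly extracted to-be-processed data, and updating the at least two pieces of first target window data according to the execution result of the newly extracted to-be-processed data.
  5. The method according to claim 4, characterized by further comprising:
    after an asynchronous processing permission lock is obtained, obtaining abnormal to-be-processed data that meets a preset condition;
    asynchronously executing the abnormal to-be-processed data, and updating the window data corresponding to the abnormal to-be-processed data according to the processing result of the abnormal to-be-processed data.
  6. The method according to claim 5, characterized in that obtaining the abnormal to-be-processed data that meets the preset condition comprises:
    obtaining second target window data corresponding to a time window that failed processing, wherein the time window that failed processing is a time window whose number of synchronous processing attempts within a first preset duration is greater than a first preset count threshold and whose number of asynchronous processing attempts is less than a second preset count threshold;
    extracting abnormal to-be-processed data from the to-be-processed data table according to a third start time and a third end time included in the second target window data, wherein the third start time is the earliest start time included in the second target window data, and the third end time is the latest end time included in the second target window data.
  7. The method according to claim 5 or 6, characterized in that the to-be-processed data includes a data status, and obtaining the abnormal to-be-processed data that meets the preset condition comprises:
    obtaining, from the to-be-processed data, the to-be-processed data whose data generation time is within a second preset duration and whose data status is the processing-failed status, to obtain the abnormal to-be-processed data.
  8. The method according to any one of claims 5-7, characterized in that the window data includes a window status, and obtaining the abnormal to-be-processed data that meets the preset condition comprises:
    obtaining, from the to-be-processed data, the to-be-processed data whose data generation time falls within a delay period, to obtain the abnormal to-be-processed data, wherein the start time of the delay period is determined from the current time and a second preset delay duration, and the end time of the delay period is the earliest time of the time windows whose window status is the pending status, the processing status, or the processing-failed status.
  9. The method according to any one of claims 5-8, characterized in that the window data includes a window status, and obtaining the abnormal to-be-processed data that meets the preset condition comprises:
    determining an abnormal time window whose window status is the processing status and which has been in the processing status for longer than a preset duration threshold;
    updating the synchronous processing count and the asynchronous processing count of the abnormal time window to zero, wherein the synchronous processing count and the asynchronous processing count are stored in the abnormal window data;
    extracting abnormal to-be-processed data from the to-be-processed data according to a fourth start time and a fourth end time included in the abnormal window data, wherein the fourth start time is the earliest start time included in the abnormal window data, and the fourth end time is the latest end time included in the abnormal window data.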
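  10. A data processing apparatus, characterized by comprising:
    an obtaining module, configured to obtain, in real time, to-be-processed data corresponding to a target service, wherein the to-be-processed data includes a data generation time;
    a processing module, configured to divide the to-be-processed data into different time windows according to the data generation time, and obtain window data corresponding to the different time windows according to the division result, wherein the window data of each time window includes a boundary time, each boundary time includes a start time and an end time, the start time is the earliest generation time among the to-be-processed data assigned to the time window, and the end time is the latest generation time among the to-be-processed data assigned to the time window;
    the processing module being further configured to implement the target service according to the window data corresponding to the different time windows.
  11. An electronic device, characterized by comprising a processor and a memory, wherein
    the memory is configured to store program code;
    the processor is configured to call the program code stored in the memory to perform the method according to any one of claims 1 to 9.
  12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 9.
  13. A computer program, characterized by comprising program code, wherein when a computer runs the computer program, the program code performs the method according to any one of claims 1 to 9.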
PCT/CN2022/127575 2022-05-30 2022-10-26 Data processing method and apparatus, and electronic device WO2023231281A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210603396.3A CN114860846A (zh) 2022-05-30 2022-05-30 Data processing method and apparatus, and electronic device
CN202210603396.3 2022-05-30

Publications (1)

Publication Number Publication Date
WO2023231281A1 true WO2023231281A1 (zh) 2023-12-07

Family

ID=82640654

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/127575 WO2023231281A1 (zh) 2022-05-30 2022-10-26 Data processing method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN114860846A (zh)
WO (1) WO2023231281A1 (zh)

Also Published As

Publication number Publication date
CN114860846A (zh) 2022-08-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22944589

Country of ref document: EP

Kind code of ref document: A1