CN114860846A - Data processing method and device and electronic equipment

Info

Publication number: CN114860846A
Application number: CN202210603396.3A
Authority: CN
Inventors: 夏柱昌; 苗青利; 刘建波
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2022-05-30
Filing date: 2022-05-30
Publication date: 2022-08-05
Also published as: WO2023231281A1

Abstract

The embodiments of the present application provide a data processing method, a data processing device, and an electronic device. The method includes: acquiring, in real time, data to be processed corresponding to a target service, wherein the data to be processed includes a data generation time; dividing the data to be processed into different time windows according to the data generation time, and obtaining window data corresponding to the different time windows according to the division result, wherein the window data of each time window includes a boundary time, each boundary time includes a start time and an end time, the start time is the earliest generation time among the data to be processed allocated to the time window, and the end time is the latest generation time among the data to be processed allocated to the time window; and implementing the target service according to the window data corresponding to the different time windows. The method improves the real-time performance of data processing, broadens the range of service scenarios in which mass data can be processed, and thereby safeguards the user experience.

Description

Data processing method and device and electronic equipment

Technical Field

The embodiments of the present application relate to the technical field of big data, and in particular to a data processing method, a data processing device, and an electronic device.

Background

With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). Big data technology is no exception; however, the security and real-time requirements of the financial industry also place higher demands on it. To meet the growing demands of financial transactions, big data processing frameworks are being used more and more widely.

In the prior art, mass data can be processed in batches through a big data processing framework in order to implement related services. In batch processing, the data to be processed corresponding to different services is collected first, and the collected data is processed in a batch only after a number of collection days has elapsed or a quantity of data has been collected.

However, this processing approach has a long delay and reduces the real-time performance of data processing. It can only be applied to services with low real-time requirements and not to services with high real-time requirements, which narrows the range of service scenarios in which mass data can be processed and in turn degrades the user experience.

Disclosure of Invention

The embodiment of the application provides a data processing method and device and electronic equipment, so as to improve the real-time performance of data processing.

In a first aspect, an embodiment of the present application provides a data processing method, including:

acquiring data to be processed corresponding to a target service in real time, wherein the data to be processed comprises data generation time;

dividing the data to be processed into different time windows according to the data generation time, and obtaining window data corresponding to the different time windows according to a division result, wherein the window data of each time window comprises a boundary time, each boundary time comprises a start time and an end time, the start time is the earliest generation time contained in the data to be processed distributed to the time window, and the end time is the latest generation time contained in the data to be processed distributed to the time window;

and realizing the target service according to the window data corresponding to the different time windows.

Optionally, the dividing the data to be processed into different time windows according to the data generation time, and obtaining window data corresponding to the different time windows according to a division result includes:

when a data division permission lock is acquired, acquiring history window data, wherein the history window data includes a first end time, the history window data is data corresponding to at least one time window whose window state is an executed state, and the first end time is the latest end time among the boundary times contained in the history window data;

extracting the data to be processed according to the first end time and target time to obtain initial target data to be processed, wherein the target time is determined according to current time and a first preset delay time;

extracting, in order of data generation time, a target number of pieces of target data to be processed from the initial target data to be processed, wherein the target number is the product of a preset number threshold and a preset time window threshold;

and constructing as many time windows as the time window threshold, dividing the target data to be processed into those time windows based on the data generation time of the target data to be processed and the number threshold, and obtaining window data corresponding to the different time windows according to the division result.

Optionally, the dividing the target data to be processed into as many time windows as the time window threshold, based on the data generation time of the target data to be processed and the preset number threshold, includes:

for any target time window, if several pieces of target data to be processed share the data generation time that corresponds to the end time of the target time window, allocating all of the target data to be processed with that same data generation time to the target time window.
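As a minimal illustrative sketch of this tie-handling rule, the following assumes that generation times are plain integers and uses a hypothetical helper name, `split_with_ties`; neither is specified by the application. A window that has reached the number threshold is extended for as long as the next record shares its end time, so records with identical generation times are never split across two windows.

```python
def split_with_ties(times, chunk_size):
    """Split sorted generation times into windows of about chunk_size records.

    Assumption: a window may exceed chunk_size when several records share the
    generation time found at the window's end.
    """
    ordered = sorted(times)
    windows = []
    i = 0
    while i < len(ordered):
        j = min(i + chunk_size, len(ordered))
        # Pull in every following record that has the same generation time
        # as the current last record of this window.
        while j < len(ordered) and ordered[j] == ordered[j - 1]:
            j += 1
        windows.append(ordered[i:j])
        i = j
    return windows


print(split_with_ties([1, 2, 2, 3, 4], chunk_size=2))
```

With a number threshold of 2, the two records generated at time 2 stay together in the first window, and the remaining records form the second window.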

Optionally, if each synchronous processing permission lock corresponds to at least two pieces of window data, the implementing the target service according to the window data corresponding to the different time windows includes:

after a target synchronous processing permission lock is acquired, acquiring at least two pieces of first target window data corresponding to the target synchronous processing permission lock;

extracting data again from the data to be processed according to a second start time and a second end time contained in the at least two pieces of first target window data, wherein the second start time is the earliest start time contained in the at least two pieces of first target window data, and the second end time is the latest end time contained in the at least two pieces of first target window data;

and synchronously executing the newly extracted data to be processed, and updating the at least two pieces of first target window data according to the execution result of the newly extracted data to be processed.
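The derivation of the second start time and second end time can be sketched as follows. This is only an illustration: windows are represented as hypothetical `(start, end)` tuples, generation times are integers, and an inclusive range on both ends is assumed, none of which is fixed by the application.

```python
def merged_range(windows):
    """Return (second_start, second_end) for a group of (start, end) windows."""
    second_start = min(start for start, _ in windows)
    second_end = max(end for _, end in windows)
    return second_start, second_end


def extract_for_sync(times, windows):
    """Re-extract the generation times covered by the merged window range."""
    start, end = merged_range(windows)
    # Assumption: both boundaries are inclusive.
    return sorted(t for t in times if start <= t <= end)


windows = [(30, 40), (10, 20)]
print(merged_range(windows))
print(extract_for_sync([5, 10, 20, 30, 40, 45], windows))
```

Extracting once over the merged range lets a single pass cover all windows that belong to the same lock, rather than one extraction per window.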

Optionally, the method further includes:

after an asynchronous processing permission lock is acquired, acquiring abnormal data to be processed that meets a preset condition;

and asynchronously executing the abnormal data to be processed, and updating the window data corresponding to the abnormal data to be processed according to the processing result of the abnormal data to be processed.

Optionally, the acquiring abnormal data to be processed meeting the preset condition includes:

acquiring second target window data corresponding to a time window in which processing has failed, wherein the time window in which processing has failed is a time window for which, within a first preset duration, the number of synchronous processing attempts is greater than a first preset count threshold and the number of asynchronous processing attempts is less than a second preset count threshold;

and extracting the abnormal data to be processed from the data table to be processed according to a third start time and a third end time contained in the second target window data, wherein the third start time is the earliest start time contained in the second target window data, and the third end time is the latest end time contained in the second target window data.
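The selection of failed windows and the derivation of the third start and end times can be sketched as below. The class name `WindowStats`, its field names, and the threshold values are hypothetical, and the "within a first preset duration" condition is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    timestamp_from: int   # start time of the window
    timestamp_till: int   # end time of the window
    sync_attempts: int    # number of synchronous processing attempts
    async_attempts: int   # number of asynchronous processing attempts


def failed_windows(windows, sync_threshold, async_threshold):
    """Windows retried synchronously too often but not yet exhausted asynchronously."""
    return [
        w for w in windows
        if w.sync_attempts > sync_threshold and w.async_attempts < async_threshold
    ]


def third_range(windows):
    """Earliest start time and latest end time across the given windows."""
    return (
        min(w.timestamp_from for w in windows),
        max(w.timestamp_till for w in windows),
    )


stats = [
    WindowStats(10, 20, sync_attempts=4, async_attempts=0),
    WindowStats(30, 40, sync_attempts=1, async_attempts=0),
    WindowStats(50, 60, sync_attempts=5, async_attempts=3),
]
failed = failed_windows(stats, sync_threshold=3, async_threshold=3)
print(failed, third_range(failed))
```

Only the first window qualifies here: the second has not exceeded the synchronous threshold, and the third has already reached the asynchronous threshold.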

Optionally, if the data to be processed includes a data state, the acquiring abnormal data to be processed meeting a preset condition includes:

and acquiring the data to be processed with the data generation time within a second preset time length and the data state being a processing failure state from the data to be processed to obtain abnormal data to be processed.

Optionally, if the window data includes a window state, the acquiring abnormal data to be processed meeting a preset condition includes:

acquiring, from the data to be processed, the data to be processed whose data generation time falls within a delay period, to obtain the abnormal data to be processed, wherein the start time of the delay period is determined by the current time and a second preset delay duration, and the end time of the delay period is the earliest time of the time windows whose window state is a to-be-processed state, a processing state, or a processing failure state.

determining an abnormal time window, wherein the abnormal time window is a time window whose window state is a processing state and for which the duration of the processing state exceeds a preset duration threshold;

resetting to zero the number of synchronous processing attempts and the number of asynchronous processing attempts corresponding to the abnormal time window, wherein both counts are stored in the abnormal window data;

and extracting abnormal data to be processed from the data to be processed according to a fourth starting time and a fourth ending time which are contained in the abnormal window data, wherein the fourth starting time is the earliest starting time contained in the abnormal window data, and the fourth ending time is the latest ending time contained in the abnormal window data.
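A minimal sketch of detecting windows that are stuck in the processing state, resetting their counters, and deriving the fourth start and end times is given below. The `Window` class, the `processing_since` field, and the string state names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Window:
    timestamp_from: int
    timestamp_till: int
    state: str              # assumed state names, e.g. "processing", "executed"
    processing_since: int   # hypothetical: time the window entered "processing"
    sync_attempts: int = 0
    async_attempts: int = 0


def reset_stuck_windows(windows, now, duration_threshold):
    """Find windows stuck in "processing" too long and zero their attempt counts."""
    stuck = [
        w for w in windows
        if w.state == "processing" and now - w.processing_since > duration_threshold
    ]
    for w in stuck:
        w.sync_attempts = 0
        w.async_attempts = 0
    return stuck


def fourth_range(stuck):
    """Earliest start time and latest end time across the abnormal windows."""
    return (
        min(w.timestamp_from for w in stuck),
        max(w.timestamp_till for w in stuck),
    )


ws = [
    Window(10, 20, "processing", processing_since=0, sync_attempts=2, async_attempts=1),
    Window(30, 40, "processing", processing_since=90, sync_attempts=1),
    Window(50, 60, "executed", processing_since=0, sync_attempts=3),
]
stuck = reset_stuck_windows(ws, now=100, duration_threshold=30)
print(stuck, fourth_range(stuck))
```

Resetting both counters makes the stuck window eligible for processing again, after which its boundary times define the range of abnormal data to extract.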

In a second aspect, an embodiment of the present application provides a data processing apparatus, including:

an acquisition module, configured to acquire, in real time, data to be processed corresponding to a target service, wherein the data to be processed includes a data generation time;

a processing module, configured to divide the data to be processed into different time windows according to the data generation time, and to obtain window data corresponding to the different time windows according to the division result, wherein the window data of each time window includes a boundary time, each boundary time includes a start time and an end time, the start time is the earliest generation time among the data to be processed allocated to the time window, and the end time is the latest generation time among the data to be processed allocated to the time window;

the processing module is further configured to implement the target service according to the window data corresponding to the different time windows.

In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, and a memory communicatively coupled to the processor;

the memory stores computer-executable instructions;

the processor executes computer-executable instructions stored by the memory to implement the data processing method as described above in the first aspect and various possible designs of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the data processing method according to the first aspect and various possible designs of the first aspect is implemented.

In a fifth aspect, embodiments of the present application provide a computer program product, which includes a computer program that, when executed by a processor, implements the data processing method as described in the first aspect and various possible designs of the first aspect.

The embodiments of the present application provide a data processing method, a data processing device, and an electronic device. With this scheme, data to be processed that corresponds to a target service and includes a data generation time can be acquired in real time. The data to be processed is then divided into different time windows according to the data generation time, and window data corresponding to the different time windows is obtained, wherein the window data of each time window includes a boundary time, each boundary time includes a start time and an end time, the start time is the earliest generation time among the data to be processed allocated to the time window, and the end time is the latest generation time among the data to be processed allocated to the time window. The target service is then implemented according to the window data corresponding to the different time windows. Because the data to be processed acquired in real time is divided into time windows by data generation time and then processed window by window, the data can be processed in a streaming manner, without waiting until a number of collection days or a quantity of data has been reached before batch processing. This improves the real-time performance of data processing, allows the method to be applied to services with high real-time requirements, broadens the range of service scenarios in which mass data can be processed, and thereby safeguards the user experience.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an application of a conventional big data processing framework;

FIG. 2 is a schematic structural diagram of an application system of a data processing method according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of the present application;

FIG. 4 is a schematic diagram illustrating a data processing method according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of including other sequential examples in addition to those illustrated or described. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Conventional big data processing architectures may include stream processing architectures (e.g., Spark Streaming, Flink), batch processing architectures (e.g., Hive), and unified stream-batch processing architectures (e.g., Flink). They involve many components, are incompatible with commonly used relational databases (e.g., MySQL), and cannot effectively support both OLAP (On-Line Analytical Processing) services and OLTP (On-Line Transaction Processing) services, especially in financial technology. In addition, relational databases such as MySQL cannot support the processing of massive data.

In this context, new big data processing frameworks (e.g., TiDB) have emerged. At present, such a framework can batch-process mass data for financial technology services and can also provide services such as data query after the mass data has been processed. When mass data is processed in batches, no stream data processing platform is deployed in the current big data processing framework, so existing stream processing technology cannot be used, that is, quasi-real-time processing of the data cannot be achieved. Generally, the data to be processed corresponding to different services needs to be collected first, and the collected data is processed in a batch only after a number of collection days has elapsed or a quantity of data has been collected. For example, FIG. 1 is an application schematic diagram of an existing big data processing framework. As shown in FIG. 1, the existing batch processing mode generally takes a T+1 form: the big data processing framework first waits through day T, stops importing new data once all data to be processed for day T has been imported, and then batch-processes all of the newly acquired data on day T+1. However, this processing mode has a long delay and reduces the real-time performance of data processing. It can only be applied to services with low real-time requirements and not to services with high real-time requirements, which narrows the range of service scenarios in which mass data can be processed and in turn degrades the user experience.

In addition, hour-level tasks may exist in the prior art, that is, tasks that process the data imported during the preceding hour. This improves real-time performance, but the data processing latency is still high.

To address the above technical problems, the present application divides the data to be processed, acquired in real time, into different time windows according to its data generation time, and then processes the data window by window to implement the target service. In this way, the data to be processed acquired in real time can be processed in a streaming manner, without waiting until a number of collection days or a quantity of data has been reached before batch processing. This improves the real-time performance of data processing, makes the method applicable to services with high real-time requirements, broadens the range of service scenarios in which mass data can be processed (covering both scenarios with low and scenarios with high real-time requirements), and thereby safeguards the user experience.

FIG. 2 is a schematic architecture diagram of an application system of the data processing method provided in the embodiments of the present application. As shown in FIG. 2, the application system may include a distributed database, a big data processing framework, and a terminal device. The distributed database can synchronize the data to be processed to the big data processing framework in real time, and the terminal device can process the data to be processed in the big data processing framework in a streaming manner, thereby improving the real-time performance of data processing.

The distributed database may be an existing database, and the data to be processed in the database may be data corresponding to different services, for example, data corresponding to financial services. The big data processing framework may be TiDB, a converged, open-source distributed database product that supports online transaction processing and online analytical processing at the same time. It provides functions such as horizontal scale-out and scale-in and real-time HTAP (that is, processing mixed OLTP and OLAP workloads in one system), and has characteristics of both relational and non-relational databases. When the distributed database synchronizes the data to be processed to the big data processing framework in real time, the synchronization can be performed through the data synchronization tool provided by TiDB, namely Data Migration (DM). The terminal device may be a smartphone, a personal computer, a tablet, a server, a server cluster, or the like.

In addition, the big data processing framework may be deployed on a standalone device or on the terminal device.

The technical solution of the present application will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.

FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of the present application, where the method according to the embodiment may be executed by a terminal device. As shown in FIG. 3, the method of this embodiment may include:

S301: acquiring, in real time, data to be processed corresponding to the target service, wherein the data to be processed includes a data generation time.

In this embodiment, when the target service is implemented, the data to be processed related to the target service may be acquired first. To improve the real-time performance of service data processing, a streaming processing mode may be adopted, that is, the data to be processed corresponding to the target service is acquired in real time and processed in real time. The data to be processed may include basic information related to the target service, a data generation time, and a data state, where the data state identifies the current state of the data to be processed. For example, the data state may be a pending state, an in-process state, a processing failure state, a processing success state, or the like. The default data state is the pending state (which may be represented by 0, for example), and the data state may be updated as processing proceeds.
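One possible in-memory representation of a piece of data to be processed is sketched below. Only the pending value 0 comes from the description; the other numeric codes, the class names `DataState` and `PendingRecord`, and the field names are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class DataState(IntEnum):
    PENDING = 0      # default state, represented by 0 in the description
    PROCESSING = 1   # assumed code
    FAILED = 2       # assumed code
    SUCCEEDED = 3    # assumed code


@dataclass
class PendingRecord:
    record_id: int                         # hypothetical identifier
    generated_at: int                      # data generation time, e.g. epoch milliseconds
    payload: str                           # basic information related to the target service
    state: DataState = DataState.PENDING   # updated as processing proceeds


record = PendingRecord(record_id=1, generated_at=1_653_900_000_000, payload="transfer")
record.state = DataState.PROCESSING
print(record)
```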

In addition, there may be one or more pieces of data to be processed. Optionally, when the data to be processed corresponding to the target service is acquired in real time, it may be acquired from the distributed database corresponding to the target service. After generating the data to be processed corresponding to the target service, each service system may store it in a distributed database, and the different distributed databases may then synchronize the data to be processed to the big data processing framework for storage.

S302: dividing the data to be processed into different time windows according to the data generation time, and obtaining window data corresponding to the different time windows according to the division result, wherein the window data of each time window comprises a boundary time, each boundary time comprises a start time and an end time, the start time is the earliest generation time contained in the data to be processed distributed to the time windows, and the end time is the latest generation time contained in the data to be processed distributed to the time windows.

In this embodiment, after the to-be-processed data is obtained in real time, because the to-be-processed data is data corresponding to different distributed databases and the amount of the to-be-processed data is large, if the obtained to-be-processed data is directly processed, situations such as data omission or data processing sequence errors are likely to occur. Therefore, the data to be processed can be divided into different time windows according to the data generation time, and window data corresponding to the different time windows can be obtained. The amount of the data to be processed corresponding to each time window can be set according to the practical application scene in a user-defined mode.

Further, the dividing the data to be processed into different time windows according to the data generation time, and obtaining window data corresponding to the different time windows according to a division result may specifically include:

When a data division permission lock is acquired, acquiring history window data, wherein the history window data includes a first end time, the history window data is data corresponding to at least one time window whose window state is an executed state, and the first end time is the latest end time among the boundary times contained in the history window data.

Extracting the data to be processed according to the first end time and a target time to obtain initial target data to be processed, wherein the target time is determined according to the current time and a first preset delay duration.

Extracting, in order of data generation time, a target number of pieces of target data to be processed from the initial target data to be processed, wherein the target number is the product of a preset number threshold and a preset time window threshold.

Constructing as many time windows as the time window threshold, dividing the target data to be processed into those time windows based on the data generation time of the target data to be processed and the number threshold, and obtaining window data corresponding to the different time windows according to the division result.

Specifically, when the data to be processed is divided into different time windows, the data division permission lock needs to be acquired first, the process of dividing the data to be processed into the different time windows can be executed only when the data division permission lock is acquired, and after one thread acquires the data division permission lock, other threads cannot acquire the data division permission lock, so that the effect that each piece of data to be processed can only belong to one window is realized, and the correctness of data division is ensured.
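The mutual exclusion described above can be sketched with an in-process lock. The application does not specify how the data division permission lock is implemented (in a multi-node deployment it would more likely be a database-backed or otherwise distributed lock), so `threading.Lock` and the helper name `try_divide` are stand-ins chosen for illustration.

```python
import threading

# Stand-in for the data division permission lock (assumed in-process here).
division_lock = threading.Lock()


def try_divide(divide_fn):
    """Run divide_fn only if the lock can be acquired; return True if it ran."""
    if not division_lock.acquire(blocking=False):
        return False  # another thread currently holds the lock
    try:
        divide_fn()
        return True
    finally:
        division_lock.release()


results = []


def outer():
    # While the lock is held, a second attempt to divide data is refused.
    results.append(try_divide(lambda: results.append("inner ran")))


print(try_divide(outer), results)
```

The non-blocking acquire means a thread that fails to obtain the lock simply skips the division step rather than waiting, so at most one division pass runs at a time.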

In addition, the history window data is data corresponding to at least one time window whose window state is an executed state. Each time window in the executed state corresponds to a boundary time, and each boundary time may include a start time (Timestamp_From) and an end time (Timestamp_Till), where the start time is the data generation time of the first piece of data in the window and the end time is the data generation time of the last piece of data in the window. The first end time is the latest end time among the boundary times included in the history window data, that is, the data generation time of the last piece of data. After the data division permission lock is acquired, the first end time may be acquired and set as the start time of a new time window; if time windows are being divided for the first time, the start time of the first time window may be set to 0. The data to be processed is then extracted according to the first end time and the target time to obtain the initial target data to be processed, where the target time may be determined by subtracting a first preset delay duration from the current time. The first preset delay duration may be set according to the actual application scenario; optionally, it may be determined according to the quantity of initial target data to be processed in the current time period.

For example, if the quantity of initial target data to be processed in the current time period is small, the first preset delay duration may be set to a small value or to 0. If the quantity is large, the first preset delay duration may be set to a larger value, so that the time period covered by the acquired data is shorter; this appropriately reduces the quantity of data acquired each time and thus reduces the data processing load on the terminal device.
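The computation of the target time and the extraction of the initial target data to be processed can be sketched as follows. Generation times are modeled as integers, and the choice of an exclusive lower bound and an inclusive upper bound is an assumption, as are the function names.

```python
def target_time(now, first_delay):
    """Target time = current time minus the first preset delay duration."""
    return now - first_delay


def extract_initial(times, first_end_time, now, first_delay):
    """Select generation times after first_end_time and up to the target time."""
    upper = target_time(now, first_delay)
    # Assumption: data generated exactly at first_end_time already belongs to
    # an executed window, so the lower bound is exclusive.
    return sorted(t for t in times if first_end_time < t <= upper)


print(target_time(100, 10))
print(extract_initial([50, 60, 95, 80, 40], first_end_time=50, now=100, first_delay=10))
```

A larger delay lowers the upper bound, so fewer records fall inside the range on each pass.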

Optionally, after the initial target data to be processed is obtained according to the first end time and the target time, all of the initial target data to be processed whose generation time falls within the period from the first end time to the target time may be divided into different time windows. Alternatively, part of the data may be selected from all of the qualifying initial target data to be processed according to the actual processing capacity of the terminal device. When only part of the qualifying data is selected for processing, the data may be acquired and processed in chronological order of data generation time, which avoids data processing errors caused by an out-of-order processing sequence.

In addition, when the obtained initial target data to be processed is divided into different time windows, in order to match the actual computing capacity of the terminal device, a preset number threshold and a preset time window threshold may be obtained first. The target data to be processed is then extracted in order from the initial target data to be processed according to the number threshold and the time window threshold, time windows are constructed according to the time window threshold (for example, if the time window threshold is 3, three time windows may be constructed), and the extracted target data to be processed is divided into those time windows. The number threshold may be the maximum size of the data chunk contained in each window (also referred to as chunk_size), and the time window threshold may be the maximum number of time windows that the terminal device can provide (also referred to as window_divide_max).

Optionally, after the time window threshold and the number threshold are obtained, they may be multiplied to obtain a target number, and the target number of pieces of target data to be processed is then obtained from the data to be processed. To ensure that the target service is implemented correctly, the target data to be processed may be taken from the initial target data to be processed in order of data generation time. In addition, when the quantity of initial target data to be processed is small, that is, when few services are executing concurrently, the quantity of target data to be processed may equal the quantity of initial target data to be processed, that is, all of the data to be processed is taken.
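The selection of the target data can be sketched as follows, using `chunk_size` and `window_divide_max` as named in the description; the function name `select_target` and the integer generation times are assumptions.

```python
def select_target(initial_times, chunk_size, window_divide_max):
    """Take the earliest chunk_size * window_divide_max generation times."""
    target_number = chunk_size * window_divide_max
    # Slicing past the end is safe: if fewer records exist, all are returned.
    return sorted(initial_times)[:target_number]


print(select_target([9, 1, 5, 3, 7, 2, 8], chunk_size=2, window_divide_max=3))
print(select_target([4, 2], chunk_size=2, window_divide_max=3))
```

In the second call only two records are available, fewer than the target number of six, so both are selected.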

In addition, after the target to-be-processed data is divided into as many time windows as the time window threshold specifies, the window data corresponding to each new time window can be obtained from the division result. The window data may include a window state and the boundary time of the time window. The initial window state may be an unprocessed state, which may later be updated to a processed state, an executed state, and so on, according to the actual operation. The boundary time may include, for example, the start time and the end time of the time window. The target to-be-processed data in each time window is arranged in chronological order, so the first item has the earliest generation time and the last item has the latest. The start time of each time window is the data generation time of the first target to-be-processed data item allocated to it, and the end time is the data generation time of the last item allocated to it.

For example, if the time window threshold is 3 and the number threshold is 2, the initial target data to be processed meeting the condition may be screened from the data to be processed according to the first end time and the target time, then 3 × 2 pieces of target data to be processed are screened from the initial target data to be processed, and the data to be processed is divided according to the form of 2 pieces of target data to be processed in each time window, so as to generate 3 time windows. The starting time and the ending time of the time window can be determined according to the data generation time of the data to be processed under each time window, and then the window data of the time window is determined.
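The selection and division in this example can be sketched in Python as follows. This is a minimal illustration, not the implementation of the application: the `Record` and `Window` types, the `divide_into_windows` function, the `"UNPROCESSED"` state label, and the sample timestamps are all assumptions introduced here. The input is assumed to have already been screened by the first end time and the target time, and `chunk_size` is assumed to be at least 1.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Record:
    generated_at: datetime  # data generation time
    payload: str            # placeholder for service-related information


@dataclass
class Window:
    start_time: datetime    # generation time of the first record in the window
    end_time: datetime      # generation time of the last record in the window
    records: List[Record]
    state: str = "UNPROCESSED"  # hypothetical label for the initial window state


def divide_into_windows(initial: List[Record], chunk_size: int,
                        window_divide_max: int) -> List[Window]:
    """Select the oldest chunk_size * window_divide_max records and window them."""
    target_count = chunk_size * window_divide_max
    target = sorted(initial, key=lambda r: r.generated_at)[:target_count]
    windows = []
    for i in range(0, len(target), chunk_size):
        chunk = target[i:i + chunk_size]
        windows.append(Window(chunk[0].generated_at, chunk[-1].generated_at, chunk))
    return windows


# Made-up sample: six records one second apart, thresholds as in the example.
records = [Record(datetime(2022, 5, 30, 10, 0, s), f"r{s}") for s in range(1, 7)]
windows = divide_into_windows(records, chunk_size=2, window_divide_max=3)
```

With a time window threshold of 3 and a number threshold of 2, the six sample records yield three windows of two records each, and each window's boundary time is read directly from its first and last record.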

Illustratively, table 1 is an information table corresponding to initial target to-be-processed data, and in table 1, 6 pieces of initial target to-be-processed data are included, and each piece of initial target to-be-processed data includes data generation time, basic information related to a target service, and the like.

Table 1. Information table corresponding to the initial target data to be processed

Table 2 is an information table corresponding to the target data to be processed, and in table 2, there are three time windows, and each time window includes two rows of target data to be processed.

Table 2. Information table corresponding to the target data to be processed

Table 3 is a window data table corresponding to the target data to be processed, and in table 3, there are three time windows, each corresponding to a start time and an end time.

Table 3. Window data table corresponding to the target data to be processed

In summary, the data to be processed is screened first to obtain the target data to be processed satisfying the preset number, and then the screened target data to be processed is divided into different time windows, so that the processing efficiency of the data to be processed is improved, the actual calculation amount of the terminal device is also met, the condition that the terminal device fails due to reasons such as large calculation amount is avoided, and the normal implementation of the service is further ensured. And the boundary time of the time window is determined according to the generation time of the data to be processed distributed to the time window, so that the flexibility of a boundary time determination mode is improved.

In addition, the dividing the target data to be processed into time windows corresponding to the number of the window thresholds based on the data generation time of the target data to be processed and the preset number threshold may specifically include:

Specifically, the target to-be-processed data is windowed according to its data generation time, and several pieces of data may share the same generation time (also referred to as the same timestamp). Target to-be-processed data with the same timestamp should not be divided into different time windows; otherwise, data that a service requires to be processed in a particular order may be processed in the wrong order and fail to be processed normally. Therefore, if the end time of a time window corresponds to multiple pieces of to-be-processed data, all of those pieces are placed in that time window rather than the next one, which improves the accuracy of data processing.
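The same-timestamp rule can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the function name `split_keeping_ties` and its parameters are hypothetical, the input is assumed to be sorted by generation time already, and a window is allowed to exceed the number threshold when records share its end timestamp.

```python
from datetime import datetime
from typing import List, Sequence


def split_keeping_ties(times: Sequence[datetime], chunk_size: int,
                       max_windows: int) -> List[List[datetime]]:
    """Split sorted generation times into windows without separating equal timestamps."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    windows: List[List[datetime]] = []
    i = 0
    while i < len(times) and len(windows) < max_windows:
        j = min(i + chunk_size, len(times))
        # Extend the window while the next record shares the window's end timestamp.
        while j < len(times) and times[j] == times[j - 1]:
            j += 1
        windows.append(list(times[i:j]))
        i = j
    return windows
```

The inner loop is the only difference from plain fixed-size chunking: it keeps pulling records into the current window until the next record has a strictly later timestamp.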

S303: and realizing the target service according to the window data corresponding to different time windows.

In this embodiment, after the data to be processed is divided into different time windows, window data corresponding to the different time windows may be obtained according to the division result, and then the target service may be implemented according to the window data corresponding to the different time windows.

Optionally, the window data may include a boundary time consisting of a start time and an end time. The start time is the data generation time of the first to-be-processed data item in the window corresponding to the window data, and the end time is the data generation time of the last to-be-processed data item in that window. Data whose generation time falls between the start time and the end time may then be extracted from the to-be-processed data, and the target service is implemented according to the extracted data.

With this scheme, to-be-processed data that corresponds to the target service and contains a data generation time is obtained in real time, divided into different time windows according to the data generation time to obtain the corresponding window data, and the target service is then implemented according to that window data. Because the data obtained in real time is divided into time windows and processed window by window, it can be processed in a streaming manner, without waiting several days or until a set quantity of data has been collected before processing it in a batch. This improves the real-time performance of data processing, allows the method to be applied to services with higher real-time requirements, widens the range of service scenarios to which mass data can be applied, and thereby safeguards the user's application experience.

Based on the method of fig. 2, the present specification also provides some specific embodiments of the method, which are described below.

In another embodiment, if at least two window data corresponding to each synchronization processing permission lock are provided, the implementing the target service according to the window data corresponding to the different time windows includes:

and after the target synchronous processing authority lock is obtained, acquiring at least two first target window data corresponding to the target synchronous processing authority lock.

And extracting data from the data to be processed according to a second start time and a second end time contained in the at least two pieces of first target window data, wherein the second start time is the earliest start time contained in the at least two pieces of first target window data, and the second end time is the latest end time contained in the at least two pieces of first target window data.

In this embodiment, existing synchronous processing permission locks take the form of coarse-grained locks; for example, one permission lock is allocated to the data of an entire window, which ensures that each task can acquire only one window for subsequent processing. However, because the amount of data to be processed is large and near-real-time behavior must be guaranteed, such a conventional scheme cannot support streaming processing. Concurrency can therefore be increased by using fine-grained locks: a permission lock may be allocated to part of the data in the window data (for example, one row), so that the window data can be divided into multiple lock granularities as needed. Each fine-grained lock may also correspond to at least two time windows, so that after a synchronous processing permission lock (that is, a fine-grained lock) is obtained, the data required across several time windows can be read at one time, which reduces the number of data reads.

Optionally, the preset maximum number of windows corresponding to each synchronous processing permission lock (which may be referred to as window_query_max) and the preset maximum number of synchronous processing permission locks (which may be referred to as window_mutex_num) may be obtained first. Lock numbers are then assigned starting from 0: each group of window_query_max consecutive time windows shares one lock number, and the lock number is increased by 1 for the next group. The largest lock number is the maximum number of synchronous processing permission locks minus 1; after it is reached, numbering restarts from 0.
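The numbering rule above reduces to an integer division followed by a modulo. The following Python sketch shows one way to express it; the function name `lock_number` and the zero-based `window_index` argument are assumptions introduced for illustration.

```python
def lock_number(window_index: int, window_query_max: int, window_mutex_num: int) -> int:
    """Return the lock number for the window at a zero-based position.

    Every window_query_max consecutive windows share one lock number, and
    numbering wraps to 0 after window_mutex_num - 1.
    """
    return (window_index // window_query_max) % window_mutex_num


# Two windows per lock and two locks in total (values chosen only for illustration).
assignment = [lock_number(i, window_query_max=2, window_mutex_num=2) for i in range(6)]
```

For six consecutive windows with these settings, the first two windows share lock 0, the next two share lock 1, and the last two wrap around to lock 0 again.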

Table 4 is a locked window data table, and continuing to use table 3 as an example, a synchronization processing permission lock may be allocated to every two time windows, and a lock number may be allocated to each synchronization processing permission lock.

Table 4. Locked window data table

By distributing at least two continuous time windows for one synchronous processing authority lock, multiple window data can be simultaneously acquired after one synchronous processing authority lock is acquired, and then the data to be processed is taken out as much as possible through minimum time and maximum time (namely minimum starting time and maximum ending time) in the multiple time windows in one operation, so that the time continuity of the data is effectively guaranteed, the interaction with a database is effectively reduced, and the data processing efficiency is improved.

In addition, after the synchronization processing permission lock is allocated to the time window, the corresponding to-be-processed data may be acquired according to the allocated synchronization processing permission lock, the acquired to-be-processed data and the corresponding window data and the like are encapsulated into a data block (also referred to as chunk), and the encapsulated data block is submitted to a data processing thread, so as to implement a related target service.

For example, if the current thread acquires the synchronization processing permission lock No. 0 (that is, the target synchronization processing permission lock is the lock No. 0), window data corresponding to the synchronization processing permission lock No. 0 may be acquired from all the time windows (the acquired window data is data corresponding to at least two time windows), then data is extracted from the data to be processed according to the second start time and the second end time included in the at least two window data, and the related target service is implemented according to the extracted data. The to-be-processed data corresponding to each window are arranged according to the sequence of the data generation time, each window data comprises a boundary time, the boundary time can correspond to a start time and an end time, the start time is the data generation time of the first to-be-processed data contained in the window corresponding to the window data, the end time is the data generation time of the last to-be-processed data contained in the window corresponding to the window data, the second start time is the earliest start time in the boundary times contained in the windows corresponding to at least two window data, and the second end time is the latest end time in the boundary times contained in the windows corresponding to at least two first target window data.
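As a hedged illustration of this step, the sketch below merges the boundary times of the windows that share one lock and extracts the matching data in a single pass. The helper names `merged_range` and `extract_between`, the tuple layout, and the inclusive bounds are assumptions; in a database-backed deployment the extraction would more likely be one range query.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

Boundary = Tuple[datetime, datetime]  # (start time, end time) of one window


def merged_range(boundaries: Iterable[Boundary]) -> Boundary:
    """Return (second start time, second end time) for windows under one lock."""
    boundaries = list(boundaries)
    second_start = min(start for start, _ in boundaries)
    second_end = max(end for _, end in boundaries)
    return second_start, second_end


def extract_between(generation_times: Iterable[datetime],
                    start: datetime, end: datetime) -> List[datetime]:
    """Keep the generation times that fall inside [start, end] (inclusive by assumption)."""
    return [t for t in generation_times if start <= t <= end]
```

Reading once over the merged range replaces one read per window, which is the reduction in database interaction described above.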

In addition, the data processing method in the present application can be divided into two types: synchronous processing mode (also known as MainRoad) and asynchronous processing mode (also known as SideTrack), respectively. In order to improve the processing efficiency of the data to be processed, a synchronous processing mode may be adopted when the target service is realized according to the window data corresponding to different time windows, and an asynchronous processing mode may be adopted when various abnormal scenes are processed. In addition, in order to avoid the mutual influence of the synchronous processing mode and the asynchronous processing mode, the resources of the data corresponding to the synchronous processing mode and the asynchronous processing mode can be isolated.

For example, in the synchronous processing mode, the data processing thread may be a DataProcessPoolService. An available lock number may be determined from a preset lock counter, and the synchronous processing permission lock corresponding to that lock number is then acquired. The processable time window data and the corresponding to-be-processed data are obtained according to the acquired lock and encapsulated as a data block; at the same time, the window state corresponding to the time window data may be set to the in-processing state. The synchronous processing permission lock is then released, the lock counter is incremented, and the encapsulated data block may be submitted to the DataProcessPoolService for processing, so as to implement the target service. The data block serves as an independent data processing unit and includes the window data, the to-be-processed data corresponding to the time window, and the type of the data block (for example, synchronous processing, asynchronous processing, service failure re-pull, delayed arrival data re-pull, and window heartbeat loss re-pull, which correspond to the various kinds of processing in the MainRoad mode and the SideTrack mode).
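The synchronous flow can be sketched roughly as below. Everything named here is an assumption made for illustration: `WindowData`, `Chunk`, `MainRoadExtractor`, and `handle` are hypothetical, in-process `threading.Lock` objects stand in for the permission locks, a `ThreadPoolExecutor` stands in for the data processing thread pool, and integers stand in for generation times. The extraction itself is driven from a single thread in this sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WindowData:
    lock_no: int
    start: int            # integers stand in for generation times
    end: int
    state: str = "PENDING"


@dataclass
class Chunk:
    kind: str             # e.g. "SYNC"; other kinds would mark SideTrack work
    windows: List[WindowData]
    records: List[int]


class MainRoadExtractor:
    def __init__(self, windows, records, window_mutex_num):
        self.windows = windows
        self.records = records
        self.locks = [threading.Lock() for _ in range(window_mutex_num)]
        self.lock_counter = 0

    def next_chunk(self) -> Optional[Chunk]:
        lock_no = self.lock_counter % len(self.locks)
        with self.locks[lock_no]:           # acquired here, released on exit
            picked = [w for w in self.windows
                      if w.lock_no == lock_no and w.state == "PENDING"]
            chunk = None
            if picked:
                lo = min(w.start for w in picked)
                hi = max(w.end for w in picked)
                data = [t for t in self.records if lo <= t <= hi]
                for w in picked:
                    w.state = "PROCESSING"
                chunk = Chunk("SYNC", picked, data)
        self.lock_counter += 1              # move on to the next lock number
        return chunk


def handle(chunk: Chunk) -> int:
    """Placeholder for service logic: here it only counts the records."""
    return len(chunk.records)


windows = [WindowData(0, 1, 2), WindowData(0, 3, 4), WindowData(1, 5, 6)]
extractor = MainRoadExtractor(windows, records=[1, 2, 3, 4, 5, 6], window_mutex_num=2)
with ThreadPoolExecutor(max_workers=2) as pool:   # stand-in for the thread pool
    first = extractor.next_chunk()
    processed = pool.submit(handle, first).result()
```

The lock is held only while windows are selected and marked, not while the service logic runs, so other extractors are blocked for as short a time as possible.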

In another embodiment, the method may further include: and after the asynchronous processing authority lock is obtained, acquiring abnormal data to be processed meeting preset conditions.

In this embodiment, when the newly extracted to-be-processed data is processed to implement the target service, some of it may fail to be processed normally. To ensure normal processing of other data and normal operation of the service, such abnormal data may be handled asynchronously.

Furthermore, the abnormal conditions can be various, and different processing modes can be adopted for processing different abnormal conditions.

Optionally, there may be a case that the time window processing fails, and in this case, the acquiring the abnormal data to be processed that meets the preset condition may include:

and acquiring second target window data corresponding to a time window with processing failure, wherein the time window with processing failure is a time window in which the synchronous processing times are greater than a first preset time threshold value and the asynchronous processing times are less than a second preset time threshold value within a first preset time.

And extracting abnormal data to be processed from the data table to be processed according to a third start time and a third end time which are contained in the second target window data, wherein the third start time is the earliest start time contained in the second target window data, and the third end time is the latest end time contained in the second target window data.

Specifically, if an encapsulated data block fails during synchronous processing, the abnormal data block may be processed again (the maximum number of retries is configurable). When the number of retries reaches the maximum, the state of the time window corresponding to the data block may be set to the processing failure state. Then, after a specific lock (for example, a lock dedicated to window processing failure) is acquired, the second target window data corresponding to time windows in the processing failure state may be obtained. The range of window data obtained may be limited in time; for example, only time windows whose processing failed within a first preset time period (which may be any value from 3 to 5 days) are obtained. It must also be ensured that the number of synchronous processing attempts for the window has reached the maximum (that is, the first preset number threshold) and that the number of asynchronous processing attempts is smaller than the maximum (that is, the second preset number threshold). The window state of the time window corresponding to the obtained second target window data may then be updated to the in-processing state, and the lock is released. Next, abnormal to-be-processed data is extracted from the to-be-processed data table according to the third start time and the third end time contained in the obtained second target window data, and is encapsulated into a window abnormal data block (which may also be called a chunk and may include the second target window data and its corresponding abnormal to-be-processed data). The encapsulated window abnormal data block is passed to a processing task, and the task is submitted to a task processing thread (such as DataProcessPoolService) to wait for execution.
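The selection conditions for failed windows can be sketched as a filter. The names below (`WindowRow`, `select_failed_windows`, `third_boundary`, the `"FAILED"` label) are hypothetical, the look-back check is applied to the window end time as an assumption, and "has reached the maximum" is read here as greater than or equal to the first threshold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class WindowRow:
    start_time: datetime
    end_time: datetime
    state: str
    sync_count: int    # number of synchronous processing attempts
    async_count: int   # number of asynchronous processing attempts


def select_failed_windows(rows: List[WindowRow], now: datetime, lookback: timedelta,
                          max_sync: int, max_async: int) -> List[WindowRow]:
    """Pick failed windows inside the look-back period that may still be retried."""
    cutoff = now - lookback
    return [w for w in rows
            if w.state == "FAILED"
            and w.end_time >= cutoff
            and w.sync_count >= max_sync
            and w.async_count < max_async]


def third_boundary(selected: List[WindowRow]) -> Optional[Tuple[datetime, datetime]]:
    """Earliest start and latest end across the selected windows, or None if empty."""
    if not selected:
        return None
    return (min(w.start_time for w in selected), max(w.end_time for w in selected))
```

The pair returned by `third_boundary` plays the role of the third start time and third end time used to extract the abnormal data.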

Optionally, if there may be a case that processing of part of the data to be processed in the time window fails, where the data to be processed includes a data state, the acquiring of the abnormal data to be processed meeting the preset condition may include:

Specifically, the following steps can be performed according to the result returned by the service-layer interface (such as streamprocess).

If no result is returned, all source data in the task window has been processed successfully, and a preset interface (for example, a batchUpdate interface in the TiDB) is called to update the data state of all of the to-be-processed data to the processing success state (for example, the value of the corresponding field may be updated to 9);

if a result is returned, all data corresponding to the returned result has failed to be processed. A preset interface (for example, a batchUpdate interface in the TiDB) is called to update the processing state of that data to the processing failure state (for example, the value of the corresponding field may be increased by 1 each time), and the state of the data in the time window that was not returned is updated to the processing success state (that is, the value of the corresponding field is updated to 9).

Regardless of the returned result, the processing state of the time window is updated to the processing success state (value S). That is, even if some service data fails to be processed, the state of the time window may still be updated to the processing success state. The failed to-be-processed data remains in an unsuccessful state (9 indicates success, and a failure is a non-zero value less than 9) and may be handled by a SideTrack mode task (that is, the re-processing task for service data processing failures).

When abnormal to-be-processed data that failed to be processed is reprocessed in the SideTrack mode, only the part of the data in the time window that failed needs to be reprocessed. Correspondingly, a specific lock (for example, a lock dedicated to service data processing failure) may be acquired, after which the abnormal to-be-processed data that failed to be processed may be extracted. Illustratively, each piece of to-be-processed data has a processing state: 0 is the pending state, 1 to 8 is the number of failed retries, and 9 is the success state (these values are configurable). On this premise, the failed data only needs to be screened by processing state, that is, data whose state is any value from 1 to 8. In addition, the query range may be limited by time; for example, only data whose generation time falls within a second preset time length is acquired. The data may then be encapsulated as a window abnormal data block (also referred to as a chunk, which may include the data that failed to be processed). The encapsulated window abnormal data block is then passed to a processing task, and the task is submitted to a task processing thread (such as DataProcessPoolService) to wait for execution.

In addition, after the task has been submitted to the task processing thread (such as DataProcessPoolService) and executed, only the processing state of the abnormal to-be-processed data needs to be updated: if the data is processed successfully, its state is set to 9; if it fails, its state is increased by 1 each time until processing succeeds. If processing still fails after the maximum number of retries is reached, operation and maintenance personnel may modify the state after manual confirmation, after which the program can automatically process the data again.
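The state values described above (0 pending, 1 to 8 failed retries, 9 success) can be illustrated with a small in-memory sketch. The functions `update_states` and `needs_retry` are hypothetical, a dictionary stands in for the state column that a batch update would modify, and capping the failure count at 8 so that it never rolls over into the success value is an assumption made here.

```python
from typing import Dict, Iterable

PENDING = 0   # not yet processed
SUCCESS = 9   # processed successfully; 1..8 count failed attempts


def update_states(states: Dict[int, int], window_ids: Iterable[int],
                  failed_ids: Iterable[int]) -> Dict[int, int]:
    """Apply one processing result: failed records +1, all other records become 9."""
    failed = set(failed_ids)
    for record_id in window_ids:
        if record_id in failed:
            # Cap at 8 so a repeated failure is never mistaken for success (assumption).
            states[record_id] = min(states[record_id] + 1, SUCCESS - 1)
        else:
            states[record_id] = SUCCESS
    return states


def needs_retry(state: int) -> bool:
    """A record is a SideTrack retry candidate when its state is between 1 and 8."""
    return PENDING < state < SUCCESS


states = update_states({1: 0, 2: 0, 3: 8}, window_ids=[1, 2, 3], failed_ids=[2, 3])
```

An empty `failed_ids` corresponds to the case in which no result is returned, so every record in the window is marked as successful.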

Optionally, there may also be a case of delaying arrival of abnormal to-be-processed data, where the window data includes a window state in the case, and the acquiring of the abnormal to-be-processed data meeting the preset condition specifically may include:

Specifically, the present application provides a quasi-real-time data processing mode that supports streaming processing, and therefore needs to support abnormal to-be-processed data that arrives late. Correspondingly, a specific lock may be acquired first (for example, a lock dedicated to delayed data arrival), and the late-arriving abnormal to-be-processed data may then be obtained. To obtain it, the delay time period is determined first. The delay of to-be-processed data generally does not exceed 3 days, so the start of the delay time period may be set to 3 days before the current time. If the state of a time window is the pending state or the in-processing state, the time window has not yet been processed or is still being processed; to-be-processed data within that window's range does not count as late and need not be handled separately, because the synchronous task will process it together with the rest of the window. If the state of a time window is the processing failure state, the asynchronous task will pull up that time window again for processing. Therefore, the start time of the earliest time window whose window state is the processing failure state, the in-processing state, or the pending state may be taken as the end of the delay time period. The abnormal to-be-processed data obtained may then be encapsulated into a chunk (mainly the late-arriving data), the encapsulated chunk is passed to a processing task, the task is submitted to a task processing thread (for example, DataProcessPoolService) to wait for execution, and the lock is released after execution finishes.
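The delay time period described above can be computed as in the following sketch. The function `delayed_data_range`, the state labels, and the fallback to the current time when no window is failed, in processing, or pending are assumptions introduced for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

OPEN_STATES = {"FAILED", "PROCESSING", "PENDING"}  # hypothetical state labels


def delayed_data_range(now: datetime,
                       windows: Iterable[Tuple[datetime, str]],
                       max_delay: timedelta = timedelta(days=3)
                       ) -> Tuple[datetime, datetime]:
    """Return (start, end) of the period searched for late-arriving data.

    windows is an iterable of (window start time, window state).
    """
    start = now - max_delay
    open_starts = [begin for begin, state in windows if state in OPEN_STATES]
    # Fallback when every window is already finished: search up to now (assumption).
    end = min(open_starts) if open_starts else now
    return start, end
```

Data generated after the returned end time belongs to windows that the synchronous task or the failed-window retry will still handle, so it is left out of the late-arrival search.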

In addition, after the task processing thread finishes execution, only the processing state of the abnormal to-be-processed data needs to be updated: if the data is processed successfully, its state is set to 9; if it fails, its state is increased by 1 each time until processing succeeds. If processing still fails after the maximum number of retries is reached, operation and maintenance personnel may modify the state after manual confirmation, after which the program can automatically process the data again.

Optionally, window processing may also time out. In this case the window data includes a window state, and the acquiring of the abnormal data to be processed meeting the preset condition may specifically include:

Determining an abnormal time window, that is, a time window whose window state is the in-processing state and whose duration in that state exceeds a preset duration threshold.

And updating the synchronous processing times and the asynchronous processing times corresponding to the abnormal time window to be zero, wherein the synchronous processing times and the asynchronous processing times are stored in abnormal window data.

Specifically, if a task is interrupted while a time window is being processed, the window may remain in the in-processing state indefinitely, so a time window whose heartbeat has been lost needs to be processed again. Correspondingly, a specific lock (for example, a lock dedicated to heartbeat loss) may be acquired first; if the lock cannot be acquired, an exception prompt may be generated. The abnormal time windows whose heartbeat has been lost are then extracted, that is, time windows that have been in the in-processing state for longer than a preset duration threshold, for example, windows that started processing more than 10 minutes ago and are still in the in-processing state. This duration can be configured based on experience. The synchronous and asynchronous processing counts of each abnormal time window may then be reset to 0 so that the window can be processed again later, and the lock is released. The abnormal to-be-processed data may also be obtained from the to-be-processed data according to the boundary time (that is, the fourth start time and the fourth end time) in the abnormal window data corresponding to the abnormal time window and encapsulated as a chunk (which may include the window data and the to-be-processed data in the window); the encapsulated chunk is passed to a processing task, the task is submitted to a task processing thread (for example, DataProcessPoolService) to wait for execution, and the lock is released after execution finishes.
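Detecting windows whose heartbeat has been lost and resetting their counters can be sketched as follows. The `WindowRow` fields, the `reset_stale_windows` function, and the `"PROCESSING"` label are hypothetical; the 10-minute default mirrors the example above and would be configurable in practice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class WindowRow:
    state: str
    processing_since: datetime  # when the window entered the in-processing state
    sync_count: int
    async_count: int


def reset_stale_windows(rows: List[WindowRow], now: datetime,
                        timeout: timedelta = timedelta(minutes=10)) -> List[WindowRow]:
    """Reset the retry counters of windows stuck in processing longer than timeout."""
    stale = [w for w in rows
             if w.state == "PROCESSING" and now - w.processing_since > timeout]
    for w in stale:
        w.sync_count = 0    # make the window eligible for reprocessing
        w.async_count = 0
    return stale
```

Resetting both counters to zero lets the usual retry paths pick the window up again instead of treating it as exhausted.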

In summary, for the various abnormal scenarios, asynchronous tasks can automatically pull up and retry the part of the to-be-processed data that is abnormal. That is, normal window data is handled by synchronous tasks; when part of the data is abnormal, it is marked as failed, and asynchronous tasks then retry it automatically. This improves data processing efficiency and preserves quasi-real-time behavior. Because only the data that failed is reprocessed and successfully processed data is not, the amount of data processed is reduced, which further improves data processing efficiency.

Fig. 4 is a schematic diagram of the principle of a data processing method provided in an embodiment of the present application. As shown in Fig. 4, in this embodiment the database may be a TDSQL database containing the data to be processed, and the data to be processed may be synchronized to TiDB through DM. A windowing thread (for example, windowdivision thread) in the terminal device may first acquire a lock dedicated to windowing (that is, a data division permission lock), read the first end time from the historical windowing data, perform windowing on the data to be processed according to the first end time to obtain new window data, and then release the windowing lock. A window data processing thread (for example, windowprocessorthread) may acquire a window extraction lock (also referred to as a synchronous processing permission lock), determine the window data corresponding to a target time window to be processed, and release the window extraction lock. It then determines the target data to be processed according to the window data corresponding to the target time window, encapsulates that window data and the target data to be processed into data blocks (also referred to as chunks), and sends the encapsulated data blocks to a data processing thread pool (also referred to as DataProcessPoolService). The data processing thread pool dataprocessworktask schedules a processprocessorrunner, and the data processing task streamprocessorrunner calls the corresponding service logic (the specific processing logic implemented by a service developer) to process the data to be processed in the target time window and performs related follow-up operations according to the processing result, such as retrying failed tasks and updating the data processing state.

To sum up, the TiDB-based streaming processing architecture can support quasi-real-time processing of financial data. In addition, service developers only need to implement the service processing interface Streamedprocessor; the other functions, such as data source integration, task scheduling, and reliability guarantees, are provided by the streaming processing architecture layer, which reduces the workload of service developers.

Based on the same idea, an embodiment of this specification further provides a device corresponding to the foregoing method, and fig. 5 is a schematic structural diagram of a data processing device provided in the embodiment of this application, as shown in fig. 5, the device provided in this embodiment may include:

the obtaining module 501 is configured to obtain data to be processed corresponding to a target service in real time, where the data to be processed includes data generation time.

The processing module 502 is configured to divide the data to be processed into different time windows according to the data generation time, and obtain window data corresponding to the different time windows according to a division result, where the window data of each time window includes a boundary time, each boundary time includes a start time and an end time, the start time is an earliest generation time included in the data to be processed allocated to the time window, and the end time is a latest generation time included in the data to be processed allocated to the time window.

In this embodiment, the processing module 502 is further configured to:

Further, the processing module 502 is further configured to:

The processing module 502 is further configured to implement the target service according to the window data corresponding to the different time windows.

In addition, in another embodiment, if there are at least two window data corresponding to each synchronization processing permission lock, the processing module 502 is further configured to:

Moreover, in another embodiment, the processing module 502 is further configured to:

and after the asynchronous processing authority lock is obtained, acquiring abnormal data to be processed meeting preset conditions.

In this embodiment, the processing module 502 is further configured to:

In this embodiment, the data to be processed includes a data state, and the processing module 502 is further configured to:

In this embodiment, the window data includes a window state, and the processing module 502 is further configured to:

The apparatus provided in the embodiment of the present application can implement the method of the embodiment shown in fig. 2, and the implementation principle and the technical effect are similar, which are not described herein again.
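As a rough illustration of the window data the processing module produces, the following Python sketch sorts records by generation time, keeps at most the target number of records (the product of a per-window number threshold and a window-count threshold, as recited in claim 2), and records the earliest and latest generation time of each window as its boundary time. The names `Record`, `Window` and `divide_into_windows`, the integer timestamps, and the choice to cut the sorted records into consecutive fixed-size groups are assumptions; the exact division rule is not given in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    record_id: str
    generated_at: int  # data generation time, e.g. epoch milliseconds


@dataclass
class Window:
    start_time: int  # earliest generation time among the window's records
    end_time: int    # latest generation time among the window's records
    records: List[Record]


def divide_into_windows(records: List[Record],
                        per_window: int,
                        max_windows: int) -> List[Window]:
    """Split records into time windows ordered by generation time."""
    if per_window <= 0 or max_windows <= 0:
        raise ValueError("thresholds must be positive")
    ordered = sorted(records, key=lambda r: r.generated_at)
    # Target number = number threshold x window threshold.
    target = ordered[: per_window * max_windows]
    windows: List[Window] = []
    for i in range(0, len(target), per_window):
        chunk = target[i:i + per_window]
        # chunk is sorted, so its first and last records give the boundary time.
        windows.append(
            Window(chunk[0].generated_at, chunk[-1].generated_at, chunk)
        )
    return windows
```

Because each boundary time is derived from the records actually assigned to the window, windows built this way do not overlap in time as long as generation times are distinct, and records beyond the target number are simply left for a later round.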

Fig. 6 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application. As shown in fig. 6, the device 600 of this embodiment includes a processor 601 and a memory 602 communicatively coupled to the processor 601. The processor 601 and the memory 602 are connected by a bus 603.

In a specific implementation, the processor 601 executes the computer executable instructions stored in the memory 602, so that the processor 601 executes the method in the above method embodiment.

For the specific implementation process of the processor 601, reference may be made to the above method embodiments; the implementation principle and the technical effect are similar and are not described herein again.

In the embodiment shown in fig. 6, it should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general-purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules within the processor.

The memory may comprise high-speed RAM and may also include non-volatile memory (NVM), such as at least one disk memory.

The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.

An embodiment of the present application further provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the data processing method of the foregoing method embodiments is implemented.

An embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the data processing method as described above is implemented.

The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.

An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.

Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be performed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. A data processing method, comprising:

2. The method according to claim 1, wherein the dividing the data to be processed into different time windows according to the data generation time and obtaining window data corresponding to the different time windows according to a division result comprises:

sequentially extracting target data to be processed corresponding to the number of targets from the initial target data to be processed according to data generation time, wherein the number of targets is the product of a preset number threshold and a preset time window threshold;

3. The method according to claim 2, wherein the dividing the target data to be processed into time windows corresponding to the number of the window thresholds based on the data generation time of the target data to be processed and the preset number threshold comprises:

4. The method according to any one of claims 1 to 3, wherein each synchronization processing permission lock corresponds to at least two pieces of window data, and the implementing the target service according to the window data corresponding to the different time windows includes:

5. The method of claim 4, further comprising:

6. The method according to claim 5, wherein the acquiring abnormal data to be processed meeting a preset condition comprises:

7. The method according to claim 5, wherein the data to be processed includes a data status, and the obtaining of the abnormal data to be processed that satisfies the preset condition includes:

8. The method according to claim 5, wherein the window data includes a window state, and the acquiring abnormal data to be processed that satisfies a preset condition includes:

9. The method according to claim 5, wherein the window data includes a window state, and the acquiring abnormal data to be processed that satisfies a preset condition includes:

10. A data processing apparatus, comprising:

11. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;

the memory stores computer-executable instructions;

the processor executes computer-executable instructions stored by the memory to implement the data processing method of any of claims 1-9.

12. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement a data processing method as claimed in any one of claims 1 to 9.

13. A computer program product comprising a computer program, characterized in that the computer program realizes the data processing method according to any one of claims 1 to 9 when executed by a processor.