CN112035481A - Data processing method, data processing device, computer equipment and storage medium

Info

Publication number: CN112035481A
Application number: CN202010901630.1A
Authority: CN
Inventors: 姬华强; 钱学广
Original assignee: Ping An Property and Casualty Insurance Company of China Ltd
Current assignee: Ping An Property and Casualty Insurance Company of China Ltd
Priority date: 2020-08-31
Filing date: 2020-08-31
Publication date: 2020-12-04
Anticipated expiration: 2040-08-31
Also published as: CN112035481B

Abstract

The application relates to the field of artificial intelligence. By time-dividing a data table to be processed and sequentially storing the data to be processed within each time division range in a target database, the data query performance and the data storage efficiency of a system are improved. In particular, the application relates to a data processing method, an apparatus, a computer device, and a storage medium, wherein the data processing method includes: acquiring data to be processed, and distributing the data to be processed into a data table to be processed according to a preset distribution strategy; determining time division information corresponding to the data table to be processed according to the acquisition time corresponding to the data to be processed and the data amount corresponding to the data table to be processed; determining, according to the time division information corresponding to the data table to be processed, a plurality of tasks to be executed corresponding to the data table to be processed, and executing the plurality of tasks to be executed, so that the data to be processed in the data table to be processed are stored in the target database. In addition, the application also relates to blockchain technology, and the data table to be processed can be stored in a blockchain.

Description

Data processing method, data processing device, computer equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence, and in particular, to a data processing method, apparatus, computer device, and storage medium.

Background

When an existing business data system synchronizes massive amounts of business data, the business data often needs to be synchronized from a big data platform into a PostgreSQL database, and then merged from the PostgreSQL database into a target database.

While processing the business data, the business data system cannot process all of it at once because the data volume is too large, and must process it in batches. Existing business data systems have the following problem: when business data is processed in batches, the processed data is deleted to improve performance, but because of the frequent deletion operations the vacuum process in the PostgreSQL database cannot reclaim the table space, so the performance of the business data system keeps degrading and its data processing efficiency suffers.

Therefore, how to improve the performance and efficiency of data processing of the business data system becomes an urgent problem to be solved.

Disclosure of Invention

The application provides a data processing method, a data processing device, computer equipment and a storage medium, wherein the data to be processed in a data table to be processed is subjected to time division, and the data to be processed in a time division range are sequentially stored in a target database, so that the data query performance and the data storage efficiency of a system are improved.

In a first aspect, the present application provides a data processing method, including:

acquiring data to be processed, and distributing the data to be processed into a data table to be processed according to a preset distribution strategy;

determining time segmentation information corresponding to the data table to be processed according to the acquisition time corresponding to the data to be processed in the data table to be processed and the data volume corresponding to the data table to be processed;

according to the time division information corresponding to the data table to be processed, determining a plurality of tasks to be executed corresponding to the data table to be processed, and executing the plurality of tasks to be executed so as to store the data to be processed in the data table to be processed into a target database.

In a second aspect, the present application further provides a data processing apparatus, comprising:

the data acquisition module is used for acquiring data to be processed and distributing the data to be processed into a data table to be processed according to a preset distribution strategy;

the time division module is used for determining time division information corresponding to the data table to be processed according to the acquisition time corresponding to the data to be processed in the data table to be processed and the data volume corresponding to the data table to be processed;

and the data storage module is used for determining a plurality of tasks to be executed corresponding to the data table to be processed according to the time division information corresponding to the data table to be processed, and executing the plurality of tasks to be executed so as to store the data to be processed in the data table to be processed into a target database.

In a third aspect, the present application further provides a computer device comprising a memory and a processor;

the memory for storing a computer program;

the processor is configured to execute the computer program and to implement the data processing method as described above when executing the computer program.

In a fourth aspect, the present application also provides a computer-readable storage medium, which stores a computer program, which, when executed by a processor, causes the processor to implement the data processing method as described above.

The application discloses a data processing method, a data processing device, computer equipment and a storage medium, wherein the data to be processed is distributed into a data table to be processed according to a preset distribution strategy, so that the data to be processed on different dates can be prevented from being mixed together, and the data query efficiency is improved; the time division information corresponding to the data table to be processed can be determined according to the acquisition time corresponding to the data to be processed in the data table to be processed and the data amount corresponding to the data table to be processed; the multiple to-be-executed tasks corresponding to the to-be-processed data table are determined according to the time division information corresponding to the to-be-processed data table, so that the to-be-executed tasks process the to-be-processed data table, and the performance of system query data can be improved; by executing a plurality of tasks to be executed, the data to be processed in the data table to be processed is stored in the target database, and the data storage efficiency of the system is improved.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a schematic flow chart of a data processing method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of allocating data to be processed to a data table to be processed according to an embodiment of the present application;

FIG. 3 is a schematic flow chart of the sub-steps of FIG. 1 in determining time slicing information corresponding to a pending data table;

FIG. 4 is a schematic flow chart of the sub-step of FIG. 3 in determining time partition points corresponding to the data table to be processed;

FIG. 5 is a schematic diagram of time division points in a pending data table provided by an embodiment of the present application;

FIG. 6 is a diagram illustrating a time division range corresponding to a to-be-processed data table according to an embodiment of the present application;

FIG. 7 is a schematic flow diagram of the substeps of storing the data to be processed in the target database of FIG. 1;

FIG. 8 is a schematic block diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 9 is a schematic block diagram of a structure of a computer device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

The embodiment of the application provides a data processing method and device, computer equipment and a storage medium. The data processing method can be applied to a service data system of a server or a terminal, time division is carried out on a data table to be processed, the data to be processed in a time division range are sequentially stored in a target database, and data query performance and data storage efficiency of the system are improved.

The server may be an independent server or a server cluster. The terminal can be an electronic device such as a smart phone, a tablet computer, a notebook computer, a desktop computer and the like.

Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.

As shown in fig. 1, the data processing method includes steps S10 through S30.

Step S10, acquiring the data to be processed, and distributing the data to be processed into a data table to be processed according to a preset distribution strategy.

Illustratively, the business data system may be a policy business data system. It should be noted that the data processing method in the embodiment of the present application may be applied to data synchronization in a policy business data system, and may also be applied to other systems. For example, when synchronizing data in a policy business data system, Hadoop is usually used to screen, from the mass data in a big data platform, the policy data due to expire in the next several months; the policy data is synchronized to a PostgreSQL database in the policy business data system and then merged in batches from the PostgreSQL database into a target database. For example, the target database may be a renewal list database.

It should be noted that Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across computer clusters using a simple programming model. Applications built on the Hadoop framework run in an environment that provides distributed storage and computing across computer clusters. PostgreSQL is a powerful open-source object-relational database management system. It supports operations such as creating and deleting databases, creating and deleting data tables, and inserting, querying, updating, and deleting data in a data table.

Specifically, the data to be processed may be obtained from the big data platform by a synchronization program. Illustratively, the synchronization program may include the Sqoop tool. It should be noted that Sqoop is an open-source tool used mainly to transfer data between Hadoop and traditional databases; it can import data from a relational database into the Hadoop Distributed File System (HDFS), and can also export data from HDFS into a relational database. Relational databases may include, but are not limited to, MySQL, Oracle, PostgreSQL, and the like.

In some embodiments, allocating the to-be-processed data to the to-be-processed data table according to a preset allocation policy may include: acquiring an acquisition date corresponding to the data to be processed, and numbering a preset number of data tables; determining a target data table corresponding to the data to be processed according to the acquisition date corresponding to the data to be processed, based on a preset correspondence between acquisition dates and data table numbers; and storing the data to be processed into the target data table to obtain a data table to be processed corresponding to the data to be processed.

The preset allocation strategy may include storing the data to be processed into the corresponding target data table according to the acquisition date corresponding to the data to be processed based on a preset corresponding relationship between the acquisition date and the number of the data table.

It is understood that the acquisition date refers to the date on which the business data system synchronizes data from the big data platform; the synchronization may take several days to complete. Illustratively, the acquisition date may be expressed as a week value, such as Monday through Sunday. The preset number of data tables may be 7; the 7 data tables are numbered, for example, data table 1, data table 2, data table 3, data table 4, data table 5, data table 6, and data table 7.

Specifically, each acquisition date may be associated with a data table number. For example, the preset correspondence between acquisition dates and data table numbers is obtained by associating Monday with data table 1, Tuesday with data table 2, and so on for the remaining week values.

Specifically, based on the preset correspondence between acquisition dates and data table numbers, the target data table corresponding to the data to be processed is determined according to the acquisition date corresponding to the data to be processed. For example, if the acquisition date corresponding to the to-be-processed data A is Monday, the target data table corresponding to the to-be-processed data A may be determined to be data table 1. Likewise, if the acquisition date corresponding to the to-be-processed data B is Tuesday, the target data table corresponding to the to-be-processed data B may be determined to be data table 2. FIG. 2 is a schematic diagram of allocating data to be processed to a data table to be processed.

Specifically, after the target data table corresponding to the data to be processed is determined, the data to be processed is stored in the target data table, and the data table to be processed corresponding to the data to be processed is obtained. Illustratively, if the target data table corresponding to the to-be-processed data A is determined to be data table 1, the to-be-processed data A is stored in data table 1, and the to-be-processed data table 1 corresponding to the to-be-processed data A is obtained. The to-be-processed data table 1 may further include other to-be-processed data with the same acquisition date as the to-be-processed data A. For example, the to-be-processed data table 1 includes a plurality of pieces of to-be-processed data whose acquisition date is Monday.
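The allocation described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the application: the function names `table_number` and `allocate`, the in-memory dictionary standing in for the seven data tables, and the `(acquired_at, payload)` record shape are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def table_number(acquired_at: datetime) -> int:
    """Map an acquisition date to a table number (assumed: Monday -> 1 ... Sunday -> 7)."""
    # datetime.isoweekday() returns 1 for Monday through 7 for Sunday.
    return acquired_at.isoweekday()


def allocate(records):
    """Group (acquired_at, payload) records by target table number.

    The returned dict is a hypothetical stand-in for the numbered data tables.
    """
    tables = defaultdict(list)
    for acquired_at, payload in records:
        tables[table_number(acquired_at)].append((acquired_at, payload))
    # Keep each table ordered by acquisition time.
    for rows in tables.values():
        rows.sort(key=lambda row: row[0])
    return dict(tables)
```

With this mapping, every record acquired on a Monday lands in table 1 regardless of the calendar week, which matches the week-value correspondence described above.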

For example, the plurality of pieces of data to be processed in the data table to be processed may be sorted according to the acquisition time. Here, the acquisition time refers to a time point within the acquisition date, for example, a time in the range 00:00 to 23:59.

It should be emphasized that, in order to further ensure the privacy and security of the to-be-processed data table, the to-be-processed data table may also be stored in a node of a blockchain.

The data to be processed are distributed to different data tables to be processed according to the acquisition date and stored according to the preset distribution strategy, so that the data to be processed with the same acquisition date can be stored in the same data table to be processed; therefore, the mutual influence of the data to be processed on different dates can be avoided, and the performance and the efficiency of the system for inquiring the data can be improved.

Step S20, determining time division information corresponding to the to-be-processed data table according to the acquisition time corresponding to the to-be-processed data in the to-be-processed data table and according to the data amount corresponding to the to-be-processed data table.

Illustratively, the pending data table includes a plurality of pending data. It should be noted that, in the same to-be-processed data table, the corresponding acquisition dates of different to-be-processed data are the same, but the corresponding acquisition times of different to-be-processed data are different.

Referring to fig. 3, fig. 3 is a schematic flowchart of the sub-steps of step S20, in which the time division information corresponding to the to-be-processed data table is determined according to the acquisition time corresponding to the to-be-processed data in the to-be-processed data table and the data amount corresponding to the to-be-processed data table. Step S20 may specifically include the following steps S201 to S204.

Step S201, determining a start time and an end time corresponding to the to-be-processed data table according to the acquisition time corresponding to the plurality of to-be-processed data.

It should be noted that the start time refers to a time point corresponding to the to-be-processed data acquired first in the to-be-processed data table, that is, a minimum time point; the end time refers to a time point corresponding to the last acquired data to be processed in the data table to be processed, i.e., a maximum time point.

Exemplarily, if the to-be-processed data table 1 includes to-be-processed data A, to-be-processed data B, and to-be-processed data C, where the acquisition time corresponding to the to-be-processed data A is T1, the acquisition time corresponding to the to-be-processed data B is T3, the acquisition time corresponding to the to-be-processed data C is T2, and T1 < T3 < T2, it may be determined that the start time of the to-be-processed data table 1 is T1 and the end time is T2.

Step S202, determining a time range corresponding to the data table to be processed according to the starting time and the ending time corresponding to the data table to be processed.

For example, if the start time of the to-be-processed data table 1 is T1 and the end time is T2, it may be determined that the time range corresponding to the to-be-processed data table 1 is [T1, T2].
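Steps S201 and S202 amount to taking the minimum and maximum acquisition time in the table. A minimal Python sketch follows, where the function name `time_range` and the use of a plain list of `datetime` values are assumptions made for illustration:

```python
from datetime import datetime


def time_range(acquisition_times):
    """Return (start, end): the earliest and latest acquisition time in a table."""
    if not acquisition_times:
        raise ValueError("the table holds no data to be processed")
    return min(acquisition_times), max(acquisition_times)
```

In a real database the same values would more likely be obtained with an aggregate query over the acquisition-time column; the list form is used here only to keep the sketch self-contained.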

Step S203, determining a time division point in a time range corresponding to the data table to be processed according to the data amount corresponding to the data table to be processed, and obtaining the time division point corresponding to the data table to be processed.

For example, the data amount corresponding to the data table to be processed may be determined according to the memory space occupied by the data table to be processed, or may also be determined according to the remaining space in the data table to be processed.

Referring to fig. 4, fig. 4 is a schematic flowchart of the sub-steps of step S203, in which the time division points within the time range corresponding to the to-be-processed data table are determined according to the data amount corresponding to the to-be-processed data table. Step S203 may specifically include the following steps S2031 to S2034.

Step S2031, determining a first time division point and a first division range corresponding to the data table to be processed according to the starting time corresponding to the data table to be processed and a preset time interval.

It should be noted that the preset time interval is used to divide the time range corresponding to the data table to be processed, so as to obtain a plurality of time division ranges.

For example, if the start time corresponding to the to-be-processed data table 1 is T1 and the preset time interval is T0, it may be determined that the first time division point corresponding to the to-be-processed data table 1 is T1 + T0 and the first division range is [T1, T1 + T0]. The remaining time division points are determined from [T1 + T0, T2].

Step S2032, adjusting the first time division point according to the data amount in the first division range to obtain the adjusted first time division point.

In some embodiments, adjusting the first time division point according to the data amount in the first division range to obtain an adjusted first time division point may include: acquiring the data amount of the to-be-processed data table within the first division range to obtain the data amount corresponding to the first division range; if the data amount corresponding to the first division range is smaller than a preset first data amount threshold, increasing the preset time interval according to a preset increase magnification, and adjusting the first time division point according to the increased time interval to obtain the adjusted first time division point; or, if the data amount corresponding to the first division range is larger than a preset second data amount threshold, reducing the preset time interval according to a preset reduction magnification, and adjusting the first time division point according to the reduced time interval to obtain the adjusted first time division point.

For example, the first data amount threshold may be represented as D1, and the second data amount threshold may be represented as D2. The preset increasing magnification may be represented as S1, and the preset reducing magnification may be represented as S2. The values of the increase magnification S1 and the decrease magnification S2 may be dynamically set according to actual conditions. For example, the increase magnification S1 may be 2, 3, or another value; the reduction magnification S2 may be 0.8, 0.5, 0.3, or the like.

Specifically, the data amount of the to-be-processed data table within the first division range is acquired to obtain the data amount corresponding to the first division range. Illustratively, the data amount of the to-be-processed data table 1 within the first division range [T1, T1 + T0] is obtained and denoted d; d is thus the data amount corresponding to the first division range [T1, T1 + T0].

Specifically, the relationship between the data amount corresponding to the first division range and a preset first data amount threshold is determined.

In some embodiments, if the data amount corresponding to the first division range is smaller than a preset first data amount threshold, the preset time interval is increased according to a preset increase magnification, and the first time division point is adjusted according to the increased time interval, so as to obtain an adjusted first time division point.

For example, if the data amount d corresponding to the first division range [T1, T1 + T0] is smaller than the first data amount threshold D1, and the increase magnification S1 is 2, the preset time interval T0 is increased to 2T0, the adjusted first time division point is T1 + 2T0, and the adjusted first division range is [T1, T1 + 2T0].

In other embodiments, if the data amount corresponding to the first division range is greater than the preset second data amount threshold, the preset time interval is reduced according to the preset reduction magnification, and the first time division point is adjusted according to the reduced time interval, so as to obtain the adjusted first time division point.

For example, if the data amount d corresponding to the first division range [T1, T1 + T0] is greater than the second data amount threshold D2, and the reduction magnification S2 is 0.5, the preset time interval T0 is reduced to 0.5T0, the adjusted first time division point is T1 + 0.5T0, and the adjusted first division range is [T1, T1 + 0.5T0].
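The adjustment rule of step S2032 can be written as a small function. The sketch below is illustrative only: the name `adjust_split_point`, the use of plain numbers (for example, seconds) for times, and the assumption that D1 is not larger than D2 are choices made for the example rather than details stated in the application.

```python
def adjust_split_point(start, interval, count, d1, d2, s1=2.0, s2=0.5):
    """Return (adjusted_point, adjusted_interval) for one division range.

    start    -- lower bound of the range (T1 in the text), as a number
    interval -- preset time interval (T0)
    count    -- data amount d found in [start, start + interval]
    d1, d2   -- first and second data amount thresholds (assumes d1 <= d2)
    s1, s2   -- increase and reduction magnifications
    """
    if count < d1:
        interval = interval * s1  # too little data: widen the range
    elif count > d2:
        interval = interval * s2  # too much data: narrow the range
    return start + interval, interval
```

For instance, with `start=0`, `interval=60`, thresholds `d1=100` and `d2=1000`, a count of 10 widens the range to 120, a count of 5000 narrows it to 30, and a count of 500 leaves it at 60.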

Step S2033, if the adjusted first time division point is equal to or greater than the end time in the time range, stopping dividing to obtain the time division point corresponding to the data table to be processed.

For example, in the to-be-processed data table 1, if the adjusted first time division point T1 + 2T0 is equal to or greater than the end time T2 in the time range [T1, T2], the time range [T1, T2] does not need to be divided further, and the first time division point corresponding to the to-be-processed data table 1 is taken to be T2. That is, the whole time range [T1, T2] is used as the single division range corresponding to the to-be-processed data table 1.

Step S2034, if the adjusted first time division point is smaller than the end time in the time range, cyclically determining the remaining time division points in the time range according to the adjusted first time division point and the preset time interval, and stopping the division once an obtained time division point is greater than or equal to the end time, so as to obtain a plurality of time division points corresponding to the data table to be processed.

For example, in the to-be-processed data table 1, if the adjusted first time division point T1 + 2T0 is smaller than the end time T2 in the time range [T1, T2], the remaining portion [T1 + 2T0, T2] of the time range [T1, T2] in the to-be-processed data table 1 may continue to be divided.

Specifically, the remaining time division points in the time range [T1, T2] are determined according to the adjusted first time division point T1 + 2T0 and the preset time interval T0; for example, a second time division point is determined in [T1 + 2T0, T2], and when an obtained time division point is greater than or equal to the end time T2, the division is stopped, and a plurality of time division points corresponding to the data table to be processed are obtained.

Exemplarily, in the data table 1 to be processed, a plurality of time division points of the data table 1 to be processed are obtained; for example, time division points a, b, and c. As shown in fig. 5, fig. 5 is a schematic diagram of time division points in a to-be-processed data table.

By dynamically adjusting the time division points in the data table to be processed according to the preset time interval, the increase magnification, and the reduction magnification, the time range can then be divided at these time division points to obtain the time division ranges.
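Steps S2031 to S2034 can be combined into a single loop, sketched below in Python. Several points are assumptions made for the example and are not stated in the application: times are plain numbers in a sorted list, each new range starts again from the preset interval and is adjusted at most once, the data amount is a row count obtained with `bisect`, and only the interior division points are returned (a point that reaches the end time simply ends the loop).

```python
from bisect import bisect_left, bisect_right


def split_points(times, t0, d1, d2, s1=2.0, s2=0.5):
    """Return the interior time division points for a table.

    times  -- non-empty, sorted numeric acquisition times
    t0     -- preset time interval (must be positive)
    d1, d2 -- first and second data amount thresholds
    s1, s2 -- increase and reduction magnifications (both positive)
    """
    end = times[-1]
    left = times[0]
    points = []
    while True:
        interval = t0
        # Data amount in [left, left + interval], counted on the sorted list.
        count = bisect_right(times, left + interval) - bisect_left(times, left)
        if count < d1:
            interval *= s1  # sparse range: widen
        elif count > d2:
            interval *= s2  # dense range: narrow
        point = left + interval
        if point >= end:
            break  # reached the end time: stop dividing
        points.append(point)
        left = point
    return points
```

An empty result means that the whole time range is used as a single division range, as in step S2033.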

Step S204, dividing the time range corresponding to the data table to be processed according to the time division points to obtain a plurality of pieces of time division information corresponding to the data table to be processed.

Illustratively, the time-division information may include a time-division range. It can be understood that the dividing means dividing the time range corresponding to the data table to be processed into a plurality of regions according to the time dividing points, that is, obtaining a plurality of time dividing ranges corresponding to the data table to be processed.

For example, in the to-be-processed data table 1, if the time range of the to-be-processed data table is [T1, T2] and the time division points are a, b, and c, then after the division the time division ranges corresponding to the to-be-processed data table 1 are [T1, a), [a, b), [b, c), and [c, T2]. FIG. 6 is a schematic diagram of the time division ranges corresponding to a data table to be processed.
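Turning division points into division ranges is a matter of pairing consecutive boundaries. A minimal Python sketch, in which the function name `time_ranges` and the tuple representation of a range are assumptions made for illustration:

```python
def time_ranges(start, end, points):
    """Split the range from start to end at the given interior points.

    Returns a list of (lower, upper) pairs, one per time division range.
    Whether each bound is inclusive is left to the code that queries the data.
    """
    bounds = [start, *points, end]
    return list(zip(bounds[:-1], bounds[1:]))
```

Three division points therefore yield four ranges, and an empty list of points yields the original range unchanged.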

By determining the time division point in the time range according to the data amount corresponding to the data table to be processed, the time division range can be dynamically expanded or reduced according to the actual data amount corresponding to the data to be processed, so that the data amount in each time division range is in a stable state, and the problem of memory overflow when the data to be processed in the time division range is processed later is avoided.

Step S30, determining a plurality of to-be-executed tasks corresponding to the to-be-processed data table according to the time division information corresponding to the to-be-processed data table, and executing the plurality of to-be-executed tasks, so as to store the to-be-processed data in the to-be-processed data table in a target database.

Referring to fig. 7, fig. 7 is a schematic flowchart of the sub-steps of step S30. Determining a plurality of to-be-executed tasks corresponding to the to-be-processed data table according to the time division information corresponding to the to-be-processed data table may specifically include the following steps S301 and S302.

Step S301, determining the week value corresponding to the data table to be processed.

Specifically, the week value corresponding to the data table to be processed is determined according to the week value corresponding to the data to be processed in the data table to be processed.

It can be understood that, in the same to-be-processed data table, the acquisition dates of the to-be-processed data are the same, that is, the week values corresponding to the to-be-processed data are also the same; therefore, the week value corresponding to the data table to be processed can be determined according to the week value corresponding to the data to be processed.

For example, in the to-be-processed data table 1, if the week value corresponding to the to-be-processed data is monday, the week value corresponding to the to-be-processed data table 1 is monday.

Step S302, adding the week value corresponding to the data table to be processed and a plurality of time division ranges corresponding to the data table to be processed into a task pool to obtain a plurality of tasks to be executed corresponding to the data table to be processed.

Illustratively, a task pool may include a plurality of task tables.

Specifically, a plurality of time division ranges in the data table to be processed are respectively placed into the task table in the task pool, and then the corresponding week values of the data table to be processed are added into the task table, so that a plurality of tasks to be executed corresponding to the data table to be processed are obtained. The task to be executed comprises a week value and a time division range.

Specifically, the number of tasks to be executed is equal to the number of time division ranges. For example, if the time range is divided into the four time division ranges [T1, a), [a, b), [b, c), and [c, T2], four to-be-executed tasks corresponding to the four time division ranges are generated in the task pool.

The week value and the time division ranges of the data table to be processed are put into the task pool to obtain a plurality of tasks to be executed; therefore, the data to be processed in the data table to be processed can be processed in parallel, which improves the data storage efficiency of the system.
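As a minimal sketch of step S302, the following Python fragment builds one task per time division range. The `Task` class, the `build_tasks` function, the field names, and the numeric split points are hypothetical and are used only for illustration; the embodiment does not prescribe a concrete data structure for the task table.

```python
from dataclasses import dataclass

@dataclass
class Task:
    week_value: str      # day of the week of the source table, e.g. "monday"
    start: float         # inclusive lower bound of the time division range
    end: float           # upper bound of the time division range
    end_inclusive: bool  # True only for the last range, e.g. [c, T2]

def build_tasks(week_value, split_points):
    """Create one task per adjacent pair of split points.

    split_points is assumed to be sorted, e.g. [T1, a, b, c, T2].
    """
    tasks = []
    last = len(split_points) - 2
    for i in range(len(split_points) - 1):
        tasks.append(
            Task(week_value, split_points[i], split_points[i + 1],
                 end_inclusive=(i == last))
        )
    return tasks

# Hypothetical values: T1 = 0, a = 6, b = 12, c = 18, T2 = 24.
task_pool = build_tasks("monday", [0, 6, 12, 18, 24])
```

With five boundary values, the sketch yields four tasks, which matches the statement that the number of tasks equals the number of time division ranges.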

As shown in fig. 7, executing the plurality of to-be-executed tasks in step S30 to store the to-be-processed data in the to-be-processed data table in the target database specifically includes the following steps S303 to S305.

Step S303, sequentially determining, among the plurality of tasks to be executed, the target task to be executed by each executor, and locking the target task to be executed.

It should be noted that, in the embodiment of the present application, the to-be-executed task in the task pool may be executed by the executor. The executor may submit the task to be executed to a thread pool, and the thread pool is responsible for processing the content in the task to be executed.

Specifically, before executing a plurality of tasks to be executed, the number of executors needs to be determined according to the number of the tasks to be executed.

It should be noted that, in the embodiment of the present application, a plurality of to-be-executed tasks in the task pool may be executed by a plurality of executors. For example, when there are too many tasks to be executed, the number of executors may be increased. Processing the tasks to be executed in the task pool in parallel through a plurality of executors improves execution efficiency. The number of executors can be dynamically configured according to the number of the tasks to be executed.

Illustratively, the number of executors may be dynamically configured through the Apollo configuration center. It should be noted that Apollo is an open source configuration management center developed by the Ctrip framework department. It can centrally manage the configurations of applications across different environments and different clusters, pushes configuration changes to the application end in real time, and provides standardized permission and process governance features. For example, a configuration can be modified and hot-published in the Apollo configuration center; the client then receives the latest configuration in real time and notifies the application program.

For example, if there are four tasks to be executed, the number of executors may be configured to be 4.

Specifically, the target tasks to be executed among the plurality of tasks to be executed are determined in sequence, and each target task to be executed is locked. For example, the executors and the tasks to be executed may be numbered respectively, and the task to be executed with the same number as an executor may be used as the target task to be executed corresponding to that executor. For example, for the executor 1, the task 1 to be executed with the number 1 may be taken as the target task to be executed corresponding to the executor 1. When the executor 1 executes the target task to be executed, the target task to be executed needs to be locked.

Locking means that, while the target task to be executed is locked, other executors cannot execute it.

It can be understood that, in order to prevent multiple executors from repeatedly executing the same task to be executed, when an executor executes its corresponding target task to be executed, the executor needs to lock the target task to be executed and update the state of the target task to be executed to "executing". Locking the tasks to be executed prevents a plurality of executors from reading the same data to be processed from the data table to be processed, and improves the multi-task concurrency of the system.
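The lock-and-claim behavior described above can be sketched as follows. This is an illustrative stand-in only: the embodiment locks tasks in a task table, whereas the sketch uses an in-process `threading.Lock`; the `TaskPool` class, its method names, and the initial state "waiting" are assumptions that do not appear in the original text.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class TaskPool:
    def __init__(self, task_ids):
        self._lock = threading.Lock()
        # "waiting" is a hypothetical initial state for unclaimed tasks.
        self.status = {tid: "waiting" for tid in task_ids}

    def claim(self):
        """Atomically pick one unclaimed task and mark it "executing"."""
        with self._lock:
            for tid, state in self.status.items():
                if state == "waiting":
                    self.status[tid] = "executing"
                    return tid
            return None

    def finish(self, tid):
        """Mark a claimed task as "executed"."""
        with self._lock:
            self.status[tid] = "executed"

def run_executor(pool, claimed):
    # Each executor keeps claiming tasks until none are left.
    while (tid := pool.claim()) is not None:
        claimed.append(tid)  # real work on the time division range goes here
        pool.finish(tid)

pool = TaskPool([1, 2, 3, 4])
claimed = []
# Four executors for four tasks, as in the example above.
with ThreadPoolExecutor(max_workers=4) as workers:
    for _ in range(4):
        workers.submit(run_executor, pool, claimed)
```

Because the state check and the state update happen under the same lock, no task can be claimed by two executors, which is the property the locking step is intended to provide.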

Step S304, determining a current data table to be processed corresponding to the target task to be executed according to the week value in the target task to be executed.

Specifically, based on a preset corresponding relationship between the week value and the number of the data table to be processed, the current data table to be processed corresponding to the target task to be executed is determined according to the week value in the target task to be executed.

For example, the preset correspondence between the week value and the number of the data table to be processed may include: Monday corresponds to the to-be-processed data table 1, Tuesday corresponds to the to-be-processed data table 2, Wednesday corresponds to the to-be-processed data table 3, and so on.

For example, if the week value in the target to-be-executed task corresponding to the executor 1 is Monday, the executor 1 may determine that the current to-be-processed data table corresponding to the target to-be-executed task is the to-be-processed data table 1.
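The lookup in step S304 can be sketched as a simple mapping. The table name prefix `pending_data_` and the function name are hypothetical, and the numbering for Thursday through Sunday is an assumption that extends the "and so on" in the example above.

```python
# Assumed correspondence between week value and table number; only the
# Monday, Tuesday, and Wednesday entries are stated in the text.
WEEK_TO_TABLE = {
    "monday": 1, "tuesday": 2, "wednesday": 3, "thursday": 4,
    "friday": 5, "saturday": 6, "sunday": 7,
}

def current_table_name(week_value, prefix="pending_data_"):
    """Resolve the to-be-processed data table for a task's week value."""
    return f"{prefix}{WEEK_TO_TABLE[week_value.lower()]}"
```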

Step S305, according to the time division range in the target task to be executed, acquiring the data to be processed in the time division range from the current data table to be processed, and storing the data to be processed into a target database, wherein the target database is stored in a block chain.

In the embodiment of the present application, the target database may be a renewal list database, and may also be another database.

It is emphasized that, to further ensure the privacy and security of the target database, the target database may also be stored in a node of a blockchain.

For example, if the time division range in the target to-be-executed task corresponding to the executor 1 is [ T1, a), and the corresponding current to-be-processed data table is the to-be-processed data table 1, the to-be-processed data in the time division range [ T1, a) is obtained from the to-be-processed data table 1, and the to-be-processed data is stored in the renewal list database.

For example, if the time division range in the target to-be-executed task corresponding to the executor 2 is [ a, b), and the corresponding current to-be-processed data table is the to-be-processed data table 1, the to-be-processed data in the time division range [ a, b) is obtained from the to-be-processed data table 1, and the to-be-processed data is stored in the renewal list database.

Specifically, after the executor stores the data to be processed in the target database, the state of the target task to be executed may be updated. For example, the status of the target task to be executed is updated from "executing" to "executed" to indicate that the target task to be executed is completed.
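A minimal sketch of step S305 follows, using an in-memory SQLite database purely for illustration. The table names, column names, and the `store_range` function are hypothetical, and keeping the source table and the target table in one database is a simplification; in the embodiment the target database is a separate store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pending_data_1 (id INTEGER, acquired_at INTEGER)")
conn.execute("CREATE TABLE renewal_list (id INTEGER, acquired_at INTEGER)")
conn.executemany(
    "INSERT INTO pending_data_1 VALUES (?, ?)", [(i, i) for i in range(10)]
)

def store_range(conn, source_table, start, end, end_inclusive=False):
    """Copy rows whose acquisition time lies in [start, end) or [start, end]."""
    upper_op = "<=" if end_inclusive else "<"
    # source_table is assumed to come from a fixed internal mapping,
    # never from user input, so interpolating it here is acceptable.
    sql = (
        f"INSERT INTO renewal_list (id, acquired_at) "
        f"SELECT id, acquired_at FROM {source_table} "
        f"WHERE acquired_at >= ? AND acquired_at {upper_op} ?"
    )
    conn.execute(sql, (start, end))
    conn.commit()

store_range(conn, "pending_data_1", 0, 5)                      # range [0, 5)
store_range(conn, "pending_data_1", 5, 9, end_inclusive=True)  # range [5, 9]
```

The half-open intervals ensure that a row lying exactly on a division point is copied by exactly one task; only the last range includes its upper bound.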

In some embodiments, after the plurality of to-be-executed tasks are executed to store the to-be-processed data in the to-be-processed data table in the target database, the execution progress corresponding to the plurality of to-be-executed tasks may be further detected, and when all the plurality of to-be-executed tasks are executed, the to-be-processed data table may be deleted.

It should be noted that the to-be-processed data table is deleted only after all the tasks to be executed in the task pool have been executed. This avoids performing a deletion operation each time part of the to-be-processed data is acquired from the to-be-processed data table, and thereby improves the performance of the system.

Illustratively, if it is detected that the execution progress of every one of the plurality of tasks to be executed is in the completed state, the data table to be processed is deleted.

For example, if it is detected that the number of tasks to be executed that are in the completed state is equal to the number of time division ranges corresponding to the data table to be processed, the target data table to be processed is deleted. For example, for the to-be-processed data table 1, if there are four to-be-executed tasks and the number of those tasks in the completed state is equal to the number of time division ranges corresponding to the to-be-processed data table 1, the to-be-processed data table 1 is deleted.

Specifically, the to-be-processed data table may be emptied by a TRUNCATE command. It can be understood that emptying the to-be-processed data table with the TRUNCATE command releases the table space quickly and efficiently, and avoids the problem that the VACUUM process in a PostgreSQL database fails to reclaim the table space.
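The completion check and the cleanup can be sketched as below. The function name and the table name are hypothetical; the sketch only builds the SQL text and leaves execution against the PostgreSQL database to the caller.

```python
def cleanup_statement(task_states, range_count, table_name):
    """Return a TRUNCATE statement once every task is finished, else None.

    task_states: states of the tasks generated for one data table.
    range_count: number of time division ranges of that table.
    """
    done = sum(1 for state in task_states if state == "executed")
    if done == range_count:
        # table_name is assumed to come from a fixed internal mapping.
        return f"TRUNCATE TABLE {table_name}"
    return None
```

Comparing the count of finished tasks with the number of time division ranges ensures that the table is emptied once, after the last task completes, rather than after each partial read.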

In some embodiments, if it is detected that the execution time of the task to be executed is greater than the preset execution time, the state of the task to be executed is marked as pending.

The preset execution time may be set according to an actual situation, and the specific data is not limited herein.

For example, the tasks to be executed in the task pool may be monitored by an exception monitor. When the exception monitor detects that the execution time of a task to be executed is greater than the preset execution time, the task to be executed is judged to be abnormal, and its state is updated from "executing" to "pending".

It can be understood that monitoring the tasks to be executed prevents a task from remaining in the executing state indefinitely, for example because of a system restart, which would otherwise leave the tasks to be executed in the task pool unable to be completely executed.
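The exception monitor can be sketched as a periodic scan over the task states. The dictionary layout, the `started_at` field, and the function name are assumptions made for illustration; the embodiment does not specify how the execution time is recorded.

```python
import time

def reset_stalled_tasks(tasks, max_seconds, now=None):
    """Mark tasks that have been "executing" for too long as "pending".

    Returns the ids of the tasks that were reset.
    """
    now = time.time() if now is None else now
    reset = []
    for task in tasks:
        if task["status"] == "executing" and now - task["started_at"] > max_seconds:
            task["status"] = "pending"
            reset.append(task["id"])
    return reset

tasks = [
    {"id": 1, "status": "executing", "started_at": 0.0},
    {"id": 2, "status": "executing", "started_at": 90.0},
    {"id": 3, "status": "executed", "started_at": 0.0},
]
# With a 60-second limit at time 100, only task 1 has run too long.
stalled = reset_stalled_tasks(tasks, max_seconds=60, now=100.0)
```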

According to the data processing method provided by the embodiment, the data to be processed are distributed to different data tables to be processed according to the acquisition date and the preset distribution strategy, so that the data to be processed with the same acquisition date are stored in the same data table to be processed; therefore, data to be processed on different dates do not affect each other, and the performance and efficiency of data queries are improved. The time division points in the time range are determined according to the data volume corresponding to the data table to be processed, so that each time division range can be dynamically expanded or reduced according to the actual data volume; the data volume in each time division range therefore stays stable, which avoids memory overflow when the data to be processed in a time division range is processed later. The week value and the time division ranges of the data table to be processed are put into the task pool to obtain a plurality of tasks to be executed, so that the data to be processed in the data table to be processed can be processed in parallel, which improves the data storage efficiency of the system. The data table to be processed is deleted only after all the tasks to be executed in the task pool have been executed, which avoids a deletion operation each time part of the data to be processed is acquired and improves the performance of the system.

Referring to fig. 8, fig. 8 is a schematic block diagram of a data processing apparatus 100 for executing the foregoing data processing method according to an embodiment of the present application. Wherein, the data processing device can be configured in a server or a terminal.

As shown in fig. 8, the data processing apparatus 100 includes: a data acquisition module 101, a time segmentation module 102 and a data storage module 103.

The data acquisition module 101 is configured to acquire data to be processed and allocate the data to be processed to a data table to be processed according to a preset allocation policy.

The time division module 102 is configured to determine time division information corresponding to the to-be-processed data table according to the acquisition time corresponding to the to-be-processed data in the to-be-processed data table and according to the data amount corresponding to the to-be-processed data table.

The data storage module 103 is configured to determine a plurality of to-be-executed tasks corresponding to the to-be-processed data table according to the time division information corresponding to the to-be-processed data table, and execute the plurality of to-be-executed tasks, so as to store to-be-processed data in the to-be-processed data table in a target database.

It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working processes of the apparatus and the modules described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The apparatus described above may be implemented in the form of a computer program which is executable on a computer device as shown in fig. 9.

Referring to fig. 9, fig. 9 is a schematic block diagram of a computer device according to an embodiment of the present disclosure. The computer device may be a server or a terminal.

Referring to fig. 9, the computer device includes a processor and a memory connected by a system bus, wherein the memory may include a nonvolatile storage medium and an internal memory.

The processor is used for providing calculation and control capability and supporting the operation of the whole computer equipment.

The internal memory provides an environment for running the computer program stored in the non-volatile storage medium; when the computer program is executed by the processor, it causes the processor to perform any of the data processing methods.

It should be understood that the Processor may be a Central Processing Unit (CPU), and the Processor may be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

Wherein, in one embodiment, the processor is configured to execute a computer program stored in the memory to implement the steps of:

acquiring data to be processed, and distributing the data to be processed into a data table to be processed according to a preset distribution strategy; determining time segmentation information corresponding to the data table to be processed according to the acquisition time corresponding to the data to be processed in the data table to be processed and the data volume corresponding to the data table to be processed; according to the time division information corresponding to the data table to be processed, determining a plurality of tasks to be executed corresponding to the data table to be processed, and executing the plurality of tasks to be executed so as to store the data to be processed in the data table to be processed into a target database.

In one embodiment, when the processor implements the allocation of the to-be-processed data to the to-be-processed data table according to a preset allocation policy, the processor is configured to implement:

acquiring an acquisition date corresponding to the data to be processed, and numbering data tables with preset number;

determining a target data table corresponding to the data to be processed according to the acquisition date corresponding to the data to be processed based on a preset corresponding relation between the acquisition date and the serial number of the data table; and storing the data to be processed into the target data table to obtain a data table to be processed corresponding to the data to be processed.

In one embodiment, the to-be-processed data table includes a plurality of to-be-processed data; the processor is configured to, when determining time division information corresponding to the to-be-processed data table according to acquisition time corresponding to-be-processed data in the to-be-processed data table and according to data amount corresponding to the to-be-processed data table, implement:

determining the starting time and the ending time corresponding to the data table to be processed according to the acquisition time corresponding to the data to be processed; determining a time range corresponding to the data table to be processed according to the starting time and the ending time corresponding to the data table to be processed; determining a time division point in a time range corresponding to the data table to be processed according to the data amount corresponding to the data table to be processed to obtain a time division point corresponding to the data table to be processed; and segmenting the time range corresponding to the data table to be processed according to the time segmentation point to obtain a plurality of time segmentation information corresponding to the data table to be processed.

In one embodiment, when the processor determines the time division point in the time range corresponding to the to-be-processed data table according to the data amount corresponding to the to-be-processed data table to obtain the time division point corresponding to the to-be-processed data table, the processor is configured to:

determining a first time division point and a first division range corresponding to the data table to be processed according to the starting time corresponding to the data table to be processed and a preset time interval; adjusting the first time division point according to the data volume in the first division range to obtain an adjusted first time division point; if the adjusted first time division point is equal to or larger than the end time in the time range, stopping dividing to obtain a time division point corresponding to the data table to be processed; if the adjusted first time division point is smaller than the end time in the time range, circularly determining the remaining time division points in the time range according to the adjusted first time division point and the preset time interval, and stopping division until the obtained time division point is larger than or equal to the end time to obtain a plurality of time division points corresponding to the data table to be processed.

In one embodiment, the processor, when implementing adjustment of the first time division point according to the data amount in the first division range, to obtain an adjusted first time division point, is configured to implement:

acquiring the data volume of the to-be-processed data table in the first segmentation range to obtain the data volume corresponding to the first segmentation range; if the data volume corresponding to the first division range is smaller than a preset first data volume threshold value, increasing the preset time interval according to a preset increase multiplying factor, and adjusting the first time division point according to the increased time interval to obtain an adjusted first time division point; or if the data volume corresponding to the first division range is larger than a preset second data volume threshold, reducing the preset time interval according to a preset reduction rate, and adjusting the first time division point according to the reduced time interval to obtain an adjusted first time division point.
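One possible reading of the adaptive time division described in the two preceding embodiments is sketched below. Several points are assumptions rather than statements of the original text: each range starts from the preset interval and is adjusted at most once, the growth and shrink factors default to 2.0 and 0.5, and division stops as soon as the next point would reach the end time. The function name and parameter names are hypothetical.

```python
from bisect import bisect_left

def split_points(times, interval, low, high, grow=2.0, shrink=0.5):
    """Return the time division points between times[0] and times[-1].

    times: sorted acquisition timestamps of one to-be-processed data table.
    low, high: the first and second data-volume thresholds.
    """
    end = times[-1]
    points = []
    left = times[0]
    while True:
        step = interval
        right = left + step
        # Number of records whose acquisition time lies in [left, right).
        count = bisect_left(times, right) - bisect_left(times, left)
        if count < low:
            step *= grow      # too little data: widen the range
        elif count > high:
            step *= shrink    # too much data: narrow the range
        right = left + step
        if right >= end:
            break             # the last range ends at the end time
        points.append(right)
        left = right
    return points

# Evenly spread data: no adjustment is needed, so points fall every 10 units.
uniform = split_points(list(range(100)), interval=10, low=5, high=20)
```

For sparse data the sketch widens the ranges, so fewer, larger ranges are produced; for dense data it narrows them, which keeps the amount of data handled by each task within the two thresholds.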

In one embodiment, the time division information comprises a time division range; when determining a plurality of tasks to be executed corresponding to the data table to be processed according to the time division information corresponding to the data table to be processed, the processor is configured to implement:

determining a week value corresponding to the data table to be processed; and adding the week value corresponding to the data table to be processed and a plurality of time division ranges corresponding to the data table to be processed into a task pool to obtain a plurality of tasks to be executed corresponding to the data table to be processed.

In one embodiment, the processor, when implementing to execute a plurality of the tasks to be executed to store the data to be processed in the data table to be processed in the target database, is configured to implement:

sequentially determining a target task to be executed of the executor in the plurality of tasks to be executed, and locking the target task to be executed; determining a current data table to be processed corresponding to the target task to be executed according to the week value in the target task to be executed; and acquiring the data to be processed in the time division range from the current data table to be processed according to the time division range in the target task to be executed, and storing the data to be processed in a target database.

In one embodiment, after implementing to execute a plurality of the tasks to be executed to store the data to be processed in the data table to be processed in the target database, the processor is further configured to implement:

detecting the execution progress corresponding to a plurality of tasks to be executed; if the execution progress corresponding to the plurality of tasks to be executed is detected to be in a complete state, deleting the data table to be processed; or if the number of the completion states of the plurality of the tasks to be executed is detected to be equal to the number of the corresponding time division ranges of the data table to be processed, deleting the target data table to be processed.

The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, where the computer program includes program instructions, and the processor executes the program instructions to implement any one of the data processing methods provided in the embodiments of the present application.

The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital Card (SD Card), a Flash memory Card (Flash Card), and the like provided on the computer device.

Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.

The block chain referred by the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.

While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A data processing method, comprising:

2. The data processing method according to claim 1, wherein the allocating the to-be-processed data to the to-be-processed data table according to a preset allocation policy comprises:

determining a target data table corresponding to the data to be processed according to the acquisition date corresponding to the data to be processed based on a preset corresponding relation between the acquisition date and the serial number of the data table;

and storing the data to be processed into the target data table to obtain a data table to be processed corresponding to the data to be processed, wherein the data table to be processed is stored in a block chain.

3. The data processing method according to claim 1, wherein the to-be-processed data table includes a plurality of to-be-processed data; the determining time division information corresponding to the data table to be processed according to the acquisition time corresponding to the data to be processed in the data table to be processed and according to the data amount corresponding to the data table to be processed includes:

determining the starting time and the ending time corresponding to the data table to be processed according to the acquisition time corresponding to the data to be processed;

determining a time range corresponding to the data table to be processed according to the starting time and the ending time corresponding to the data table to be processed;

determining a time division point in a time range corresponding to the data table to be processed according to the data amount corresponding to the data table to be processed to obtain a time division point corresponding to the data table to be processed;

and segmenting the time range corresponding to the data table to be processed according to the time segmentation point to obtain a plurality of time segmentation information corresponding to the data table to be processed.

4. The data processing method according to claim 3, wherein the determining a time division point in a time range corresponding to the to-be-processed data table according to the data amount corresponding to the to-be-processed data table to obtain the time division point corresponding to the to-be-processed data table comprises:

determining a first time division point and a first division range corresponding to the data table to be processed according to the starting time corresponding to the data table to be processed and a preset time interval;

adjusting the first time division point according to the data volume in the first division range to obtain an adjusted first time division point;

if the adjusted first time division point is equal to or larger than the end time in the time range, stopping dividing to obtain a time division point corresponding to the data table to be processed;

if the adjusted first time division point is smaller than the end time in the time range, circularly determining the remaining time division points in the time range according to the adjusted first time division point and the preset time interval, and stopping division until the obtained time division point is larger than or equal to the end time to obtain a plurality of time division points corresponding to the data table to be processed.

5. The data processing method according to claim 4, wherein the adjusting the first time division point according to the data amount in the first division range to obtain the adjusted first time division point comprises:

acquiring the data volume of the to-be-processed data table in the first segmentation range to obtain the data volume corresponding to the first segmentation range;

if the data volume corresponding to the first division range is smaller than a preset first data volume threshold value, increasing the preset time interval according to a preset increase multiplying factor, and adjusting the first time division point according to the increased time interval to obtain an adjusted first time division point; or

if the data volume corresponding to the first division range is larger than a preset second data volume threshold value, reducing the preset time interval according to a preset reduction rate, and adjusting the first time division point according to the reduced time interval to obtain an adjusted first time division point.

6. The data processing method of claim 1, wherein the time-division information includes a time-division range; the determining a plurality of to-be-executed tasks corresponding to the to-be-processed data table according to the time division information corresponding to the to-be-processed data table includes:

determining a week value corresponding to the data table to be processed;

adding the week value corresponding to the data table to be processed and a plurality of time division ranges corresponding to the data table to be processed into a task pool to obtain a plurality of tasks to be executed corresponding to the data table to be processed;

the executing the plurality of tasks to be executed to store the data to be processed in the data table to be processed in the target database includes:

sequentially determining a target task to be executed of the executor in the plurality of tasks to be executed, and locking the target task to be executed;

determining a current data table to be processed corresponding to the target task to be executed according to the week value in the target task to be executed;

and acquiring the data to be processed in the time division range from the current data table to be processed according to the time division range in the target task to be executed, and storing the data to be processed into a target database, wherein the target database is stored in a block chain.

7. The data processing method according to any one of claims 1 to 6, wherein after the executing the plurality of tasks to be executed to store the data to be processed in the data table to be processed in the target database, the method further comprises:

detecting the execution progress corresponding to a plurality of tasks to be executed;

if the execution progress corresponding to the plurality of tasks to be executed is detected to be in a complete state, deleting the data table to be processed; or

if the number of the completion states of the plurality of the tasks to be executed is equal to the number of the corresponding time division ranges of the data table to be processed, deleting the target data table to be processed.

8. A data processing apparatus, comprising:

9. A computer device, wherein the computer device comprises a memory and a processor;

the memory for storing a computer program;

the processor for executing the computer program and implementing the data processing method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the data processing method according to any one of claims 1 to 7.