CN111813805A - Data processing method and device - Google Patents


Info

Publication number
CN111813805A
CN111813805A (application CN201910292858.2A)
Authority
CN
China
Prior art keywords
data
data processing
processed
preset
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910292858.2A
Other languages
Chinese (zh)
Inventor
崔广维
王守初
马辉
孙志彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Henan Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Henan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Henan Co Ltd
Priority claimed from application CN201910292858.2A
Publication of CN111813805A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources to service a request
    • G06F9/5011 Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a data processing method and a data processing apparatus. The method reads all to-be-processed service data from a target database and stores the data into a plurality of caches according to preset classification rules; creates a plurality of data processing tasks according to the classification result of the service data; and distributes the tasks to a preset number of data processing threads, triggering each thread to process the service data in its corresponding cache based on its assigned task. Because the to-be-processed service data is read from the database in batches in advance and loaded in full into the caches for classification and arrangement, the processing threads fetch data directly from the caches at run time: the concurrently executing threads interface with the caches rather than the database, so increasing the number of threads places no additional pressure on the database. This enables high concurrency and easy scaling and greatly improves the operating efficiency of data processing.

Description

Data processing method and device
Technical Field
The present invention relates to the field of mobile communications technologies, and in particular, to a data processing method and apparatus.
Background
In the mobile internet era, China Mobile has become a link in the mobile internet industry chain. Facing growth in both user numbers and services, data volume is increasing rapidly, posing new challenges for the processing performance of batch jobs in the business support system.
In the prior art, batch processing relies mainly on database performance: the database's DML statements are used to operate on tens of millions of rows of data, which places very high demands on the performance of the database and its host. The Oracle database currently in use is expensive and difficult to scale, and resource contention during large-scale concurrent operation is severe, so each module runs very inefficiently.
In this mode, data is queried from the database, a calculation is performed to generate a result, and the result is stored back into the database. Increasing parallelism to improve processing performance ultimately increases the pressure on the database and leaves operation inefficient.
Disclosure of Invention
Embodiments of the invention aim to provide a data processing method and apparatus that solve the problems of the existing processing mode, in which increasing parallelism to improve processing performance ultimately increases the pressure on the database and results in low operating efficiency.
In order to solve the above technical problem, the embodiment of the present invention is implemented as follows:
in a first aspect, an embodiment of the present invention provides a data processing method, including:
reading all the service data to be processed from a target database, and storing the service data into a plurality of caches according to a preset classification rule;
creating a plurality of data processing tasks according to the classification result of the service data;
and distributing the plurality of data processing tasks to a preset number of data processing threads so as to trigger the data processing threads to process the corresponding service data in the cache based on the data processing tasks.
In a second aspect, an embodiment of the present invention provides a data processing apparatus, including:
the service data reading module is used for reading all the service data to be processed from the target database and storing the service data into a plurality of caches according to a preset classification rule;
the processing task creating module is used for creating a plurality of data processing tasks according to the classification result of the service data;
and the data processing triggering module is used for distributing the data processing tasks to a preset number of data processing threads so as to trigger the data processing threads to process the corresponding service data in the cache based on the data processing tasks.
In a third aspect, an embodiment of the present invention provides a computer device, including a processor, a communication interface, a memory, and a communication bus; the processor, the communication interface and the memory complete mutual communication through a bus; the memory is used for storing a computer program; the processor is configured to execute the program stored in the memory to implement the steps of the data processing method according to the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps of the data processing method according to the first aspect.
The embodiment of the invention provides a data processing method and a data processing device, wherein the method comprises the following steps: reading all the service data to be processed from a target database, and storing the service data into a plurality of caches according to a preset classification rule; creating a plurality of data processing tasks according to the classification result of the service data; and distributing the plurality of data processing tasks to a preset number of data processing threads to trigger the data processing threads to process the service data in the corresponding cache based on the data processing tasks. The to-be-processed service data is read from the database in batches in advance, and the read data is loaded in full into the caches for classification and arrangement; at run time, the processing threads take the to-be-processed service data out of the caches, so increasing the number of processing threads places no pressure on the database. This effectively reduces dependence on the database, achieves high concurrency and easy scaling, and greatly improves the operating efficiency of data processing.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first flowchart of a data processing method according to an embodiment of the present invention;
Fig. 2 is a second flowchart of a data processing method according to an embodiment of the present invention;
Fig. 3 is a third flowchart of a data processing method according to an embodiment of the present invention;
Fig. 4 is a fourth flowchart of a data processing method according to an embodiment of the present invention;
Fig. 5 is a fifth flowchart of a data processing method according to an embodiment of the present invention;
Fig. 6 is a sixth flowchart of a data processing method according to an embodiment of the present invention;
Fig. 7 is a schematic diagram illustrating an implementation principle of a data processing method according to an embodiment of the present invention;
Fig. 8 is a block diagram of a data processing apparatus according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiments of the invention provide a data processing method and apparatus in which to-be-processed service data is read from a database in batches in advance and loaded in full into a plurality of caches for classification and arrangement. At run time, a plurality of processing threads takes the to-be-processed service data out of the caches for processing, so the concurrently executing threads interface directly with the caches. Increasing the number of processing threads therefore places no pressure on the database, which effectively reduces dependence on the database, achieves high concurrency and easy scaling, and greatly improves the operating efficiency of data processing.
Fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present invention, as shown in fig. 1, the method at least includes the following steps:
s101, reading all the service data to be processed from a target database;
the service data to be processed, which is read from the target database in full, may be service data of tens of millions of users, and one or more embodiments of the present specification mainly process massive user service data in a distributed architecture;
s102, storing the service data into a plurality of caches according to a preset classification rule, wherein the preset classification rule can be used for storing the service data acquired from a target database into the plurality of caches according to user identification or service category; the cache can be a map cache or a distributed cache redis, the configuration data is cached by using an in-process map, the service data with large data volume, such as service data, is loaded into the distributed cache redis for caching, and preferably, for the data processing with large data volume of tens of millions of levels, the service data to be processed, which is read from the target database in full, is loaded into the distributed cache for processing;
Specifically, taking a mobile service database as the target database, the service data of ten million users stored there, such as each user's arrears data, recharge data and points data, is classified and sorted according to each user's identification information: the service data corresponding to one user identifier is grouped into one class together with that identifier, and the classified service data of the ten million users is then stored by service category into a plurality of distributed caches.
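As a rough illustration of this classification step, the following Python sketch groups records by user identifier and shards the per-user subsets across several caches. Plain dicts stand in for the distributed redis instances, and the record fields (`user_id`, `category`, `value`) and function name are illustrative assumptions, not taken from the patent:

```python
def classify_and_store(records, num_caches):
    """Group records by user identifier, then shard the per-user subsets
    across num_caches in-memory stand-ins for distributed redis caches."""
    per_user = {}
    for rec in records:
        per_user.setdefault(rec["user_id"], []).append(rec)
    # One dict per cache node; a user's whole subset lands on one node,
    # so a later lookup by user identifier touches a single cache.
    caches = [dict() for _ in range(num_caches)]
    for user_id, subset in per_user.items():
        caches[hash(user_id) % num_caches][user_id] = subset
    return caches
```

Keeping one user's arrears, recharge and points records together on a single node mirrors the patent's point that a user's full service information can be read in one retrieval.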
s103, creating a plurality of data processing tasks according to the classification result of the service data, specifically, classifying and sorting the service data acquired from the target database according to the user identification or the service category in the plurality of distributed redis caches to obtain the classification result of the service data according to the service data which is imported from the target database into the plurality of distributed caches for caching, creating a plurality of data processing tasks according to the obtained classification result, and providing the created plurality of data processing tasks for the processing thread for query processing;
s104, distributing the plurality of data processing tasks to a preset number of data processing threads;
s105, triggering a data processing thread to process the service data in the corresponding cache based on the data processing task;
Specifically, when task threads are created, the service data imported from the target database into the plurality of distributed caches is classified and sorted by user identifier or service category to obtain the classification result, and a plurality of data processing tasks is created accordingly and placed into a thread task pool. When the task scheduler detects pending data processing tasks in the thread task pool, it allocates them to a preset number of data processing threads in the thread pool, triggering each thread to process the service data in its corresponding distributed cache based on its task;
the data processing method in the embodiment supports breakpoint continuing processing, sequentially executes Structured Query Language (SQL) operation scenes in the program execution process, adds judgment on operation steps in a section through the AOP of java, directly skips successfully executed steps, and executes SQL operations after abnormal restart of a process to realize continuation processing; in addition, all user identification information needing to be processed is loaded into the set in the distributed redis cache, data is taken out of the set during processing, data processing is carried out, unprocessed user identifications are taken out for processing after abnormal restarting, the problem that when massive data such as tens of millions of data are processed, the data needs to be reprocessed due to the fact that a program is abnormal is effectively avoided, and therefore the running stability of the program is further guaranteed;
In the embodiment of the invention, the to-be-processed service data is read from the database in batches in advance and loaded in full into the plurality of caches for classification and arrangement; at run time, a plurality of processing threads takes the to-be-processed service data out of the caches for processing. The concurrently executing processing threads interface directly with the caches, so increasing the number of processing threads places no pressure on the database; dependence on the database is effectively reduced, high concurrency and easy scaling are achieved, and the operating efficiency of data processing is greatly improved.
When the to-be-processed service data is read from the target database in full, in batches, the read service data needs to be sorted in the plurality of distributed redis caches. As shown in fig. 2, in step S102, storing the service data into the plurality of caches according to the preset classification rule includes:
s1021, classifying the service data according to the user identification, determining the service data with the same user identification as the data subset to be processed, specifically, for example, regarding the target database as a mobile service database, classifying the service data stored in the target database and containing ten million users, such as the service data of arrearage data, recharge data, and credit data of users, according to each user identification, classifying the service data corresponding to each user identification and the service data of arrearage data, recharge data, and credit data, etc. corresponding to each user identification as one class, determining the service data with the same user identification as the data subset to be processed, obtaining ten million data subsets to be processed corresponding to ten million user identifications, wherein each data subset to be processed includes the mobile service data related to the user identification, arrearage data, recharge data, and credit data, etc., therefore, when the related service of the target user is inquired, the service information such as the arrearage information, the pre-stored information, the recharging information, the integral information and the like of the user can be completely read out according to the identification information of the user, so that the data retrieval times are reduced to a certain extent, and the data processing efficiency is improved.
S1022, grouping the multiple to-be-processed data subsets to obtain multiple to-be-processed data sets. Specifically, the service data read from the target database is classified by user identifier, the service data sharing an identifier is determined as a to-be-processed data subset, and the subsets obtained after classification and sorting are grouped to obtain multiple to-be-processed data sets, where each to-be-processed data set is the service data of a certain number of to-be-processed data subsets;
S1023, storing the multiple to-be-processed data sets into a plurality of caches. Considering that the target database is a mobile service database, it usually contains tens of millions of records; storing all of them in a single distributed redis cache would reduce the operating efficiency of the processing host. Since X86 hosts are currently inexpensive, the ten-million-scale data can instead be stored in the redis caches of multiple X86 hosts, and this horizontal scaling supports more concurrent data processing, effectively shortening processing time and greatly improving data processing efficiency;
For example, taking a mobile service database as the target database, the service data of ten million users stored there, such as users' arrears data, recharge data and points data, is classified and sorted by user identifier; the service data corresponding to each identifier is grouped into one class together with that identifier, the service data sharing an identifier is determined as a to-be-processed data subset, and ten million to-be-processed data subsets are obtained for the ten million user identifiers, each containing the mobile service data related to one identifier, such as arrears data, recharge data and points data. The ten million subsets are then grouped, for example into 100 groups, yielding 100 to-be-processed data sets of 100,000 subsets each, which are stored into a plurality of distributed redis caches.
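A minimal sketch of the grouping step, assuming the per-user subsets are held in a dict keyed by user identifier; the function name and the ceiling-division grouping policy are illustrative, not taken from the patent:

```python
def group_subsets(subsets, num_groups):
    """Split per-user data subsets into num_groups to-be-processed sets of
    roughly equal size (e.g. ten million subsets into 100 groups of 100,000)."""
    items = list(subsets.items())
    size = -(-len(items) // num_groups)   # ceiling division
    return [dict(items[i:i + size]) for i in range(0, len(items), size)]
```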
in order to further improve the data processing efficiency, as shown in fig. 3, in step S103, creating a plurality of data processing tasks according to the classification result of the service data, where the data processing tasks include:
S1031, for each to-be-processed data set, creating a data processing task according to the identification information of the cache where the set is located and a preset data processing requirement, where the preset data processing requirement includes processing requirements such as querying arrears data, recharge data, points data or traffic data. For each to-be-processed data set, a data processing task is created from the physical address identification information of the distributed redis cache holding the set and the specific processing requirement. For example, suppose the preset data processing requirement is to query the arrears data of all users, and the ten million to-be-processed data subsets are divided into 100 groups to obtain 100 to-be-processed data sets, each containing 100,000 subsets; then for each set, a data processing task is created from the physical address identification information of its distributed redis cache and the arrears processing requirement;
for example, ten million subsets of data to be processed are divided into 100 groups, and 100 data processing tasks are created correspondingly;
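Task creation as described above can be sketched as a simple record pairing a cache identifier with a processing requirement; the class and field names are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProcessingTask:
    cache_id: str       # identifies the distributed cache holding the data set
    requirement: str    # e.g. "query_arrears"; the name is an assumption

def create_tasks(cache_ids, requirement):
    """Create one data processing task per to-be-processed data set (S1031)."""
    return [DataProcessingTask(cid, requirement) for cid in cache_ids]
```

With 100 to-be-processed data sets, `create_tasks` would yield the 100 tasks mentioned in the example.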
for the created multiple data processing tasks, in order to further improve the data processing efficiency, a preset number of data processing threads are adopted to concurrently perform data processing, as shown in fig. 4, in step S104, the allocating the multiple data processing tasks to the preset number of data processing threads includes:
s1041, triggering a task scheduler to select a plurality of data processing threads in an idle state in a data processing thread pool;
s1042, selecting an unprocessed data processing task from the plurality of data processing tasks and allocating the selected unprocessed data processing task to the plurality of data processing threads.
Specifically, after the plurality of data processing tasks is created according to the classification result of the service data, the task scheduler, upon detecting tasks in the thread task pool, checks whether the data processing thread pool contains idle data processing threads. When idle threads are found, the task scheduler selects them from the pool, selects unprocessed data processing tasks from the plurality of tasks, and allocates the tasks to the idle threads simultaneously, executing the tasks concurrently until all tasks in the thread task pool are completed.
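The scheduler-and-idle-threads behaviour can be approximated with a fixed-size thread pool, which keeps feeding queued tasks to whichever worker thread becomes free. This is a sketch using Python's standard thread pool, not the patent's actual scheduler; the function name is an assumption:

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch_tasks(tasks, worker, max_threads=4):
    """Hand unprocessed tasks to a fixed-size pool of data processing threads;
    the pool stands in for the task scheduler assigning tasks to idle threads
    until the thread task pool is empty."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(worker, tasks))
```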
The data processing task includes cache identification information and a preset data processing requirement. As shown in fig. 5, in step S105, triggering the data processing thread to process the service data in the corresponding cache based on the data processing task includes:
s1051, triggering a data processing thread to read a data set to be processed from a cache corresponding to cache identification information, and performing concurrent processing on the data set to be processed based on preset data processing requirements, specifically, after selecting unprocessed data processing tasks from a plurality of data processing tasks and allocating the unprocessed data processing tasks to the plurality of data processing threads, triggering each data processing thread allocated to the tasks, reading the data set to be processed from a redis cache corresponding to the distributed redis cache identification information, and performing concurrent processing on the data set to be processed based on the preset data processing requirements; for example, the preset data processing requirement is to calculate point data of a user, trigger each data processing thread allocated to the task, read the identification information of each user included in the set of the data to be processed according to the cache identification information carried in each allocated task and the requirement of calculating the point data of the user and concurrently from the set in the redis cache corresponding to the distributed redis cache identification information carried in the allocated task, search the subset of the data to be processed corresponding to the user identification according to the identification information of the user, and calculate the point data of the user according to the recharge data and the donation data in the subset of the data to be processed.
As shown in fig. 6, after the triggering the data processing thread to process the service data in the corresponding cache based on the data processing task in S105, the method further includes:
s106, writing the data processing result of the data processing thread for the service data in the cache into the target database, specifically, after the plurality of data processing threads concurrently process the service data in the corresponding distributed redis cache based on the corresponding data processing tasks, obtaining the corresponding data processing result, and importing the obtained data processing results into the target database in batches by the thread responsible for importing the data results into the target database.
In order to further reduce the pressure of the target database, in step S106, writing the data processing result of the data processing thread for the service data in the cache into the target database includes:
the method comprises the following steps: storing data processing results of all the data processing threads aiming at the service data in the cache into a preset data temporary storage pool;
step two: judging whether the number of data processing results in a preset data temporary storage pool is greater than a preset threshold value or not;
step three: if so, calling a data result writing thread to write a plurality of data processing results in the preset data temporary storage pool into the target database, and deleting the plurality of data processing results in the preset data temporary storage pool;
Specifically, to further reduce the pressure on the target database, the results generated by the data processing threads are not written into the target database one by one. Instead, the results are first stored in a preset temporary data pool; when the number of results in the pool exceeds a preset threshold, a result-writing thread is called to write the pooled results into the target database in one batch, after which those results are deleted from the pool.
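The temporary result pool of steps one to three can be sketched as a small buffer that flushes to the database in one batch once the preset threshold is exceeded; the class name and the injected `write_batch` callable are illustrative assumptions standing in for the real batch insert:

```python
class BufferedResultWriter:
    """Temporary result pool: collect per-thread processing results and flush
    them in one batch once the pool size exceeds a preset threshold, instead
    of issuing one database write per result."""

    def __init__(self, write_batch, threshold):
        self.write_batch = write_batch   # callable performing the batch insert
        self.threshold = threshold
        self.pool = []

    def add(self, result):
        self.pool.append(result)
        if len(self.pool) > self.threshold:
            self.write_batch(list(self.pool))
            self.pool.clear()
```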
The data processing method provided in the above embodiments also supports a reconciliation function: after the data processing threads concurrently process the service data in the distributed redis caches based on their tasks, the resulting data processing results are imported into a temporary table and reconciled against the results obtained by processing the same data with the prior-art approach; processing efficiency is tested, and optimization is performed according to the test results.
In a specific embodiment, as shown in fig. 7, a schematic diagram of an implementation principle of a processing method for ten million-level data is provided, and a specific implementation process is as follows:
(1) reading all the service data to be processed from a target database, wherein the service data to be processed is ten million-level data;
(2) storing to-be-processed service data acquired from a target database into a plurality of distributed redis caches according to preset classification rules, and performing memory loading in the plurality of distributed redis caches through memory loading threads, wherein the memory loading process comprises the following steps:
(a) classifying the service data according to the user identification, and determining the service data with the same user identification as a data subset to be processed;
For example, taking a mobile service database as the target database, the service data of ten million users stored there, such as users' arrears data, recharge data and points data, is classified and sorted by user identifier; the service data corresponding to each identifier is grouped into one class together with that identifier, the service data sharing an identifier is determined as a to-be-processed data subset, and ten million to-be-processed data subsets are obtained for the ten million user identifiers, each containing the mobile service data related to one identifier, such as arrears data, recharge data and points data;
(b) grouping the plurality of to-be-processed data subsets to obtain multiple groups of to-be-processed data sets, wherein each to-be-processed data set comprises a certain number of to-be-processed data subsets;
(c) storing the obtained multiple groups of data sets to be processed into a plurality of distributed redis caches;
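Steps (a)–(c) of the memory loading process above might be sketched as follows. Plain Python dicts stand in for the distributed redis caches, and `load_into_caches`, `group_size`, and the `set:<n>` keys are hypothetical names introduced for illustration, not part of the patent:

```python
from collections import defaultdict

def load_into_caches(records, num_caches, group_size):
    """Classify records by user id, group the subsets, and shard them across caches.

    Each record is a (user_id, payload) tuple; the returned caches are plain
    dicts standing in for distributed redis instances.
    """
    # (a) classify: records with the same user identification form one subset
    subsets = defaultdict(list)
    for user_id, payload in records:
        subsets[user_id].append(payload)

    # (b) group a fixed number of subsets into one to-be-processed data set
    subset_items = list(subsets.items())
    data_sets = [dict(subset_items[i:i + group_size])
                 for i in range(0, len(subset_items), group_size)]

    # (c) distribute the data sets across the caches round-robin
    caches = [dict() for _ in range(num_caches)]
    for idx, data_set in enumerate(data_sets):
        caches[idx % num_caches][f"set:{idx}"] = data_set
    return caches
```

In a real deployment the dicts would be replaced by redis clients and the round-robin placement by whatever sharding policy the operator chooses; the sketch only shows the classify/group/store shape of the loading process.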
(3) aiming at a plurality of groups of data sets to be processed stored in a plurality of distributed redis caches, creating a corresponding number of data processing tasks by a task creating thread;
(4) the task scheduler detects that a plurality of data processing tasks to be processed exist, and allocates the plurality of data processing tasks to a preset number of data processing threads so as to trigger the data processing threads to process the service data in the corresponding distributed redis cache based on the data processing tasks;
(5) the data processing thread generates a corresponding file according to the data processing result of the service data in the distributed redis cache, converts the file into a database import file, and finally imports that file into the target database;
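As a rough sketch of steps (3)–(5), the dispatch of data processing tasks to a fixed-size pool of data processing threads could look like the following. `run_tasks`, the `(cache_idx, set_key)` task tuples, and the `process` callback are illustrative assumptions; file generation and the database import of step (5) are omitted:

```python
from concurrent.futures import ThreadPoolExecutor

def run_tasks(caches, tasks, num_threads, process):
    """Dispatch one data processing task per to-be-processed data set to a
    fixed-size thread pool.

    Each task is a (cache_idx, set_key) pair standing in for the cache
    identification information carried by a task; `process` is the
    hypothetical per-user business computation.
    """
    def worker(task):
        cache_idx, set_key = task
        # the thread reads directly from the cache, never from the database
        data_set = caches[cache_idx][set_key]
        return {uid: process(payloads) for uid, payloads in data_set.items()}

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map preserves task order, so results line up with tasks
        return list(pool.map(worker, tasks))
```

Because each worker touches only its own cache and data set, adding threads scales the computation without adding database connections, which is the point made in the summary paragraph below.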
the data processing method in the embodiment of the invention comprises the following steps: reading all the service data to be processed from a target database, and storing the service data into a plurality of caches according to a preset classification rule; creating a plurality of data processing tasks according to the classification result of the service data; and distributing the plurality of data processing tasks to a preset number of data processing threads to trigger the data processing threads to process the service data in the corresponding cache based on the data processing tasks. The method has the advantages that the service data to be processed are read from the database in batches in advance, the read data are loaded into the caches in full quantity for classification and arrangement, the service data to be processed are taken out from the caches by the processing threads for processing during program operation, the processing threads which are executed in parallel are directly connected with the caches, pressure cannot be applied to the database due to the increase of the number of the processing threads, the dependence on the database is effectively reduced, high concurrency and high expansion are achieved, and the operating efficiency of data processing is greatly improved.
On the basis of the same technical concept, a data processing apparatus is further provided in the embodiment of the present invention corresponding to the data processing method provided in the foregoing embodiment, and fig. 8 is a schematic diagram illustrating a first module of the data processing apparatus according to the embodiment of the present invention, where the data processing apparatus is configured to execute the data processing method described in fig. 1 to fig. 7, and as shown in fig. 8, the data processing apparatus includes:
a service data reading module 801, configured to read all the service data to be processed from a target database, and store the service data into a plurality of caches according to a preset classification rule;
a processing task creating module 802, configured to create a plurality of data processing tasks according to the classification result of the service data;
a data processing triggering module 803, configured to allocate a plurality of data processing tasks to a preset number of data processing threads, so as to trigger the data processing threads to process the corresponding service data in the cache based on the data processing tasks.
In the embodiment of the invention, the service data to be processed is read from the database in batches in advance, the read data is loaded in full into the plurality of caches for classification and sorting, and at run time a plurality of processing threads take the service data to be processed out of the caches for processing. The concurrently executing processing threads interface directly with the plurality of caches, so increasing the number of processing threads puts no pressure on the database; the dependence on the database is effectively reduced, high concurrency and high scalability are achieved, and the operating efficiency of data processing is greatly improved.
Optionally, the service data reading module 801 is specifically configured to:
classifying the service data according to the user identification, and determining the service data with the same user identification as a data subset to be processed;
grouping a plurality of to-be-processed data subsets to obtain a plurality of groups of to-be-processed data sets;
and storing the multiple groups of the data sets to be processed into multiple caches.
Optionally, the processing task creating module 802 is specifically configured to:
and aiming at each data set to be processed, creating a data processing task according to the identification information of the cache where the data set to be processed is located and a preset data processing requirement.
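One plausible shape for such a task record, assuming one task per to-be-processed data set, keyed by the identification information of the cache holding that data set and carrying the preset data processing requirement (all field and function names here are hypothetical):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProcessingTask:
    cache_id: str      # identification information of the cache holding the data set
    set_key: str       # key of the to-be-processed data set within that cache
    requirement: str   # preset data processing requirement, e.g. "sum_arrears"

def create_tasks(cache_layout, requirement):
    """Create one task per to-be-processed data set.

    cache_layout is a hypothetical mapping of cache_id -> list of data-set keys,
    i.e. the classification result produced during memory loading.
    """
    return [DataProcessingTask(cache_id, key, requirement)
            for cache_id, keys in cache_layout.items()
            for key in keys]
```

The frozen dataclass makes a task an immutable description of work, so the task scheduler can hand it to any idle thread without copying or locking.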
Optionally, the data processing triggering module 803 is specifically configured to:
triggering a task scheduler to select a plurality of data processing threads in an idle state in a data processing thread pool;
and selecting unprocessed data processing tasks from the plurality of data processing tasks and distributing the unprocessed data processing tasks to the plurality of data processing threads.
Optionally, the data processing task includes: cache identification information and a preset data processing requirement; the data processing triggering module 803 is further specifically configured to:
and triggering the data processing thread to read a data set to be processed from the cache corresponding to the cache identification information, and performing concurrent processing on the data set to be processed based on the preset data processing requirement.
Optionally, the apparatus is further configured to:
and writing the data processing result of the data processing thread aiming at the service data in the cache into the target database.
Optionally, the apparatus is further specifically configured to:
storing the data processing results of the data processing threads aiming at the service data in the cache into a preset data temporary storage pool;
judging whether the number of the data processing results in the preset data temporary storage pool is greater than a preset threshold value or not;
if so, calling a data result writing thread to write the plurality of data processing results in the preset data temporary storage pool into the target database, and deleting the plurality of data processing results in the preset data temporary storage pool.
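The preset data temporary storage pool with threshold-triggered batch writing might be sketched as follows. `ResultStagingPool` and `write_batch` are hypothetical names, with `write_batch` standing in for the data result writing thread's bulk insert into the target database:

```python
import threading

class ResultStagingPool:
    """Buffer data processing results and flush them to the target database in
    batches once the count exceeds a preset threshold."""

    def __init__(self, threshold, write_batch):
        self._threshold = threshold      # the preset threshold value
        self._write_batch = write_batch  # stand-in for the data result writing thread
        self._pool = []
        self._lock = threading.Lock()    # many processing threads call add()

    def add(self, result):
        with self._lock:
            self._pool.append(result)
            if len(self._pool) > self._threshold:
                # hand the batch off and delete the results from the pool
                batch, self._pool = self._pool, []
                self._write_batch(batch)
```

Batching the writes this way means the database sees one bulk insert per threshold crossing rather than one write per result, which is the stated motivation for the temporary storage pool.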
The data processing device in the embodiment of the invention reads all the to-be-processed service data from the target database, and stores the service data into a plurality of caches according to the preset classification rule; creates a plurality of data processing tasks according to the classification result of the service data; and distributes the plurality of data processing tasks to a preset number of data processing threads to trigger the data processing threads to process the service data in the corresponding caches based on the data processing tasks. Because the service data to be processed is read from the database in batches in advance, the read data is loaded in full into the plurality of caches for classification and sorting, and at run time the processing threads take the service data out of the caches for processing, the concurrently executing processing threads interface directly with the caches. Increasing the number of processing threads therefore puts no pressure on the database; the dependence on the database is effectively reduced, high concurrency and high scalability are achieved, and the operating efficiency of data processing is greatly improved.
The data processing apparatus provided in the embodiment of the present invention can implement each process in the embodiment corresponding to the data processing method, and is not described here again to avoid repetition.
It should be noted that the data processing apparatus provided in the embodiment of the present invention and the data processing method provided in the embodiment of the present invention are based on the same inventive concept, and therefore, for specific implementation of the embodiment, reference may be made to implementation of the foregoing data processing method, and repeated details are not described again.
Corresponding to the data processing method provided in the foregoing embodiment, based on the same technical concept, an embodiment of the present invention further provides a computer device configured to execute the foregoing data processing method; fig. 9 is a schematic structural diagram of a computer device implementing various embodiments of the present invention. As shown in fig. 9, computer devices may vary widely in configuration or performance and may include one or more processors 901 and a memory 902, where the memory 902 may store one or more application programs or data. The memory 902 may be transient storage or persistent storage. An application program stored in the memory 902 may include one or more modules (not shown in the figure), and each module may include a series of computer-executable instructions for the computer device. Further, the processor 901 may be configured to communicate with the memory 902 and execute the series of computer-executable instructions in the memory 902 on the computer device. The computer device may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input/output interfaces 905, and one or more keyboards 906.
In this embodiment, the computer device includes a processor, a communication interface, a memory, and a communication bus; the processor, the communication interface and the memory complete mutual communication through a bus; the memory is used for storing a computer program; the processor is used for executing the program stored in the memory and realizing the following method steps:
reading all the service data to be processed from a target database, and storing the service data into a plurality of caches according to a preset classification rule;
creating a plurality of data processing tasks according to the classification result of the service data;
and distributing the plurality of data processing tasks to a preset number of data processing threads so as to trigger the data processing threads to process the corresponding service data in the cache based on the data processing tasks.
An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when executed by a processor, the computer program implements the following method steps:
reading all the service data to be processed from a target database, and storing the service data into a plurality of caches according to a preset classification rule;
creating a plurality of data processing tasks according to the classification result of the service data;
and distributing the plurality of data processing tasks to a preset number of data processing threads so as to trigger the data processing threads to process the corresponding service data in the cache based on the data processing tasks.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A data processing method, comprising:
reading all the service data to be processed from a target database, and storing the service data into a plurality of caches according to a preset classification rule;
creating a plurality of data processing tasks according to the classification result of the service data;
and distributing the plurality of data processing tasks to a preset number of data processing threads so as to trigger the data processing threads to process the corresponding service data in the cache based on the data processing tasks.
2. The method of claim 1, wherein the storing the service data into a plurality of caches according to a preset classification rule comprises:
classifying the service data according to the user identification, and determining the service data with the same user identification as a data subset to be processed;
grouping a plurality of to-be-processed data subsets to obtain a plurality of groups of to-be-processed data sets;
and storing the multiple groups of the data sets to be processed into multiple caches.
3. The method of claim 2, wherein creating a plurality of data processing tasks according to the classification result of the business data comprises:
and aiming at each data set to be processed, creating a data processing task according to the identification information of the cache where the data set to be processed is located and a preset data processing requirement.
4. The method of claim 1, wherein said assigning a plurality of said data processing tasks to a preset number of data processing threads comprises:
triggering a task scheduler to select a plurality of data processing threads in an idle state in a data processing thread pool;
and selecting unprocessed data processing tasks from the plurality of data processing tasks and distributing the unprocessed data processing tasks to the plurality of data processing threads.
5. The method of claim 1, wherein the data processing task comprises: cache identification information and a preset data processing requirement;
the triggering the data processing thread to process the corresponding service data in the cache based on the data processing task includes:
and triggering the data processing thread to read a data set to be processed from the cache corresponding to the cache identification information, and performing concurrent processing on the data set to be processed based on the preset data processing requirement.
6. The method of any of claims 1 to 5, further comprising:
and writing the data processing result of the data processing thread aiming at the service data in the cache into the target database.
7. The method according to claim 6, wherein the writing the data processing result of the data processing thread for the service data in the cache into the target database comprises:
storing the data processing results of the data processing threads aiming at the service data in the cache into a preset data temporary storage pool;
judging whether the number of the data processing results in the preset data temporary storage pool is greater than a preset threshold value or not;
if so, calling a data result writing thread to write the plurality of data processing results in the preset data temporary storage pool into the target database, and deleting the plurality of data processing results in the preset data temporary storage pool.
8. A data processing apparatus, comprising:
the service data reading module is used for reading all the service data to be processed from the target database and storing the service data into a plurality of caches according to a preset classification rule;
the processing task creating module is used for creating a plurality of data processing tasks according to the classification result of the service data;
and the data processing triggering module is used for distributing the data processing tasks to a preset number of data processing threads so as to trigger the data processing threads to process the corresponding service data in the cache based on the data processing tasks.
9. A computer device comprising a processor, a communication interface, a memory, and a communication bus; the processor, the communication interface and the memory complete mutual communication through a bus; the memory is used for storing a computer program; the processor, configured to execute the program stored in the memory, implements the method steps of any of claims 1-7.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any of the claims 1-7.
CN201910292858.2A 2019-04-12 2019-04-12 Data processing method and device Pending CN111813805A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910292858.2A CN111813805A (en) 2019-04-12 2019-04-12 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910292858.2A CN111813805A (en) 2019-04-12 2019-04-12 Data processing method and device

Publications (1)

Publication Number Publication Date
CN111813805A true CN111813805A (en) 2020-10-23

Family

ID=72844611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910292858.2A Pending CN111813805A (en) 2019-04-12 2019-04-12 Data processing method and device

Country Status (1)

Country Link
CN (1) CN111813805A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102800014A (en) * 2012-07-13 2012-11-28 北京华胜天成科技股份有限公司 Financial data processing method for supply chain financing
CN104714835A (en) * 2013-12-16 2015-06-17 中国移动通信集团公司 Data access processing method and device
CN104881492A (en) * 2015-06-12 2015-09-02 北京京东尚科信息技术有限公司 Cache fragmentation technology based data filtering method and device
CN105635208A (en) * 2014-10-30 2016-06-01 阿里巴巴集团控股有限公司 Business processing method and device


Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112750027A (en) * 2020-12-30 2021-05-04 中电金信软件有限公司 Batch business processing method and device, computer equipment and storage medium
CN113760570A (en) * 2021-01-07 2021-12-07 北京沃东天骏信息技术有限公司 Data processing method, device, electronic equipment, system and storage medium
CN112835867A (en) * 2021-01-11 2021-05-25 中国农业银行股份有限公司 Data preprocessing method and device
CN113823014A (en) * 2021-08-24 2021-12-21 广州市瑞立德信息系统有限公司 Message pushing method based on access control processor and access control system
CN113806046A (en) * 2021-09-15 2021-12-17 武汉虹信技术服务有限责任公司 Task scheduling system based on thread pool
CN114168233A (en) * 2021-11-16 2022-03-11 北京达佳互联信息技术有限公司 Data processing method, device, server and storage medium
CN113986932A (en) * 2021-12-28 2022-01-28 恒生电子股份有限公司 Data processing method and device, computer equipment and readable storage medium
CN113986932B (en) * 2021-12-28 2022-04-12 恒生电子股份有限公司 Data processing method and device, computer equipment and readable storage medium
CN115114359A (en) * 2022-05-27 2022-09-27 马上消费金融股份有限公司 User data processing method and device
CN115114359B (en) * 2022-05-27 2023-11-14 马上消费金融股份有限公司 User data processing method and device
CN116841936A (en) * 2023-08-29 2023-10-03 深圳市莱仕达电子科技有限公司 Multi-device data processing method, device and system and computer device
CN116841936B (en) * 2023-08-29 2023-11-21 深圳市莱仕达电子科技有限公司 Multi-device data processing method, device and system and computer device
CN117076139A (en) * 2023-10-17 2023-11-17 北京融为科技有限公司 Data processing method and related equipment
CN117076139B (en) * 2023-10-17 2024-04-02 北京融为科技有限公司 Data processing method and related equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201023