CN112148505A - Data batching system, method, electronic device and storage medium

Info

Publication number
CN112148505A
Authority
CN
China
Prior art keywords
task
data
fragment
execution
message queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010990015.2A
Other languages
Chinese (zh)
Inventor
Wang Wei (王伟)
Jia Jingjing (贾晶晶)
Wang Beibei (王贝贝)
Sun Chunlong (孙春龙)
Cong Lin (丛琳)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JD Digital Technology Holdings Co Ltd
Original Assignee
JD Digital Technology Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JD Digital Technology Holdings Co Ltd filed Critical JD Digital Technology Holdings Co Ltd
Priority to CN202010990015.2A
Publication of CN112148505A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/52Program synchronisation; Mutual exclusion, e.g. by means of semaphores

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application relates to a data batching system, method, electronic device and storage medium in the field of data processing. The system comprises: a scheduling center cluster, which, after a batch running task is triggered, acquires the task data corresponding to the task from a storage layer, splits the task data into a plurality of task fragments, and stores the fragments in a message queue; the message queue, which stores the task fragments; and a working node cluster comprising at least one working node, each of which pulls task fragments from the message queue and calls a preset service interface to execute them. This addresses the prior-art problems that executing a large batch of tasks in a single thread makes task execution slow, time-consuming and inefficient, and that a failure of the single node affects all tasks on that node, causing every task executed on it to fail and wasting system resources.

Description

Data batching system, method, electronic device and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data batching system, a data batching method, an electronic device, and a storage medium.
Background
At present, large-scale data processing mainly relies on single-machine deployment. As business grows and data volumes increase, a single-machine batch running scheme has the following problems:
large batches of tasks are executed in a single thread, so task execution is slow, time-consuming and inefficient; moreover, if the node fails, the execution of all tasks on that node is affected and every task running on it fails, wasting system resources.
Disclosure of Invention
The application provides a data batching system, method, electronic device and storage medium to solve the prior-art problems that executing a large batch of tasks in a single thread makes task execution slow, time-consuming and inefficient, and that a node failure affects all tasks on that node, causing them to fail and wasting system resources.
In a first aspect, an embodiment of the present application provides a data batching system, including:
the system comprises a storage layer, a dispatching center cluster, a message queue and a working node cluster;
the scheduling center cluster is used for acquiring task data corresponding to the batch running task from the storage layer after the batch running task is triggered, dividing the task data into a plurality of task fragments and storing the task fragments to the message queue;
the message queue is used for storing the task fragments;
the working node cluster comprises at least one working node, and each working node is used for pulling the task fragment in the message queue and calling a preset service interface to execute the task fragment.
Optionally, the working node is further configured to:
before the preset service interface is called to execute the task fragment, detecting whether execution of the task fragment has timed out or been interrupted;
and if so, terminating execution of the task fragment.
Optionally, the working node is further configured to:
and after the execution of the task fragments is terminated, recording the execution state data of the task fragments, and storing the execution state data to the storage layer.
Optionally, the storage layer comprises a first memory and a second memory,
the first memory is used for storing the task data;
the second memory is used for storing the execution state data.
Optionally, the execution status data includes execution failure information and execution success information;
the scheduling center cluster is further configured to acquire a retry task fragment from the storage layer and resend the retry task fragment to the message queue, where the retry task fragment is a task fragment corresponding to the execution failure information;
and the working node is further configured to pull the retry task fragment from the message queue and call the preset service interface to execute the retry task fragment.
Optionally, the task data includes primary key information of the task data;
the working node is used for judging, according to the primary key information, whether the task data in the retry task fragment needs to be retried, to obtain a judgment result;
if the judgment result is yes, calling the preset service interface to execute the task data corresponding to the primary key information;
and if the judgment result is no, abandoning the call to the preset service interface.
Optionally, the task data includes primary key information of the task data, and the working node is further configured to: before calling the preset service interface to execute the task fragment, check whether the task data in the task fragment is duplicated according to the primary key information, and if so, call the preset service interface to execute only one of the duplicated task data in the task fragment.
Optionally, the scheduling center cluster is further configured to obtain the execution state data stored in the storage layer;
judge, according to the execution state data, whether a task alarm is triggered;
and if so, terminate execution of the batch running task or generate alarm information.
Optionally, the working node is further configured to: before the preset service interface is called to execute the task fragment, acquire an interface rate-limiting tool; if the rate-limiting tool is not acquired, attempt to acquire it again after each preset time period until it is acquired; and once the rate-limiting tool is acquired, perform the step of calling the preset service interface to execute the task fragment.
In a second aspect, an embodiment of the present application provides a data batching method, including:
after detecting that a batch running task is triggered, acquiring task data corresponding to the batch running task from a storage layer;
splitting the batch running task into a plurality of task fragments through a dispatching center cluster;
storing the task fragments to a message queue;
and pulling the task fragment in the message queue through at least one working node in the working node cluster, and calling a preset service interface to execute the task fragment.
In a third aspect, an embodiment of the present application provides an electronic device, including: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
the memory for storing a computer program;
the processor is configured to execute the program stored in the memory to implement the data batching method of the second aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the data batching method of the second aspect.
Compared with the prior art, the technical solutions provided by the embodiments of the application have the following advantages. In the data batching system of the embodiments, after a batch running task is triggered, the scheduling center cluster acquires the task data corresponding to the task from the storage layer, splits the task data into a plurality of task fragments, and stores the fragments in a message queue; the working nodes in the working node cluster then pull the task fragments from the message queue and call a preset service interface to execute them. Because the batch running task is split into many task fragments that are executed in parallel by multiple working nodes, the single-point execution mode of the prior art is replaced, the batch run progresses much faster, and task execution efficiency improves. In addition, even if one node fails, its task fragments can be executed by other working nodes, so a single node failure does not cause the whole task to fail; system resource utilization improves and waste of resources is avoided.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a block diagram of a data batching system provided in an embodiment of the present application;
FIG. 2 is a block diagram of a data batching system according to another embodiment of the present application;
FIG. 3 is a flow chart of a data batching method provided by an embodiment of the present application;
FIG. 4 is a block diagram of an electronic device according to an embodiment of the present application.
Reference numerals:
the system comprises a storage layer-1, a first storage-11, a second storage-12, a dispatching center cluster-2, a first dispatching host-21, a second dispatching host-22, a message queue-3, a first sub-message queue-31, a second sub-message queue-32, a work node cluster-4, a first work node-41 and a second work node-42.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
An embodiment of the present application provides a data batching system, which can be applied to electronic devices of any form, such as terminals and servers. As shown in FIG. 1, the data batching system comprises a storage layer 1, a scheduling center cluster 2, a message queue 3 and a working node cluster 4, wherein:
and the storage layer 1 is used for storing task data.
In some embodiments, the storage layer 1 may include, but is not limited to, a file management system, a database, Elasticsearch (ES), and Redis (Remote Dictionary Server).
And the scheduling center cluster 2 is used for acquiring task data corresponding to the batch running task from the storage layer 1 after the batch running task is triggered, dividing the task data into a plurality of task fragments, and storing the task fragments in the message queue 3.
In some embodiments, after the batch running task is triggered, the scheduling center cluster 2 acquires the corresponding task data from the storage layer 1, splits the task data according to a preset task fragment size to obtain a plurality of task fragments, and stores the task fragments in the message queue 3. By splitting one large task (the batch running task) into many small tasks (the task fragments) and executing each small task through the working node cluster 4, the tasks are executed concurrently and task execution efficiency improves.
The batch running task may be triggered, but is not limited to being triggered, automatically on a timer or manually on demand by policy personnel. Manual triggering can be performed through the task configuration and operation interface provided by the data batching system.
Splitting the task data into a plurality of task fragments specifically means acquiring the task data corresponding to the batch running task from the storage layer 1 based on preset task configuration information, splitting the task data into a plurality of pieces of subtask data, and treating each piece of subtask data as one task fragment. The task configuration information includes the size of the task data in the batch running task and the task fragment size.
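As a non-limiting illustration, this splitting step can be sketched in Java as follows; the class name, method signature and generic element type are assumptions for illustration rather than the actual implementation of the embodiment:

    import java.util.ArrayList;
    import java.util.List;

    public final class FragmentSplitter {
        /** Splits the task data into task fragments of at most fragmentSize items each. */
        public static <T> List<List<T>> split(List<T> taskData, int fragmentSize) {
            List<List<T>> fragments = new ArrayList<>();
            for (int i = 0; i < taskData.size(); i += fragmentSize) {
                int end = Math.min(i + fragmentSize, taskData.size());
                fragments.add(new ArrayList<>(taskData.subList(i, end)));
            }
            return fragments;
        }
    }

Each fragment produced by such a routine would then be serialized and stored in the message queue 3 as described above.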
A batch running task is a process in which policy personnel complete a given operation on a batch of user data (stored in the storage layer 1) through a corresponding service interface, for example checking whether a batch of users have completed real-name verification, or adjusting the credit limits of a batch of users.
And the message queue 3 is used for storing the task fragments.
In some embodiments, the message queue 3 is an MQ (message queue). An MQ uses a first-in-first-out data structure and supports a high-performance, highly available, scalable and eventually consistent architecture for problems such as application decoupling, asynchronous messaging and traffic peak shaving.
The working node cluster 4 comprises at least one working node, and each working node is used for pulling the task fragment in the message queue 3 and calling a preset service interface to execute the task fragment.
In some embodiments, the working node cluster 4 contains a plurality of working nodes, each of which can pull task fragments from the message queue 3, so the task fragments are executed in parallel; this greatly accelerates the batch run and improves its efficiency.
In this embodiment of the application, after the batch running task is triggered, the scheduling center cluster acquires the corresponding task data from the storage layer, splits it into a plurality of task fragments, and stores the fragments in the message queue; the working nodes in the working node cluster then pull the task fragments from the message queue and call a preset service interface to execute them. Because the batch running task is split into many task fragments executed in parallel by multiple working nodes, the single-point execution mode of the prior art is replaced, the batch run progresses much faster, and task execution efficiency improves. In addition, even if one node fails, its task fragments can be executed by other working nodes, so a single node failure does not cause the whole task to fail; system resource utilization improves and waste is avoided.
In another embodiment, the present application provides a data batching system which, as shown in FIG. 2, comprises a storage layer 1, a scheduling center cluster 2, a message queue 3 and a working node cluster 4, wherein:
the storage layer 1 comprises a first memory 11 and a second memory 12, the first memory 11 being used for storing task data and the second memory 12 being used for storing execution status data.
Specifically, the first memory 11 may include, but is not limited to, any one or more of a file management system, a MySQL database and Elasticsearch (ES). The second memory 12 may be, but is not limited to, Redis (Remote Dictionary Server).
ES is a distributed, highly scalable, near-real-time search and data analysis engine that makes it convenient to search, analyze and explore large volumes of data.
Redis supports several sorting modes and caches data in memory for efficiency; it can periodically write updated data to disk or append modification operations to an append-only log file, and implements master-slave synchronization on that basis.
The scheduling center cluster 2 comprises at least two scheduling hosts. Taking two as an example: the first scheduling host 21 is configured to obtain the task data corresponding to the batch running task (normal task fragments), and the second scheduling host 22 is configured to obtain the task data corresponding to task fragments whose execution failed (retry task fragments). It is understood that when there are three or more scheduling hosts, there may be one or more of each of the first scheduling host 21 and the second scheduling host 22.
The scheduling host may be, but is not limited to, an electronic device such as a computer or a server.
And the message queue 3 is used for storing the task fragments. The message queue 3 includes at least two sub-message queues, where a first sub-message queue 31 is used to store task data of a normal task fragment sent by the first scheduling host 21, and a second sub-message queue 32 is used to store task data of a retry task fragment sent by the second scheduling host 22.
The message queue 3 may be, but is not limited to, an MQ. One or more first sub-message queues 31 may be configured as needed; when there are multiple first sub-message queues 31, they may be set according to the number of task fragment priorities. For example, if task fragments have three priorities (high, medium and low), three first sub-message queues 31 may be configured. Note that a task fragment inherits the priority of its batch running task, and policy personnel can set that priority according to the actual situation.
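As a minimal sketch, routing a task fragment to one of several first sub-message queues 31 by its inherited priority might look as follows; the priority labels and queue names are hypothetical placeholders:

    /** Maps a task fragment's inherited priority to a first sub-message queue name. */
    static String firstSubQueueFor(String priority) {
        switch (priority) {
            case "HIGH":   return "batch.fragment.high";
            case "MEDIUM": return "batch.fragment.medium";
            case "LOW":    return "batch.fragment.low";
            default: throw new IllegalArgumentException("unknown priority: " + priority);
        }
    }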
The working node cluster 4 includes two types of working nodes, first working nodes 41 and second working nodes 42: a first working node 41 executes the task data of normal task fragments, and a second working node 42 executes the task data of retry task fragments. For fast and efficient execution of the batch running task, several working nodes of each type may be deployed; for example, there may be, but are not limited to, four first working nodes 41 and two second working nodes 42.
The working nodes pull task data from the MQ, but the number of task fragments a node may pull is capped, and this cap causes task fragments to be executed on different working nodes. The maximum number of task fragments a working node can process simultaneously may be, but is not limited to, 20; that is, a working node can obtain at most 20 task fragments from the MQ, and only after one or more of the task fragments it holds finish executing can it obtain new task fragments from the MQ.
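One way a working node could enforce such a cap is with a counting semaphore, as in the sketch below; MessageQueueClient and TaskFragment are hypothetical placeholder types, and the limit of 20 mirrors the example above:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Semaphore;

    public class WorkerPullLoop {
        private static final int MAX_IN_FLIGHT = 20; // example cap from the text
        private final Semaphore slots = new Semaphore(MAX_IN_FLIGHT);
        private final ExecutorService pool = Executors.newFixedThreadPool(MAX_IN_FLIGHT);

        void run(MessageQueueClient mq) throws InterruptedException {
            while (true) {
                slots.acquire();                   // block while 20 fragments are in flight
                TaskFragment fragment = mq.pull(); // pull one task fragment from the MQ
                pool.submit(() -> {
                    try {
                        executeFragment(fragment); // calls the preset service interface
                    } finally {
                        slots.release();           // free a slot so a new fragment can be pulled
                    }
                });
            }
        }

        void executeFragment(TaskFragment fragment) { /* service-interface call */ }

        interface MessageQueueClient { TaskFragment pull() throws InterruptedException; }
        static final class TaskFragment { }
    }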
It can be understood that, before a task fragment is executed, the complete task data must be queried from the part of the storage layer 1 that stores the task data, according to the primary key information in the task fragment; the preset service interface is then called to execute the task fragment.
In one specific embodiment, the batch running task is triggered automatically or manually through the Quartz job scheduling framework. After the batch running task is triggered, the scheduling center cluster 2 acquires the task data corresponding to the task from the storage layer 1, splits the task data according to the preset task fragment size into a plurality of task fragments, and stores the task fragments in the corresponding sub-message queues of the MQ according to their priorities. Each working node in the working node cluster 4 pulls task fragments from the corresponding sub-message queue and executes them by calling the preset service interface. It can be understood that the working node records fragment execution state data in Redis; the scheduling center cluster can obtain the execution state data in Redis through a monitoring thread, and can also store the execution state data in the MySQL database so that the system or policy personnel can monitor it.
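For the Quartz trigger mentioned at the start of this embodiment, a timed trigger could be registered as sketched below; the cron expression, job and trigger names, and the job body are illustrative assumptions:

    import org.quartz.CronScheduleBuilder;
    import org.quartz.Job;
    import org.quartz.JobBuilder;
    import org.quartz.JobDetail;
    import org.quartz.JobExecutionContext;
    import org.quartz.Scheduler;
    import org.quartz.Trigger;
    import org.quartz.TriggerBuilder;
    import org.quartz.impl.StdSchedulerFactory;

    public class BatchRunTrigger {
        /** Fired by Quartz; the body is only a placeholder for the scheduling center logic. */
        public static class BatchRunJob implements Job {
            @Override
            public void execute(JobExecutionContext context) {
                // fetch task data from the storage layer, split it into fragments,
                // and store the fragments in the MQ sub-queues by priority
            }
        }

        public static void main(String[] args) throws Exception {
            Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
            JobDetail job = JobBuilder.newJob(BatchRunJob.class)
                    .withIdentity("batchRunJob", "batch").build();
            Trigger trigger = TriggerBuilder.newTrigger()
                    .withIdentity("nightlyBatchTrigger", "batch")
                    .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?")) // 02:00 daily
                    .build();
            scheduler.scheduleJob(job, trigger);
            scheduler.start();
        }
    }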
The task state data includes the task data and number of successfully executed items in the fragment, the task data and number of failed items, and the execution time of each piece of task data. A fragment contains task data and the primary key information of that task data; the primary key is a parameter that uniquely identifies a piece of task data and serves as its index, for example the identity card number of the user in the task data.
In some embodiments, to handle the case where a task fragment has been manually interrupted, before the working node calls the preset service interface to execute the fragment it must also detect whether execution of the fragment has timed out or been interrupted; if so, execution of the task fragment is terminated and the execution state data of the fragment is recorded as failed; if not, the task configuration information and the fragment task data are queried.
Specifically, while the task data in a task fragment is being executed, if any piece of task data is interrupted or times out, both that task data and the task state data of the fragment are recorded as failed, and the task state data is sent to Redis for recording.
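A sketch of how a working node might record this state in Redis through the Jedis client follows; the key layout ("batch:status:", "batch:stats:") and the status strings are assumptions:

    import redis.clients.jedis.Jedis;

    public class StatusRecorder {
        /** Records one task item's execution state and updates the fragment's counters. */
        static void recordStatus(Jedis jedis, String fragmentId, String primaryKey, String status) {
            jedis.hset("batch:status:" + fragmentId, primaryKey, status); // per-item state
            String counter = "FAILED".equals(status) ? "failedCount" : "successCount";
            jedis.hincrBy("batch:stats:" + fragmentId, counter, 1);      // fragment totals
        }
    }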
Fragment data query means that the working node queries the complete data of a task from the database table or ES index that stores the task data, according to the primary key data in the task fragment. The task configuration information includes: task name, task fragment size, primary key information of the task fragment, data location (database/ES), execution time, service interface, access conditions, early-warning rules, and the like. The fragment task data includes the primary key information of each piece of task data in the fragment; the fragment state data includes execution time, failure count and success count.
It can be understood that, when a task fragment is marked, the primary key information of the fragment may be marked so that different task fragments can be distinguished.
In some embodiments, after confirming that a task fragment has not been interrupted and has not timed out, the working node starts to execute the task data in the fragment. Before calling the preset service interface to execute a piece of task data, the working node further detects whether that task data has been interrupted or has timed out; if so, execution of the task fragment is terminated and the execution state of that task data is recorded as failed. If not, the working node checks, according to the primary key information, whether task data in the fragment is duplicated; if it is, the preset service interface is called to execute only one of the duplicated pieces of task data.
Specifically, before each piece of task data in the task fragment is executed: if the task data is detected to have been interrupted or timed out, execution of the task fragment is interrupted and the execution state of that task data is marked as failed under its primary key information; if no interruption or timeout is detected, the working node judges, based on the primary key information of the task data in the fragment, whether any task data is duplicated. If duplicated task data exists, the preset service interface is called to execute only one of the duplicates; if not, the working node calls the preset service interface to execute the task fragment. This prevents task data from being executed repeatedly, guarantees that duplicated task data is executed only once, avoids redundant work, and improves the execution efficiency of the batch running task.
To check whether task data is duplicated, the set (store) and get (query) operations can be combined into a single atomic operation with a Lua script, exploiting the fact that Redis executes commands on a single thread; this avoids misjudgment under high concurrency.
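Such a check-and-mark can be sketched as a Lua script evaluated through the Jedis client, so that the GET and SET run as one atomic unit inside Redis's single-threaded engine; the key prefix and return convention are assumptions:

    import java.util.Collections;
    import redis.clients.jedis.Jedis;

    public class DedupeCheck {
        // Returns 1 if the primary key is seen for the first time, 0 if it is a duplicate.
        private static final String CHECK_AND_MARK =
                "if redis.call('GET', KEYS[1]) then return 0 end " +
                "redis.call('SET', KEYS[1], ARGV[1]) " +
                "return 1";

        /** True when this primary key has not yet been executed in this batch run. */
        static boolean firstOccurrence(Jedis jedis, String taskId, String primaryKey) {
            Object result = jedis.eval(CHECK_AND_MARK,
                    Collections.singletonList("batch:dedupe:" + taskId + ":" + primaryKey),
                    Collections.singletonList("1"));
            return Long.valueOf(1L).equals(result);
        }
    }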
In some embodiments, the scheduling center cluster 2 counts the execution state data of each task fragment through the monitoring thread; when it determines that the execution state data in the storage layer 1 includes execution failure information, the second scheduling host 22 obtains the retry task fragments from the storage layer 1 and resends them to the second sub-message queue 32 of the MQ. A retry task fragment is a task fragment corresponding to execution failure information.
While a working node executes the data in a fragment, the execution state of each piece of task data in the task fragment is recorded in Redis. When a task fragment fails to execute, this is fed back to the scheduling center cluster 2, which checks the execution state data of each piece of task data in the fragment through the monitoring thread; if execution failure information exists in the fragment, the fragment is marked as 'failed' and sent to the second sub-message queue 32.
Further, the working node pulls the retry task fragment from the second sub-message queue 32 and invokes the preset service interface to execute it. Calling the service interface again for fragments whose execution failed improves the overall success rate of the task data in the batch running task.
Furthermore, when a retry task fragment is executed again, the working node also judges, according to the primary key information, whether each piece of task data in the fragment was executed successfully the previous time. If it was not, the preset service interface is called to execute the task data corresponding to that primary key information; if it was, the call to the preset service interface is abandoned.
Specifically, since the execution state of the task data in a retry task fragment was marked during the previous execution, when executing the task data in a retry fragment the working node first queries, by the primary key information of the task data, the task state recorded in Redis from the last execution, and judges, based on preset retry state codes, whether the task data needs to be retried. If so, the working node calls the service interface again to execute the task data corresponding to the primary key information; otherwise, the call is abandoned.
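That retry decision can be sketched as below, reading the per-item state recorded in the earlier sketch; the concrete retry state codes are hypothetical, since the embodiment states only that such codes are preset:

    import java.util.Set;
    import redis.clients.jedis.Jedis;

    public class RetryDecision {
        // Hypothetical retry state codes; only items in one of these states are retried.
        private static final Set<String> RETRYABLE = Set.of("FAILED", "TIMEOUT", "INTERRUPTED");

        /** Looks up the item's last recorded state and decides whether it needs a retry. */
        static boolean shouldRetry(Jedis jedis, String fragmentId, String primaryKey) {
            String lastState = jedis.hget("batch:status:" + fragmentId, primaryKey);
            return lastState != null && RETRYABLE.contains(lastState);
        }
    }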
In some embodiments, the scheduling center cluster 2 is further configured to obtain the execution state data stored in the storage layer 1, determine according to that data whether a task alarm is triggered, and, if it is, terminate execution of the batch running task or generate alarm information.
Specifically, execution state data of the task fragments is stored in the second memory 12 of the storage layer 1, and the execution state data produced during the batch run is recorded in the Redis of the storage layer. The scheduling center cluster 2 obtains the execution state data in Redis through thread monitoring and judges whether a task alarm is triggered; if so, the alarm fires and execution of the batch running task is terminated, or the scheduling center cluster 2 generates alarm information to notify policy personnel that execution of the batch running task has failed.
The execution state data includes the numbers of successfully executed and failed pieces of task data. Judging whether to trigger a task alarm specifically includes: according to a preset alarm rule, judging whether the number of failed pieces of task data exceeds the threshold set by the rule; if it does, a task alarm is triggered; if the threshold is not exceeded, no task alarm is triggered.
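The monitoring thread's threshold check can be sketched as follows, reusing the per-fragment counters from the recording sketch above; the key names and the strict greater-than comparison are assumptions:

    import redis.clients.jedis.Jedis;

    public class AlarmCheck {
        /** True when the number of failed items exceeds the preset alarm threshold. */
        static boolean taskAlarmTriggered(Jedis jedis, String fragmentId, long failureThreshold) {
            String failed = jedis.hget("batch:stats:" + fragmentId, "failedCount");
            long failedCount = (failed == null) ? 0L : Long.parseLong(failed);
            return failedCount > failureThreshold; // exceeding the threshold fires the alarm
        }
    }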
In some embodiments, the working node is further configured to acquire an interface rate-limiting tool before calling the preset service interface to execute the task fragment. If the rate-limiting tool is not acquired, the node tries again after each preset time period until the tool is acquired; once the tool is acquired, the node performs the step of calling the preset service interface to execute the task fragment.
Because the system calls the service interface in a distributed and highly concurrent manner, a rate-limiting operation is set to control the execution speed of the fragment tasks, so as to avoid overwhelming the downstream service system and to reduce the number of failed calls. Typically, the working node uses the third-party Guava rate limiter to call the service interface smoothly. Specifically, the working node tries to acquire the interface rate-limiting tool; if it is not acquired, the current thread waits for a preset time period and then tries again, until the tool is acquired. After the interface rate-limiting tool is acquired, the service interface is called to execute the task fragment.
Here, the number of interface calls per second made by each working node equals the total permitted number of interface calls per second divided by the number of working nodes. The interface rate-limiting tool may be, but is not limited to, a rate-limiting token, and the preset time period may be, but is not limited to, 50 milliseconds.
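A sketch of the throttled call using Guava's RateLimiter, which this embodiment names, is given below; the per-node rate follows the division described above, and the 50-millisecond retry interval is the example value from the text:

    import com.google.common.util.concurrent.RateLimiter;

    public class ThrottledCall {
        private final RateLimiter limiter;

        /** Each node receives the total permitted call rate divided by the node count. */
        ThrottledCall(double totalCallsPerSecond, int workerNodeCount) {
            this.limiter = RateLimiter.create(totalCallsPerSecond / workerNodeCount);
        }

        /** Retries every 50 ms until a permit is available, then calls the interface. */
        void callWithThrottle(Runnable serviceCall) throws InterruptedException {
            while (!limiter.tryAcquire()) { // try to take one permit without blocking
                Thread.sleep(50);           // preset wait period before the next attempt
            }
            serviceCall.run();              // execute via the preset service interface
        }
    }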
In some embodiments, after a working node has executed a task fragment through the preset service interface, it may further judge whether to execute a next task fragment; if so, it executes the next task fragment as in the embodiments above, and if not, it sends the task execution state data of the fragment to Redis for recording.
The data batching system provides a distributed scheduling mode for risk policy personnel and product personnel, so that tasks and their alarm rules can be configured by those users themselves without developers, greatly shortening the development cycle of a new task. The scheduling center cluster 2 automatically splits a large task into many small tasks according to the task configuration information and distributes them through the MQ to different working nodes for simultaneous execution, cutting task execution time severalfold. During task execution, the Redis of the storage layer records the primary key information of the task data contained in the batch running task, so a working node can quickly check whether the raw data contained in a task fragment is duplicated and ensure that duplicated data is executed only once. In addition, when a task fragment on a working node is retried, the task data that failed can be located quickly by its primary key information, and is automatically retried or skipped according to the preset retry state codes. The data batch run is thus more automated and faster, and task execution efficiency improves; even if one node fails, its task fragments can be executed by other working nodes, so a single node failure does not cause the whole task to fail, which improves system resource utilization and avoids waste of resources.
Based on the same concept, an embodiment of the present application provides a data batching method. For its specific implementation, refer to the description of the system embodiments above; repeated details are omitted. As shown in FIG. 3, the method includes:
step 301, after detecting that the batching task is triggered, acquiring task data corresponding to the batching task from the storage layer.
And 302, splitting the batch running task into a plurality of task fragments through the dispatching center cluster.
And step 303, storing the task fragments to a message queue.
And 304, pulling the task fragment in the message queue through at least one working node in the working node cluster, and calling a preset service interface to execute the task fragment.
In the embodiment of the application, the scheduling center cluster splits the batch running task into a plurality of task fragments, and the task fragments are executed by a plurality of working nodes, so the tasks run in parallel; this replaces the single-point execution mode of the prior art, greatly accelerates the batch run and improves task execution efficiency. In addition, even if one node fails, its task fragments can be executed by other working nodes, so a single node failure does not cause the whole task to fail; system resource utilization improves and waste of resources is avoided.
Based on the same concept, an embodiment of the present application provides an electronic device. As shown in FIG. 4, the electronic device mainly includes a processor 401, a communication interface 402, a memory 403 and a communication bus 404, wherein the processor 401, the communication interface 402 and the memory 403 communicate with one another via the communication bus 404. The memory 403 stores a program executable by the processor 401, and the processor 401 executes the program stored in the memory 403 to implement the following steps:
after detecting that the batch running task is triggered, acquiring task data corresponding to the batch running task from a storage layer;
splitting a batch running task into a plurality of task fragments through a dispatching center cluster;
storing the task fragments to a message queue;
and pulling the task fragment in the message queue through at least one working node in the working node cluster, and calling a preset service interface to execute the task fragment.
The communication bus 404 mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 404 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
The communication interface 402 is used for communication between the above-described electronic apparatus and other apparatuses.
The memory 403 may include a random access memory (RAM) or a non-volatile memory, such as at least one disk memory. Alternatively, the memory may be at least one storage device located remotely from the processor 401.
The processor 401 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present application, a computer-readable storage medium is provided, in which a computer program is stored; when the computer program runs on a computer, it causes the computer to perform the data batching method described in the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives), among others.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A data batching system, comprising: the system comprises a storage layer, a dispatching center cluster, a message queue and a working node cluster;
the scheduling center cluster is used for acquiring task data corresponding to the batch running task from the storage layer after the batch running task is triggered, dividing the task data into a plurality of task fragments and storing the task fragments to the message queue;
the message queue is used for storing the task fragments;
the working node cluster comprises at least one working node, and each working node is used for pulling the task fragment in the message queue and calling a preset service interface to execute the task fragment.
2. The data batching system according to claim 1, wherein said work node is further configured to:
before the preset service interface is called to execute the task fragment, detecting whether execution of the task fragment has timed out or been interrupted;
and if so, terminating execution of the task fragment.
3. The data batching system according to claim 2, wherein said work node is further configured to:
and after the execution of the task fragments is terminated, recording the execution state data of the task fragments, and storing the execution state data to the storage layer.
4. The data batching system according to claim 3, wherein said storage layer comprises a first memory and a second memory;
the first memory is used for storing the task data;
the second memory is used for storing the execution state data.
5. The data batching system according to claim 3 or 4, wherein said execution state data includes execution failure information and execution success information;
the dispatching center cluster is further configured to acquire a retry task fragment from the storage layer and resend the retry task fragment to the message queue, where the retry task fragment is a task fragment corresponding to the execution failure information;
and the working node is further configured to pull the retry task fragment from the message queue and call the preset service interface to execute the retry task fragment.
6. The data batching system according to claim 5, wherein said task data includes primary key information for said task data;
the working node is used for judging, according to the primary key information, whether the task data in the retry task fragment needs to be retried, to obtain a judgment result;
if the judgment result is yes, calling the preset service interface to execute the task data corresponding to the primary key information;
and if the judgment result is no, abandoning the call to the preset service interface.
7. The data batching system according to claim 1, wherein the task data includes primary key information of the task data, and the working node is further configured to: before calling the preset service interface to execute the task fragment, check whether the task data in the task fragment is duplicated according to the primary key information, and if so, call the preset service interface to execute only one of the duplicated task data in the task fragment.
8. The data batching system according to claim 3, wherein the scheduling center cluster is further configured to: obtain the execution state data stored in the storage layer;
judge, according to the execution state data, whether a task alarm is triggered;
and if so, terminate execution of the batch running task or generate alarm information.
9. The data batching system according to claim 1, wherein the working node is further configured to: before the preset service interface is called to execute the task fragment, acquire an interface rate-limiting tool; if the rate-limiting tool is not acquired, attempt to acquire it again after each preset time period until it is acquired; and once the rate-limiting tool is acquired, perform the step of calling the preset service interface to execute the task fragment.
10. A data batching method, comprising:
after detecting that a batch running task is triggered, acquiring task data corresponding to the batch running task from a storage layer;
splitting the batch running task into a plurality of task fragments through a dispatching center cluster;
storing the task fragments to a message queue;
and pulling the task fragment in the message queue through at least one working node in the working node cluster, and calling a preset service interface to execute the task fragment.
11. An electronic device, comprising: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
the memory for storing a computer program;
the processor, executing the program stored in the memory, implementing the data batching method of claim 10.
12. A computer-readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the data batching method of claim 10.
CN202010990015.2A 2020-09-18 2020-09-18 Data batching system, method, electronic device and storage medium Pending CN112148505A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010990015.2A CN112148505A (en) 2020-09-18 2020-09-18 Data batching system, method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010990015.2A CN112148505A (en) 2020-09-18 2020-09-18 Data batching system, method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN112148505A true CN112148505A (en) 2020-12-29

Family

ID=73893236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010990015.2A Pending CN112148505A (en) 2020-09-18 2020-09-18 Data batching system, method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN112148505A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484886A (en) * 2016-10-17 2017-03-08 金蝶软件(中国)有限公司 A kind of method of data acquisition and its relevant device
US9672274B1 (en) * 2012-06-28 2017-06-06 Amazon Technologies, Inc. Scalable message aggregation
CN107977275A (en) * 2017-12-05 2018-05-01 腾讯科技(深圳)有限公司 Task processing method and relevant device based on message queue
CN108228326A (en) * 2017-12-29 2018-06-29 深圳乐信软件技术有限公司 Batch tasks processing method and distributed system
US20180212857A1 (en) * 2017-01-26 2018-07-26 International Business Machines Corporation Proactive channel agent
CN108762931A (en) * 2018-05-31 2018-11-06 康键信息技术(深圳)有限公司 Method for scheduling task, server based on distributed scheduling system and storage medium
US20180375783A1 (en) * 2017-06-27 2018-12-27 Atlassian Pty Ltd Retry handling in messaging queues
CN110362401A (en) * 2019-06-20 2019-10-22 深圳壹账通智能科技有限公司 Data run the member host in batch method, apparatus, storage medium and cluster
CN111160873A (en) * 2019-12-31 2020-05-15 中国银行股份有限公司 Batch processing device and method based on distributed architecture

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819600A (en) * 2021-02-25 2021-05-18 深圳前海微众银行股份有限公司 Timed task execution method, timed task execution device, timed task execution equipment and computer storage medium
CN112819600B (en) * 2021-02-25 2024-06-07 深圳前海微众银行股份有限公司 Method, device, equipment and computer storage medium for executing timing task
CN113176967A (en) * 2021-04-29 2021-07-27 北京奇艺世纪科技有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN113176967B (en) * 2021-04-29 2023-08-15 北京奇艺世纪科技有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN113179331A (en) * 2021-06-11 2021-07-27 苏州大学 Distributed special protection service scheduling method facing mobile edge calculation
CN113179331B (en) * 2021-06-11 2022-02-11 苏州大学 Distributed special protection service scheduling method facing mobile edge calculation
CN113485812A (en) * 2021-07-23 2021-10-08 重庆富民银行股份有限公司 Partition parallel processing method and system based on large data volume task
CN113485812B (en) * 2021-07-23 2023-12-12 重庆富民银行股份有限公司 Partition parallel processing method and system based on large-data-volume task
CN113835859B (en) * 2021-09-24 2023-12-26 成都质数斯达克科技有限公司 Task scheduling method, device, equipment and readable storage medium
CN113835859A (en) * 2021-09-24 2021-12-24 成都质数斯达克科技有限公司 Task scheduling method, device and equipment and readable storage medium
CN114240109A (en) * 2021-12-06 2022-03-25 中电金信软件有限公司 Method, device and system for cross-region processing batch running task
CN117251508A (en) * 2023-09-22 2023-12-19 湖南长银五八消费金融股份有限公司 Borrowing batch accounting method, device, equipment and storage medium
CN117421111A (en) * 2023-10-16 2024-01-19 广州今之港教育咨询有限公司 Batch processing method, system, equipment and medium for message tasks

Similar Documents

Publication Publication Date Title
CN112148505A (en) Data batching system, method, electronic device and storage medium
US8660995B2 (en) Flexible event data content management for relevant event and alert analysis within a distributed processing system
US9344381B2 (en) Event management in a distributed processing system
US8730816B2 (en) Dynamic administration of event pools for relevant event and alert analysis during event storms
US9246865B2 (en) Prioritized alert delivery in a distributed processing system
US9419650B2 (en) Flexible event data content management for relevant event and alert analysis within a distributed processing system
US8499203B2 (en) Configurable alert delivery in a distributed processing system
US8689050B2 (en) Restarting event and alert analysis after a shutdown in a distributed processing system
US9201756B2 (en) Administering event pools for relevant event analysis in a distributed processing system
CN110365762B (en) Service processing method, device, equipment and storage medium
US20140040673A1 (en) Administering Incident Pools For Incident Analysis
JPH10501907A (en) Method and apparatus for monitoring and controlling programs in a network
US20230029198A1 (en) Scheduling complex jobs in a distributed network
US8103905B2 (en) Detecting and recovering from process failures
CN109726151B (en) Method, apparatus, and medium for managing input-output stack
CN107426012B (en) Fault recovery method and device based on super-fusion architecture
US20110302192A1 (en) Systems and methods for first data capture through generic message monitoring
CN115580522A (en) Method and device for monitoring running state of container cloud platform
US20180123866A1 (en) Method and apparatus for determining event level of monitoring result
CN115729727A (en) Fault repairing method, device, equipment and medium
CN106484536B (en) IO scheduling method, device and equipment
US11687269B2 (en) Determining data copy resources
US11941432B2 (en) Processing system, processing method, higher-level system, lower-level system, higher-level program, and lower-level program
CN117707740A (en) Distributed big data task scheduling method and computer equipment
CN111858047A (en) File interaction method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co., Ltd