CN111459659A - Data processing method, device, scheduling server and medium - Google Patents

Data processing method, device, scheduling server and medium

Info

Publication number
CN111459659A
CN111459659A CN202010161333.8A
Authority
CN
China
Prior art keywords
task
subtask
target
processing
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010161333.8A
Other languages
Chinese (zh)
Inventor
蔡高年
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202010161333.8A priority Critical patent/CN111459659A/en
Publication of CN111459659A publication Critical patent/CN111459659A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5017 Task decomposition

Abstract

The embodiment of the invention discloses a data processing method, a data processing device, a scheduling server and a medium. The method comprises the following steps: receiving a task to be processed sent by a terminal; if the task meets a preset splitting condition, splitting it according to a target splitting strategy to obtain at least one subtask; adding an identifier to each subtask; sending each identified subtask to its corresponding execution server; receiving the processing result returned by the execution server corresponding to each subtask; and merging the returned processing results according to the identifiers to obtain a target processing result for the task to be processed. By implementing this method, tasks can be split and the efficiency of data processing is improved.

Description

Data processing method, device, scheduling server and medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method, an apparatus, a scheduling server, and a medium.
Background
With the development of computer technology, computers are used to process all kinds of tasks. As computer performance improves, however, task complexity also grows by the day, so computer clusters are increasingly adopted to process such tasks.
An aggregation task tends to produce a huge aggregation script, which makes subsequent script maintenance costs rise sharply. When new business statistics logic is added later, or the original computation logic needs to be changed, modifying such a huge script is prone to errors and runs counter to sound software design principles. How to improve the processing efficiency of aggregation tasks that occupy large amounts of resources is therefore an urgent problem to be solved.
Disclosure of Invention
The embodiment of the invention provides a data processing method, a data processing device, a scheduling server and a medium, which can split data and improve the efficiency of data processing.
In a first aspect, an embodiment of the present invention provides a data processing method, where the method includes:
receiving a task to be processed sent by a terminal;
if the task to be processed meets a preset splitting condition, splitting the task to be processed according to a target splitting strategy to obtain at least one subtask, wherein the target splitting strategy is used for determining a target splitting node for the task to be processed and a corresponding relation between each subtask obtained by splitting and an execution server used for task processing;
adding an identifier for each subtask, and sending each subtask with the identifier added to a corresponding execution server, so that the execution server corresponding to each subtask processes each subtask;
receiving a processing result returned by the execution server corresponding to each subtask, wherein the processing result returned by the execution server corresponding to each subtask contains the identifier;
and merging the processing results returned by the execution server corresponding to each subtask according to the identifier, so as to obtain a target processing result for the task to be processed.
In a second aspect, an embodiment of the present invention provides a data processing apparatus, where the apparatus includes:
the receiving module is used for receiving the tasks to be processed sent by the terminal;
the splitting module is used for splitting the task to be processed according to a target splitting strategy to obtain at least one subtask if the task to be processed meets a preset splitting condition, wherein the target splitting strategy is used for determining a target splitting node for the task to be processed and a corresponding relation between each subtask obtained by splitting and an execution server for task processing;
the sending module is used for adding an identifier for each subtask and sending each subtask with the identifier added to a corresponding execution server so that the execution server corresponding to each subtask can process each subtask;
the receiving module is further configured to receive a processing result returned by the execution server corresponding to each subtask, where the processing result returned by the execution server corresponding to each subtask includes the identifier;
and the merging module is used for merging the processing results returned by the execution servers corresponding to the subtasks according to the identifiers, so as to obtain a target processing result for the task to be processed.
In a third aspect, an embodiment of the present invention provides a scheduling server, including a processor, an input interface, an output interface, and a memory, where the processor, the input interface, the output interface, and the memory are connected to each other, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method according to the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program includes program instructions, which, when executed by a processor, cause the processor to execute the method of the first aspect.
In the embodiment of the invention, the scheduling server receives a task to be processed sent by a terminal. If the task meets a preset splitting condition, the scheduling server splits it according to a target splitting strategy to obtain at least one subtask, adds an identifier to each subtask, and sends each identified subtask to its corresponding execution server. It then receives the processing result returned by the execution server corresponding to each subtask and merges those results according to the identifiers to obtain a target processing result for the task to be processed. By implementing this method, tasks can be split and the efficiency of data processing is improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below illustrate only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a block diagram of a data processing system according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data processing method provided by an embodiment of the invention;
FIG. 3 is a flow chart of another data processing method provided by an embodiment of the invention;
FIG. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a scheduling server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present invention.
Referring to fig. 1, a data processing system according to an embodiment of the present invention comprises a scheduling server, at least one terminal connected to the scheduling server, and a device cluster connected to the scheduling server, where the device cluster comprises at least one execution server. The terminals may include, but are not limited to, mobile phones, tablet computers, and desktop computers. A terminal interacts with the user (for example, receiving a task to be processed uploaded by the user) and with the scheduling server (for example, sending the pending task to the scheduling server to request data processing services). The scheduling server receives the task to be processed sent by the terminal and allocates it to the execution servers: after splitting the task into subtasks based on a splitting policy, it sends each subtask to an execution server, which processes the subtask to obtain a processing result. Each execution server processes the tasks it receives and returns the processing results to the scheduling server. The scheduling server then merges the per-subtask processing results returned by the execution servers to obtain a target processing result for the task to be processed.
Based on the above description of the embodiment of the data processing system, an embodiment of the present invention provides a data processing method, which may be applied to the data processing system shown in fig. 1, and may be specifically executed by the scheduling server shown in fig. 1. Referring to fig. 2, the data processing procedure may include the following steps S201 to S205:
S201, the scheduling server receives a task to be processed sent by the terminal.
In the embodiment of the invention, the task to be processed may be input by a user through the terminal. After detecting the task input by the user, the terminal sends it to the scheduling server, which receives it. The task to be processed may be, for example, a data aggregation task involving dozens of leaf computations. By type, tasks may be classified as retrieval-type tasks, computation-type tasks, composite tasks, and so on, where a composite task comprises multiple task modules, such as a computation module and a retrieval module.
S202, if the task to be processed meets the preset splitting condition, the scheduling server splits the task to be processed according to the target splitting strategy to obtain at least one subtask.
In the embodiment of the invention, after receiving a task to be processed, the scheduling server detects whether the task meets a preset splitting condition. The preset splitting condition may be that the task to be processed contains multiple detachable nodes, where a detachable node is a connection point between different processing modules in the task. Each detachable node is used to split off one subtask; that is, X detachable nodes can split the task to be processed into X subtasks, where X is a positive integer.
In one implementation, the detachable nodes of the task to be processed are input in advance by the user at the terminal, and the scheduling server receives the information about these nodes together with the task itself. For example, if the task is a computation-type task, the user may preset 3 detachable nodes that split it into 3 computation-type subtasks, each corresponding to different, mutually independent computation content. As another example, if the task is a retrieval-type task, the user may preset 2 detachable nodes that split it into a first retrieval subtask and a second retrieval subtask, where the first subtask is retrieved in a first database and the second in a second database. Alternatively, if the retrieval-type task is to retrieve both "apple" and "grape", the first retrieval subtask may retrieve "apple" and the second "grape". As a further example, if the task is a composite task, the user may preset 2 detachable nodes that split it into a computation-type subtask and a retrieval-type subtask.
In another implementation, the detachable nodes of the task to be processed may be detected and determined by the scheduling server. For example, for a computation-type task, the scheduling server may detect the computation modules it contains and determine the connection points between those modules as detachable nodes. For a retrieval-type task, the scheduling server may detect the different retrieval contents it involves and determine the connection nodes between those contents as detachable nodes. For a composite task, the scheduling server may detect the computation modules and retrieval modules it contains and determine the connection points between the different modules as detachable nodes.
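Node detection of this kind can be sketched in a few lines. The sketch below is illustrative only (the patent does not give code): a task is modeled as an ordered list of module types, a detachable node is the connection point where the module type changes, and, following the text's counting (X nodes yield X subtasks), each node marks where one subtask ends, with a final node at the end of the task. All function names are assumptions.

```python
def find_detachable_nodes(module_types):
    """Detect detachable nodes as connection points between runs of
    different module types, plus the end of the task, so that X nodes
    yield X subtasks (the counting used in the text)."""
    nodes = [i for i in range(len(module_types) - 1)
             if module_types[i] != module_types[i + 1]]
    nodes.append(len(module_types) - 1)
    return nodes


def split_at_nodes(modules, end_nodes):
    """Split the module list into subtasks, one per detachable node.

    Each node is the index of the last module of one subtask; the
    final node is expected to be the last module of the task.
    """
    pieces, start = [], 0
    for node in sorted(end_nodes):
        pieces.append(modules[start:node + 1])
        start = node + 1
    return pieces
```

For the composite-task example above, the types `["compute", "retrieve"]` give 2 detachable nodes and split into a computation-type subtask and a retrieval-type subtask.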
Further, if the scheduling server detects that the task to be processed meets the splitting condition, it splits the task according to a target splitting strategy to obtain at least one subtask. The target splitting strategy determines the target splitting nodes for the task to be processed and the correspondence between each subtask obtained by splitting and the execution server used for task processing.
In a specific implementation, the target splitting policy may be determined as follows: the scheduling server obtains information about the detachable nodes in the task to be processed and about the execution servers available for task processing (including their number and remaining load capacity), and determines the target splitting policy from this information. Specifically, suppose the scheduling server finds that the number of detachable nodes in the task is K and the number of execution servers is M, where K and M are positive integers.
in one implementation manner, if the number M of the execution servers is greater than or equal to the number K of the detachable nodes, the scheduling server determines the K detachable nodes as target detachable nodes for the task to be processed, selects the K execution servers with the largest remaining load from the M execution servers as target execution servers, and establishes the corresponding relationship between the K subtasks and the K target execution servers according to the principle that the task amount is matched with the remaining load. The K detachable nodes are configured to detach the task to be processed into K subtasks, where the task amount may specifically be a memory amount occupied by the processing task, and the remaining load amount may specifically be a remaining memory amount. The specific way that the scheduling server establishes the corresponding relation between the K subtasks and the K target execution servers according to the principle that the task amount is matched with the residual load amount is that the scheduling server obtains the task amount of each of the K subtasks and the residual load amount of each of the K target execution servers, the scheduling server sorts the K subtasks according to the sequence of the task amounts from large to small to obtain a first sequence set, and sorts the K target execution servers according to the sequence of the residual load amounts from large to small to obtain a second sequence set, and the scheduling server determines that the subtasks sorted to the I position in the first sequence set correspond to the subtasks sorted to the I position in the second sequence set, wherein I is a positive integer. 
For example, suppose the task to be processed contains 2 detachable nodes, node 1 and node 2, and there are 3 execution servers: execution server 1, execution server 2, and execution server 3. The scheduling server determines that the number of execution servers exceeds the number of detachable nodes and designates node 1 and node 2 as target splitting nodes, where node 1 splits off a first subtask and node 2 splits off a second subtask. The task amount of the first subtask is 30 and that of the second subtask is 20; the remaining loads of execution servers 1, 2, and 3 are 200, 150, and 100 respectively. The scheduling server therefore selects execution server 1 and execution server 2 as target execution servers, and pairs the first subtask with execution server 1 and the second subtask with execution server 2.
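The sorted matching for the M >= K case can be sketched as follows. This is a minimal, non-authoritative reading of the description above: task amounts and remaining loads are modeled as plain numbers, and the function and field names are illustrative rather than taken from the patent.

```python
def assign_subtasks(subtasks, servers):
    """Pair each subtask with a target execution server (M >= K case).

    subtasks: list of (subtask_id, task_amount)
    servers:  list of (server_id, remaining_load), len(servers) >= len(subtasks)

    The K largest-capacity servers are chosen as targets; subtasks are
    sorted by task amount descending and targets by remaining load
    descending, and the i-th subtask is paired with the i-th target.
    """
    k = len(subtasks)
    targets = sorted(servers, key=lambda s: s[1], reverse=True)[:k]
    ordered = sorted(subtasks, key=lambda t: t[1], reverse=True)
    return {task_id: server_id
            for (task_id, _), (server_id, _) in zip(ordered, targets)}
```

Run with the figures from the worked example above (task amounts 30 and 20; remaining loads 200, 150, 100), it pairs the first subtask with execution server 1 and the second with execution server 2.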
In one implementation, if the number M of execution servers is smaller than the number K of detachable nodes, the scheduling server selects M of the K detachable nodes as target splitting nodes and determines the task amount of the subtask corresponding to each of the M nodes and the remaining load of each of the M execution servers. It then establishes the correspondence between the M subtasks and the M target execution servers on the principle of matching task amount with remaining load, where the M detachable nodes split the task to be processed into M subtasks. The scheduling server may select the target nodes from the detachable nodes at random or at equal intervals. To match task amount with remaining load, the scheduling server obtains the task amount of each of the M subtasks and the remaining load of each of the M target execution servers, sorts the M subtasks in descending order of task amount to obtain a third ordered set, sorts the M target execution servers in descending order of remaining load to obtain a fourth ordered set, and pairs the subtask ranked I-th in the third set with the execution server ranked I-th in the fourth set, where I is a positive integer.
For example, suppose the task to be processed contains 3 detachable nodes, node 1, node 2, and node 3, and there are 2 execution servers: execution server 1 and execution server 2. The scheduling server determines that the number of execution servers is smaller than the number of detachable nodes and randomly designates node 1 and node 2 as target splitting nodes, where node 1 splits off a first subtask and node 2 splits off a second subtask. The task amount of the first subtask is 50 and that of the second subtask is 30; the remaining loads of execution server 1 and execution server 2 are 200 and 150 respectively. The scheduling server designates both servers as target execution servers and pairs the first subtask with execution server 1 and the second subtask with execution server 2.
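For the M < K case, the only new step is choosing which M of the K detachable nodes to keep; the sorted matching then proceeds exactly as before. The interval formula below is one plausible reading of "equal interval selection"; the patent does not pin the formula down, so treat it as an assumption.

```python
def select_nodes_equal_interval(nodes, m):
    """Pick m target detachable nodes spread evenly across k candidates.

    One possible interpretation of the text's "equal interval
    selection": step through the candidate list at stride k / m.
    """
    k = len(nodes)
    if m >= k:
        return list(nodes)
    step = k / m
    return [nodes[int(i * step)] for i in range(m)]
```

With 3 candidate nodes and 2 servers this keeps node 1 and node 2, matching the worked example above (which picked them at random).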
Further, after the scheduling server determines a target splitting strategy for the task to be processed, the task to be processed is split according to the target splitting strategy, and at least one subtask is obtained.
S203, the scheduling server adds an identifier for each subtask and sends each subtask after identification to a corresponding execution server.
In the embodiment of the invention, the scheduling server splits the task to be processed according to the target splitting strategy to obtain at least one subtask, adds an identifier to each subtask, and sends each subtask with the identifier added to the corresponding execution server, so that the execution server corresponding to each subtask processes each subtask.
In a specific implementation, the identifier indicates the source of a subtask, i.e., which task to be processed it was split from. For example, if the identifier of task 1 is "2019" and task 1 is split into subtask 1 and subtask 2, then both subtasks carry the identifier "2019". Optionally, to reflect the order of the subtasks, a sequence number may be added to the identifier: subtask 1 gets "2019-1" and subtask 2 gets "2019-2".
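The identifier scheme with sequence numbers can be sketched as below. The "2019" style identifier mirrors the example in the text; the function name and the `taskid-seq` string format are illustrative assumptions.

```python
def tag_subtasks(task_id, subtasks):
    """Attach 'taskid-seq' identifiers to subtasks, 1-indexed, so that
    results can later be grouped by parent task and ordered by
    sequence number."""
    return [(f"{task_id}-{seq}", body)
            for seq, body in enumerate(subtasks, start=1)]
```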
And S204, the scheduling server receives the processing result returned by the execution server corresponding to each subtask.
In the embodiment of the invention, after the scheduling server sends the subtasks to their corresponding execution servers, each execution server processes its subtask to obtain a processing result containing that subtask's identifier. Each execution server sends its processing result to the scheduling server, which receives all the results.
And S205, the scheduling server performs merging processing on the processing results returned by the execution server corresponding to each subtask according to the identifier to obtain a target processing result for the to-be-processed task.
In the embodiment of the invention, after receiving the processing results returned by the execution servers corresponding to the subtasks, the scheduling server merges them according to the identifiers to obtain the target processing result for the task to be processed. That is, the scheduling server merges the processing results that contain the same identifier.
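The merge step can be sketched as follows, assuming the `taskid-seq` identifier form from the earlier example. How partial results are actually combined (concatenation, summation, and so on) would depend on the task type; simple ordered collection stands in for it here, and the function name is an assumption.

```python
def merge_results(results):
    """Group (identifier, partial_result) pairs by parent task id.

    Identifiers have the 'taskid-seq' form; within each parent task
    the merged value preserves subtask order by sequence number.
    """
    grouped = {}
    for ident, partial in results:
        task_id, seq = ident.rsplit("-", 1)
        grouped.setdefault(task_id, []).append((int(seq), partial))
    return {task_id: [p for _, p in sorted(parts)]
            for task_id, parts in grouped.items()}
```

Results arriving out of order, or interleaved with results for a different pending task, are still merged into the correct target results because grouping and ordering rely only on the identifier.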
In the embodiment of the invention, the scheduling server receives a task to be processed sent by a terminal. If the task meets a preset splitting condition, the scheduling server splits it according to a target splitting strategy to obtain at least one subtask, adds an identifier to each subtask, and sends each identified subtask to its corresponding execution server. It then receives the processing result returned by the execution server corresponding to each subtask and merges those results according to the identifiers to obtain a target processing result for the task to be processed. By implementing this method, tasks can be split and the efficiency of data processing is improved.
Based on the description of the above embodiment of the data processing method, an embodiment of the present invention provides another data processing method, which may be applied to the data processing system shown in fig. 1, and may be specifically executed by the scheduling server shown in fig. 1. Referring to fig. 3, the data processing procedure may include the following steps S301 to S308:
S301, the scheduling server receives the task to be processed sent by the terminal.
In the embodiment of the invention, the task to be processed can be input by a user through the terminal, the terminal sends the task to be processed to the scheduling server after detecting the task to be processed input by the user, and the scheduling server receives the task to be processed sent by the terminal.
S302, if the task to be processed meets the preset splitting condition, the scheduling server splits the task to be processed according to the target splitting strategy to obtain at least one subtask.
In the embodiment of the invention, the target splitting strategy is used for determining the target splitting node for the task to be processed and the corresponding relation between each subtask obtained by splitting processing and the execution server used for task processing. The scheduling server may split the to-be-processed task at the target splitting node indicated by the target splitting policy to obtain at least one subtask, and determine an execution server corresponding to each subtask according to the target splitting policy.
S303, the scheduling server adds an identifier for each subtask and sends each subtask after identification to the corresponding execution server.
In the embodiment of the invention, the scheduling server splits the task to be processed according to the target splitting strategy to obtain at least one subtask, adds an identifier to each subtask, and sends each subtask with the identifier added to the corresponding execution server, so that the execution server corresponding to each subtask processes each subtask.
S304, the scheduling server monitors the processing state of the target execution server on the target subtasks.
In the embodiment of the invention, after sending each identified subtask to its corresponding execution server, the scheduling server monitors the processing state of the target execution server for the target subtask. The target subtask is any one of the at least one subtask, and the target execution server is the execution server corresponding to, and processing, the target subtask. A detachable node exists within the target subtask, which splits it into a first target subtask and a second target subtask.
S305, if the processing state of the target subtask is detected to meet the secondary splitting condition, the scheduling server performs secondary splitting processing on the target subtask to obtain a first target subtask and a second target subtask.
In the embodiment of the present invention, the secondary splitting condition is that the target execution server's processing state for the second target subtask within the target subtask is unprocessed, and that an idle execution server currently exists. If the scheduling server detects that the processing state of the target subtask meets this condition, it performs secondary splitting on the target subtask to obtain the first target subtask and the second target subtask.
S306, the scheduling server sends an instruction for suspending processing of a second target subtask in the target subtask to the target execution server, and sends the second target subtask to the idle execution server, so that the idle execution server processes the second target subtask.
In the embodiment of the invention, after splitting the target subtask, the scheduling server sends the target execution server an instruction to suspend processing of the second target subtask, so that the target execution server stops executing that portion, and sends the second target subtask to the idle execution server for processing. In other words, the scheduling server monitors each execution server's progress on its subtasks in real time; when a target execution server is progressing slowly on a target subtask and an idle server is available, the scheduling server splits the slow subtask again, letting the target execution server continue with one part of its content while the idle execution server executes the other part. By implementing this method, a task can be split multiple times, further improving processing efficiency.
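The secondary-split decision of steps S304 to S306 can be sketched as below. This is a hedged illustration, not the patent's implementation: the work-item list, the string `"unprocessed"` state, the node index, and the function name are all assumptions made for the sketch.

```python
def secondary_split(target_subtask, node, second_state, idle_servers):
    """Apply the secondary split if its condition holds.

    target_subtask: ordered work items of the subtask being monitored
    node:           index of the detachable node inside the subtask
    second_state:   processing state of the items after the node
    idle_servers:   execution servers with no current work

    Returns (first_target_subtask, second_target_subtask, idle_server)
    when the second part is still unprocessed and an idle server
    exists; otherwise None, meaning no resplit is performed.
    """
    if second_state != "unprocessed" or not idle_servers:
        return None
    first = target_subtask[:node + 1]   # target server keeps this part
    second = target_subtask[node + 1:]  # offloaded to the idle server
    return first, second, idle_servers[0]
```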
And S307, the scheduling server receives the processing result returned by the execution server corresponding to each subtask.
In the embodiment of the invention, after the scheduling server sends the subtasks to the corresponding execution servers, each execution server processes its subtask to obtain a processing result, wherein the processing result contains the identifier of the subtask. Each execution server sends the processing result of its subtask to the scheduling server, and the scheduling server receives each processing result.
And S308, the scheduling server performs merging processing on the processing results returned by the execution server corresponding to each subtask according to the identification to obtain a target processing result aiming at the task to be processed.
In the embodiment of the invention, after receiving the processing result returned by the execution server corresponding to each subtask, the scheduling server merges these processing results according to the identifiers to obtain the target processing result for the task to be processed. That is, the scheduling server merges the processing results containing the same identifier into the processing result for the task to be processed.
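A hedged sketch of the merging in S307–S308, assuming each partial result carries the identifier of its parent task plus a subtask ordering index (both assumptions about the identifier scheme, which the patent does not specify in detail):

```python
from collections import defaultdict

def merge_results(partial_results):
    """Group partial results by task identifier and merge each group
    in subtask order into one target processing result."""
    grouped = defaultdict(list)
    for result in partial_results:
        grouped[result["task_id"]].append(result)
    merged = {}
    for task_id, parts in grouped.items():
        parts.sort(key=lambda r: r["subtask_index"])  # restore original order
        merged[task_id] = [item for part in parts for item in part["data"]]
    return merged
```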
Furthermore, the scheduling server can also obtain the processing duration used to obtain the target processing result; detect whether the processing duration is less than a preset duration; and, if so, determine the target splitting strategy as a preferred splitting strategy and store the preferred splitting strategy in a preset database. When the scheduling server receives a new processing task, whose processing order is after the task to be processed, it detects whether the new processing task matches the task to be processed; if so, the scheduling server determines the preferred splitting strategy as the splitting strategy for the new processing task and splits the new processing task based on the preferred splitting strategy. A specific way for the scheduling server to detect whether the new processing task matches the task to be processed may be as follows: the scheduling server obtains a first task quantity and a first number of detachable nodes corresponding to the new processing task, and a second task quantity and a second number of detachable nodes corresponding to the task to be processed; if the first number is the same as the second number, and the difference between the first task quantity and the second task quantity is smaller than a preset difference, the new processing task is determined to match the task to be processed. When the scheduling server receives a similar task again, it directly determines the previously preferred strategy as the splitting strategy for the new processing task, which saves the time used for determining a splitting strategy and improves data processing efficiency.
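The matching test described above can be sketched as follows; the threshold value, parameter names, and units are assumptions, not fixed by the patent:

```python
def tasks_match(new_amount, new_node_count, old_amount, old_node_count,
                max_amount_diff=100):
    """A new task reuses the stored preferred splitting strategy when it has
    the same number of detachable nodes as the old task and a task quantity
    within a preset difference of the old task's quantity."""
    return (new_node_count == old_node_count
            and abs(new_amount - old_amount) < max_amount_diff)
```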
In the embodiment of the invention, the scheduling server receives a task to be processed sent by a terminal. If the task to be processed meets a preset splitting condition, the scheduling server splits the task to be processed according to a target splitting strategy to obtain at least one subtask, adds an identifier to each subtask, and sends each identified subtask to the corresponding execution server. Further, while the execution servers process the subtasks, the scheduling server monitors their processing progress in real time and performs secondary splitting on slowly processed subtasks. The scheduling server then receives the processing result returned by the execution server corresponding to each subtask, wherein each processing result contains the identifier of its subtask, and merges these processing results according to the identifiers to obtain a target processing result for the task to be processed. By this method, tasks can be split and the efficiency of data processing is improved.
Based on the description of the data processing method embodiments, an embodiment of the invention further discloses a data processing device. The data processing device may be a computer program (including program code) running in the scheduling server, or may be a physical device included in the scheduling server. The data processing device may perform the methods illustrated in fig. 2 and fig. 3. Referring to fig. 4, the data processing device 40 includes: a receiving module 401, a splitting module 402, a sending module 403, a merging module 404, an obtaining module 405, and a determining module 406.
A receiving module 401, configured to receive a to-be-processed task sent by a terminal;
a splitting module 402, configured to split the to-be-processed task according to a target splitting policy to obtain at least one sub-task if the to-be-processed task meets a preset splitting condition, where the target splitting policy is used to determine a target splitting node for the to-be-processed task and a corresponding relationship between each sub-task obtained by splitting and an execution server used for task processing;
a sending module 403, configured to add an identifier to each subtask, and send each subtask with the identifier added to a corresponding execution server, so that the execution server corresponding to each subtask processes each subtask;
the receiving module 401 is further configured to receive a processing result returned by the execution server corresponding to each subtask, where the processing result returned by the execution server corresponding to each subtask includes the identifier;
and a merging module 404, configured to merge processing results returned by the execution server corresponding to each subtask according to the identifier, so as to obtain a target processing result for the to-be-processed task.
In an implementation manner, the obtaining module 405 is configured to obtain information of a detachable node and an execution server for task processing in the task to be processed, where the information includes a quantity and a remaining load amount, and the detachable node includes a connection point between different processing modules in the task to be processed;
a determining module 406, configured to determine the target splitting policy according to the information of the detachable node and the execution server.
In an implementation manner, the number of the detachable nodes is K and the number of the execution servers for task processing is M, where K and M are positive integers, and the determining module 406 is specifically configured to:
if the number M of the execution servers is larger than or equal to the number K of the detachable nodes, determining the K detachable nodes as target detachable nodes aiming at the task to be processed, wherein the K detachable nodes are used for detaching the task to be processed into K subtasks;
selecting K execution servers with the maximum residual load capacity from the M execution servers as target execution servers;
and establishing a corresponding relation between the K subtasks and the K target execution servers according to a principle of matching the task amount with the residual load amount.
In an implementation manner, the determining module 406 is specifically configured to:
acquiring the task quantity of each subtask in the K subtasks and the residual load quantity of each execution server in the K target execution servers;
sequencing the K subtasks according to the sequence of the task amount from large to small to obtain a first sequence set;
sequencing the K target execution servers according to the sequence of the residual load from large to small to obtain a second sequence set;
and the subtasks ordered to the I-th bit in the first sequence set correspond to the execution servers ordered to the I-th bit in the second sequence set, wherein I is a positive integer.
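The correspondence rule in the block above (the I-th largest subtask paired with the I-th most-idle server) can be sketched as follows; the tuple shapes are illustrative assumptions:

```python
def pair_subtasks_to_servers(subtasks, servers):
    """subtasks: list of (subtask_id, task_amount);
    servers:  list of (server_id, remaining_load).
    Sort both descending and pair the I-th entries, so the largest
    subtask lands on the server with the most remaining load."""
    first_sequence = sorted(subtasks, key=lambda s: s[1], reverse=True)
    second_sequence = sorted(servers, key=lambda s: s[1], reverse=True)
    return {sub[0]: srv[0] for sub, srv in zip(first_sequence, second_sequence)}
```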
In an implementation manner, the determining module 406 is specifically configured to:
if the number M of the execution servers is smaller than the number K of the detachable nodes, M detachable nodes are selected from the K detachable nodes as target detachable nodes, and the M detachable nodes are used for detaching the task to be processed into M subtasks;
determining the task quantity of the subtasks corresponding to each of the M detachable nodes and the residual load quantity of each of the M execution servers;
and establishing a corresponding relation between the M subtasks and the M execution servers according to a principle of matching the task amount with the residual load amount.
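A sketch of selecting target detachable nodes when M < K, so that the task splits into exactly M subtasks, one per available server. The patent does not fix which M of the K nodes to choose; taking the first M is an explicit assumption here:

```python
def choose_target_nodes(detachable_nodes, server_count):
    """Keep all K nodes when M >= K; otherwise keep only M of them
    (here simply the first M, which is an illustrative choice)."""
    k, m = len(detachable_nodes), server_count
    if m >= k:
        return list(detachable_nodes)
    return list(detachable_nodes)[:m]
```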
In one implementation, the splitting module 402 is specifically configured to:
monitoring the processing state of a target execution server on a target subtask, wherein the target subtask is any one of the at least one subtask, the target execution server is an execution server corresponding to the target subtask, and a detachable node exists in the target subtask and is used for detaching the target subtask into a first target subtask and a second target subtask;
if the processing state of the target subtask is detected to meet a secondary splitting condition, performing secondary splitting processing on the target subtask to obtain a first target subtask and a second target subtask, wherein the secondary splitting condition includes that the processing state of the target execution server for the second target subtask in the target subtask is unprocessed, and an idle execution server currently exists;
a sending module 403, configured to send, to the target execution server, an instruction to suspend processing of a second target subtask in the target subtask, and send the second target subtask to the idle execution server, so that the idle execution server processes the second target subtask.
In an implementation manner, the determining module 406 is specifically configured to:
acquiring processing time used for processing the task to be processed;
detecting whether the processing time length is less than a preset time length or not;
if so, determining the target splitting strategy as a preferred splitting strategy, and storing the preferred splitting strategy;
when a new processing task is received, detecting whether the new processing task is matched with the task to be processed;
and if so, determining the preferred splitting strategy as the splitting strategy aiming at the new processing task, and splitting the new processing task based on the preferred splitting strategy.
In the embodiment of the present invention, a receiving module 401 receives a to-be-processed task sent by a terminal, if the to-be-processed task meets a preset splitting condition, a splitting module 402 splits the to-be-processed task according to a target splitting policy to obtain at least one subtask, a sending module 403 adds an identifier to each subtask and sends each subtask with the identifier added to a corresponding execution server, the receiving module 401 receives a processing result returned by the execution server corresponding to each subtask, and a merging module 404 merges the processing results returned by the execution server corresponding to each subtask according to the identifier to obtain a target processing result for the to-be-processed task. By implementing the method, the data can be split, and the efficiency of data processing is improved.
Fig. 5 is a schematic structural diagram of a scheduling server according to an embodiment of the present invention. As shown in fig. 5, the scheduling server includes: at least one processor 501, an input device 503, an output device 504, a memory 505, and at least one communication bus 502, wherein the communication bus 502 is used to implement connection and communication between these components. The memory 505 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; the memory 505 may optionally also be at least one storage device located remotely from the processor 501. The processor 501 may be combined with the device described in fig. 4; the memory 505 stores a set of program codes, and the processor 501, the input device 503, and the output device 504 call the program codes stored in the memory 505 to perform the following operations:
a processor 501, configured to receive a to-be-processed task sent by a terminal;
the processor 501 is configured to split the to-be-processed task according to a target splitting policy to obtain at least one sub-task if the to-be-processed task meets a preset splitting condition, where the target splitting policy is used to determine a target splitting node for the to-be-processed task and a corresponding relationship between each sub-task obtained by splitting and an execution server used for task processing;
the processor 501 is configured to add an identifier to each subtask, and send each subtask with the identifier added to a corresponding execution server, so that the execution server corresponding to each subtask processes each subtask;
the processor 501 is configured to receive a processing result returned by the execution server corresponding to each subtask, where the processing result returned by the execution server corresponding to each subtask includes the identifier;
and the processor 501 is configured to perform merging processing on the processing results returned by the execution server corresponding to each subtask according to the identifier, so as to obtain a target processing result for the to-be-processed task.
In one implementation, the processor 501 is specifically configured to:
acquiring information of a detachable node and an execution server for task processing in the task to be processed, wherein the information comprises the number and the residual load capacity, and the detachable node comprises a connection point between different processing modules in the task to be processed;
and determining the target splitting strategy according to the information of the detachable node and the execution server.
In an implementation manner, the number of the detachable nodes is K, the number of the execution servers for task processing is M, where K and M are positive integers, and the processor 501 is specifically configured to:
if the number M of the execution servers is larger than or equal to the number K of the detachable nodes, determining the K detachable nodes as target detachable nodes aiming at the task to be processed, wherein the K detachable nodes are used for detaching the task to be processed into K subtasks;
selecting K execution servers with the maximum residual load capacity from the M execution servers as target execution servers;
and establishing a corresponding relation between the K subtasks and the K target execution servers according to a principle of matching the task amount with the residual load amount.
In one implementation, the processor 501 is specifically configured to:
acquiring the task quantity of each subtask in the K subtasks and the residual load quantity of each execution server in the K target execution servers;
sequencing the K subtasks according to the sequence of the task amount from large to small to obtain a first sequence set;
sequencing the K target execution servers according to the sequence of the residual load from large to small to obtain a second sequence set;
and the subtasks ordered to the I-th bit in the first sequence set correspond to the execution servers ordered to the I-th bit in the second sequence set, wherein I is a positive integer.
In one implementation, the processor 501 is specifically configured to:
if the number M of the execution servers is smaller than the number K of the detachable nodes, M detachable nodes are selected from the K detachable nodes as target detachable nodes, and the M detachable nodes are used for detaching the task to be processed into M subtasks;
determining the task quantity of the subtasks corresponding to each of the M detachable nodes and the residual load quantity of each of the M execution servers;
and establishing a corresponding relation between the M subtasks and the M execution servers according to a principle of matching the task amount with the residual load amount.
In one implementation, the processor 501 is specifically configured to:
monitoring the processing state of a target execution server on a target subtask, wherein the target subtask is any one of the at least one subtask, the target execution server is an execution server corresponding to the target subtask, and a detachable node exists in the target subtask and is used for detaching the target subtask into a first target subtask and a second target subtask;
if the processing state of the target subtask is detected to meet a secondary splitting condition, performing secondary splitting processing on the target subtask to obtain a first target subtask and a second target subtask, wherein the secondary splitting condition includes that the processing state of the target execution server for the second target subtask in the target subtask is unprocessed, and an idle execution server currently exists;
and sending an instruction for suspending processing of a second target subtask in the target subtask to the target execution server, and sending the second target subtask to the idle execution server, so that the idle execution server processes the second target subtask.
In one implementation, the processor 501 is specifically configured to:
acquiring processing time used for processing the task to be processed;
detecting whether the processing time length is less than a preset time length or not;
if so, determining the target splitting strategy as a preferred splitting strategy, and storing the preferred splitting strategy;
when a new processing task is received, detecting whether the new processing task is matched with the task to be processed;
and if so, determining the preferred splitting strategy as the splitting strategy aiming at the new processing task, and splitting the new processing task based on the preferred splitting strategy.
In the embodiment of the present invention, a processor 501 receives a to-be-processed task sent by a terminal, if the to-be-processed task meets a preset splitting condition, the processor 501 splits the to-be-processed task according to a target splitting policy to obtain at least one subtask, the processor 501 adds an identifier to each subtask and sends each subtask with the identifier added to a corresponding execution server, the processor 501 receives a processing result returned by the execution server corresponding to each subtask, and the processor 501 merges the processing results returned by the execution servers corresponding to each subtask according to the identifier to obtain a target processing result for the to-be-processed task. By implementing the method, the data can be split, and the efficiency of data processing is improved.
The module in the embodiment of the present invention may be implemented by a general-purpose integrated circuit, such as a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC).
It should be understood that, in the embodiment of the present invention, the processor 501 may be a Central Processing Unit (CPU), and may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The bus 502 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus 502 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, the bus is represented by only one thick line in fig. 5, but this does not indicate that there is only one bus or only one type of bus.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The computer-readable storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present invention, which of course cannot be used to limit the scope of the invention; equivalent changes made according to the appended claims therefore still fall within the scope of the invention.

Claims (10)

1. A method of data processing, the method comprising:
receiving a task to be processed sent by a terminal;
if the task to be processed meets a preset splitting condition, splitting the task to be processed according to a target splitting strategy to obtain at least one subtask, wherein the target splitting strategy is used for determining a target splitting node for the task to be processed and a corresponding relation between each subtask obtained by splitting and an execution server used for task processing;
adding an identifier for each subtask, and sending each subtask with the identifier added to a corresponding execution server, so that the execution server corresponding to each subtask processes each subtask;
receiving a processing result returned by the execution server corresponding to each subtask, wherein the processing result returned by the execution server corresponding to each subtask contains the identifier;
and merging the processing results returned by the execution server corresponding to each subtask according to the identification to obtain a target processing result aiming at the to-be-processed task.
2. The method of claim 1, wherein after receiving the pending task sent by the terminal, the method further comprises:
acquiring information of a detachable node and an execution server for task processing in the task to be processed, wherein the information comprises the number and the residual load capacity, and the detachable node comprises a connection point between different processing modules in the task to be processed;
and determining the target splitting strategy according to the information of the detachable node and the execution server.
3. The method according to claim 2, wherein the number of the detachable nodes is K, the number of the execution servers for task processing is M, and K and M are positive integers, and the determining the target splitting policy according to the information of the detachable nodes and the execution servers comprises:
if the number M of the execution servers is larger than or equal to the number K of the detachable nodes, determining the K detachable nodes as target detachable nodes aiming at the task to be processed, wherein the K detachable nodes are used for detaching the task to be processed into K subtasks;
selecting K execution servers with the maximum residual load capacity from the M execution servers as target execution servers;
and establishing a corresponding relation between the K subtasks and the K target execution servers according to a principle of matching the task amount with the residual load amount.
4. The method according to claim 3, wherein the establishing the correspondence between the K subtasks and the K target execution servers according to a principle that a task amount is matched with a remaining load amount comprises:
acquiring the task quantity of each subtask in the K subtasks and the residual load quantity of each execution server in the K target execution servers;
sequencing the K subtasks according to the sequence of the task amount from large to small to obtain a first sequence set;
sequencing the K target execution servers according to the sequence of the residual load from large to small to obtain a second sequence set;
and the subtasks ordered to the I-th bit in the first sequence set correspond to the execution servers ordered to the I-th bit in the second sequence set, wherein I is a positive integer.
5. The method of claim 3, further comprising:
if the number M of the execution servers is smaller than the number K of the detachable nodes, M detachable nodes are selected from the K detachable nodes as target detachable nodes, and the M detachable nodes are used for detaching the task to be processed into M subtasks;
determining the task quantity of the subtasks corresponding to each of the M detachable nodes and the residual load quantity of each of the M execution servers;
and establishing a corresponding relation between the M subtasks and the M execution servers according to a principle of matching the task amount with the residual load amount.
6. The method according to claim 1, wherein before receiving the processing result returned by the execution server corresponding to each sub-task, the method further comprises:
monitoring the processing state of a target execution server on a target subtask, wherein the target subtask is any one of the at least one subtask, the target execution server is an execution server corresponding to the target subtask, and a detachable node exists in the target subtask and is used for detaching the target subtask into a first target subtask and a second target subtask;
if the processing state of the target subtask is detected to meet a secondary splitting condition, performing secondary splitting processing on the target subtask to obtain a first target subtask and a second target subtask, wherein the secondary splitting condition includes that the processing state of the target execution server for the second target subtask in the target subtask is unprocessed, and an idle execution server currently exists;
and sending an instruction for suspending processing of a second target subtask in the target subtask to the target execution server, and sending the second target subtask to the idle execution server, so that the idle execution server processes the second target subtask.
7. The method according to claim 1, wherein after the merging the at least one processing result according to the identifier to obtain a target processing result, the method further comprises:
acquiring processing time used for processing the task to be processed;
detecting whether the processing time length is less than a preset time length or not;
if so, determining the target splitting strategy as a preferred splitting strategy, and storing the preferred splitting strategy;
when a new processing task is received, detecting whether the new processing task is matched with the task to be processed;
and if so, determining the preferred splitting strategy as the splitting strategy aiming at the new processing task, and splitting the new processing task based on the preferred splitting strategy.
8. A data processing apparatus, characterized in that the apparatus comprises:
the receiving module is used for receiving the tasks to be processed sent by the terminal;
the splitting module is used for splitting the task to be processed according to a target splitting strategy to obtain at least one subtask if the task to be processed meets a preset splitting condition, wherein the target splitting strategy is used for determining a target splitting node for the task to be processed and a corresponding relation between each subtask obtained by splitting and an execution server for task processing;
the sending module is used for adding an identifier for each subtask and sending each subtask with the identifier added to a corresponding execution server so that the execution server corresponding to each subtask can process each subtask;
the receiving module is further configured to receive a processing result returned by the execution server corresponding to each subtask, where the processing result returned by the execution server corresponding to each subtask includes the identifier;
and the merging module is used for merging the processing results returned by the execution servers corresponding to the sub-tasks according to the identification to obtain target processing results for the tasks to be processed.
9. A dispatch server comprising a processor, an input interface, an output interface, and a memory, the processor, the input interface, the output interface, and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-7.
CN202010161333.8A 2020-03-10 2020-03-10 Data processing method, device, scheduling server and medium Pending CN111459659A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010161333.8A CN111459659A (en) 2020-03-10 2020-03-10 Data processing method, device, scheduling server and medium


Publications (1)

Publication Number Publication Date
CN111459659A true CN111459659A (en) 2020-07-28

Family

ID=71685153


Country Status (1)

Country Link
CN (1) CN111459659A (en)

CN111459659A (en) Data processing method, device, scheduling server and medium
CN110209496B (en) Task fragmentation method and device based on data processing and fragmentation server
CN109033814B (en) Intelligent contract triggering method, device, equipment and storage medium
CN108762905B (en) Method and device for processing multitask events
CN111737022A (en) Interface calling method, system, equipment and medium based on micro-service
CN110928905A (en) Data processing method and device
CN114398179B (en) Method and device for acquiring tracking identifier, server and storage medium
CN111026552B (en) Resource scheduling method and device, electronic equipment and computer readable storage medium
CN110908797B (en) Call request data processing method, device, equipment, storage medium and system
CN112965809A (en) Deep learning task processing system and method
CN115794262A (en) Task processing method, device, equipment, storage medium and program product
CN111538572A (en) Task processing method, device, scheduling server and medium
CN114168352A (en) Multi-core task scheduling method and device, electronic equipment and storage medium
CN107634978B (en) Resource scheduling method and device
CN111831408A (en) Asynchronous task processing method and device, electronic equipment and medium
CN110460673B (en) Service address acquisition method, device and equipment
CN115629951B (en) Task full-link tracking method, first node, link system and medium
CN112363834A (en) Task processing method, device, terminal and storage medium
CN111475230A (en) Application function configuration method and device and electronic equipment
CN108616576B (en) Method and device for scheduling application server
CN112925623B (en) Task processing method, device, electronic equipment and medium
CN114356713A (en) Thread pool monitoring method and device, electronic equipment and storage medium
CN109040491A (en) On-hook behavior processing method, device, computer equipment and storage medium
CN115525411A (en) Method, device, electronic equipment and computer readable medium for processing service request
CN111324443B (en) Data processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination