CN114780214A - Task processing method, device, system and equipment - Google Patents

Task processing method, device, system and equipment

Info

Publication number
CN114780214A
Authority
CN
China
Prior art keywords
task
data
subtask
data cluster
subtasks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210352913.4A
Other languages
Chinese (zh)
Other versions
CN114780214B (en)
Inventor
王学敏
孟维业
孙智超
王涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202210352913.4A priority Critical patent/CN114780214B/en
Publication of CN114780214A publication Critical patent/CN114780214A/en
Application granted granted Critical
Publication of CN114780214B publication Critical patent/CN114780214B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the application disclose a task processing method, a device, a system and an electronic device. The method splits a task to be processed to obtain a subtask set and distributes the subtasks in the subtask set to the data clusters, so that each data cluster processes its subtasks and obtains the corresponding subtask processing results. Because the task to be processed is split into subtasks that are sent to the data clusters for local computation, computing power sinks to where the data is, and the data of the individual data clusters does not need to be centralized. A target task processing result corresponding to the task to be processed is then obtained based on the subtask processing results fed back by the data clusters, so each data cluster only needs to upload its subtask processing result, and these results are summarized into the target task processing result, which reduces the data transmission overhead of each data cluster and improves task completion efficiency.

Description

Task processing method, device, system and equipment
Technical Field
The present disclosure relates to the field of big data technologies, and in particular, to a task processing method, apparatus, system, computer-readable storage medium, electronic device, and computer program product.
Background
Big data processing technology is now widely known and applied, and in recent years application scenarios have emerged that require processing big data distributed across clusters in different regions. Many companies and organizations purchase cloud servers in different regions and build clusters there to handle the corresponding services, according to their service types or regions, and these clusters generate a large amount of data every day that needs to be processed as a whole.
In the related art, large amounts of data distributed across clusters in different regions are mainly processed in a centralized manner, that is, the data in the different regional clusters are transmitted to a single cluster and then analyzed there. However, such processing is not always feasible: besides the time and cost incurred by transmitting large amounts of raw data, some countries or regions prohibit transmitting users' raw data to other regions in order to protect data privacy.
In view of the above problems, how to process big data located in different regions more reasonably is an urgent problem to be solved.
Disclosure of Invention
In order to solve the above technical problem, embodiments of the present application provide a task processing method, an apparatus, a system, a computer-readable storage medium, an electronic device, and a computer program product, so as to perform more reasonable processing on tasks related to big data clusters in different regions.
According to an aspect of an embodiment of the present application, there is provided a task processing method, including: acquiring a task to be processed; splitting a task to be processed to obtain a subtask set; distributing the subtasks in the subtask set to the data cluster so that the data cluster processes the subtasks to obtain a subtask processing result corresponding to the subtasks; and obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.
In some embodiments, the number of data clusters is multiple; assigning the subtasks in the set of subtasks to the data cluster, including: acquiring task parameters of subtasks in the subtask set; wherein the task parameters are data required for processing the subtasks; matching the plurality of data clusters based on the task parameters of the subtasks to obtain the data clusters to be distributed corresponding to each subtask; and distributing the subtasks to the data clusters to be distributed corresponding to the subtasks.
In some embodiments, matching the plurality of data clusters based on the task parameters of the subtasks to obtain the data cluster to be allocated corresponding to each subtask includes: acquiring metadata of each data cluster; detecting whether each data cluster meets the processing condition of the subtask or not based on the task parameter of the subtask and the metadata of each data cluster; and taking the data clusters which accord with the processing conditions of the subtasks as the data clusters to be distributed corresponding to the subtasks.
In some embodiments, the number of data clusters to be allocated is multiple; the method for distributing the subtasks to the data clusters to be distributed corresponding to the subtasks comprises the following steps: respectively obtaining current data processing capacity values of a plurality of data clusters to be distributed; sequencing a plurality of data clusters to be distributed based on the current data processing capacity value to obtain a sequencing result; and determining a target data cluster to be allocated based on the sequencing result, and allocating the subtask to the target data cluster to be allocated.
In some embodiments, prior to assigning the subtasks in the set of subtasks to the data cluster, the method further comprises: receiving a registration request sent by a data cluster; based on the registration request, sending an identity certificate corresponding to the data cluster; receiving a communication connection request sent by a data cluster, wherein the communication connection request comprises an identity certificate to be authenticated corresponding to the data cluster; and authenticating the identity certificate to be authenticated, and if the authentication is passed, establishing communication connection with the data cluster.
In some embodiments, after the subtasks in the subtask set are allocated to the data cluster, the method further includes: acquiring failure information sent by the data cluster, wherein the failure information carries the processing information of a subtask whose processing has failed in the data cluster; determining, according to the processing information, the cause information that triggered generation of the failure information; acquiring, based on the cause information, a standby data cluster corresponding to the failed subtask; and allocating the failed subtask to the standby data cluster so that the standby data cluster reprocesses the failed subtask.
In some embodiments, the number of data clusters is multiple; obtaining a target task processing result corresponding to the task to be processed based on the subtask processing results fed back by the data clusters includes: acquiring the subtask corresponding to each subtask processing result fed back by a data cluster, and modifying the task state of that subtask to the completed state; and if the task states of all subtasks in the subtask set are detected to be the completed state, aggregating the subtask processing results fed back by each data cluster to obtain the target task processing result.
According to an aspect of an embodiment of the present application, there is provided a task processing apparatus including: the task acquisition module is configured to acquire a task to be processed; the task splitting module is configured to split a task to be processed to obtain a subtask set; the task allocation module is configured to allocate the subtasks in the subtask set to the data cluster so that the data cluster processes the subtasks to obtain a subtask processing result corresponding to the subtasks; and the result acquisition module is configured to obtain a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.
According to one aspect of the embodiment of the application, a task processing system is provided, which comprises a task processing control platform and a plurality of data clusters, wherein the task processing control platform is in communication connection with the data clusters through an intermediate agent node; wherein: the task processing control platform is used for acquiring the tasks to be processed, splitting the tasks to be processed to obtain a subtask set, and respectively allocating the subtasks in the subtask set to the data clusters; the data cluster is used for processing the subtasks, obtaining subtask processing results corresponding to the subtasks, and sending the subtask processing results to the task processing control platform; and the task processing control platform is also used for obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: a memory storing computer-readable instructions; and a processor that reads the computer-readable instructions stored in the memory to perform the steps of the task processing method described above.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the steps of the task processing method described above.
In the technical solution provided by the embodiments of the application, the task to be processed is acquired and split to obtain a subtask set, and the subtasks in the subtask set are distributed to the data clusters so that each data cluster processes its subtasks and obtains the corresponding subtask processing results. Because the task to be processed is split into subtasks that are sent to the data clusters for local computation, computing power sinks to where the data is, and the data of the individual data clusters does not need to be centralized. The target task processing result corresponding to the task to be processed is then obtained based on the subtask processing results fed back by the data clusters, so each data cluster only needs to upload its subtask processing result, and these results are summarized into the target task processing result, which reduces the data transmission overhead of each data cluster and improves task completion efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a schematic illustration of an implementation environment to which the present application relates;
FIG. 2 is a flowchart of a task processing method shown in an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of obtaining a task to be processed in an exemplary embodiment of the present application;
FIG. 4 is a flowchart of a task processing method shown in another exemplary embodiment of the present application;
FIG. 5 is a flowchart of a task processing method shown in another exemplary embodiment of the present application;
FIG. 6 is a flowchart of a task processing method shown in another exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of a task processing system shown in an exemplary embodiment of the present application;
FIG. 8 is a block diagram of a task processing apparatus shown in an exemplary embodiment of the present application;
FIG. 9 is a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flowcharts shown in the figures are illustrative only and do not necessarily include all of the contents and operations/steps, nor are they necessarily processed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual processing order may be changed according to the actual situation.
Reference to "a plurality" in this application means two or more. "And/or" describes the association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
In recent years big data processing has become well known, and a new application scenario, cross-region big data processing, has developed. Traditional big data processing targets the data within a single cluster, whereas cross-region big data processing handles big data distributed across clusters in different regions. With the development of the Internet and globalization, many companies and organizations establish server-side clusters in different regions to process the corresponding services according to their service types or regions. The business jobs in these clusters generate a large amount of data every day, which may come from different sources such as terminal devices and sensors, and analyzing this data is an operation that needs to be performed every day. For example, a company may need to analyze the network traffic in each cluster to detect whether there is a network attack, and such a job must analyze the log data in all clusters together to obtain an accurate result.
Existing big data processing systems basically use a centralized processing mode, that is, when cross-region data are processed, the data of each cluster must first be gathered into one cluster. Such an operation necessarily causes a large amount of data transmission between clusters, with the accompanying time and monetary cost; moreover, even leaving the time and cost factors aside, many countries and regions now prohibit such cross-region data transmission in order to protect data privacy.
Based on this, in order to ensure efficiency of processing tasks related to data clusters of multiple regions, embodiments of the present application propose a task processing method, an apparatus, a system, a computer-readable storage medium, an electronic device, and a computer program product.
The following describes a task processing method provided in an embodiment of the present application, where fig. 1 is a schematic diagram of an implementation environment related to the task processing method in the present application. As shown in fig. 1, the implementation environment includes a terminal 110 and a server 120, and the terminal 110 and the server 120 may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
The terminal 110 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal 110 may be generally referred to as one of a plurality of terminals, and the embodiment is only illustrated by the terminal 110. Those skilled in the art will appreciate that the number of terminals may be greater or less. For example, the number of the terminal may be only one, or several tens or hundreds, or more, and in this case, the implementation environment of the task processing method further includes other terminals. The number of terminals and the type of the device are not limited in the embodiments of the present application.
The server 120 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, cloud functions, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, and a big data and artificial intelligence platform. The server 120 is configured to provide a background service for an application executed by the terminal 110.
Optionally, the wireless network or wired network described above uses standard communication techniques and/or protocols. The Network is typically the Internet, but can be any Network including, but not limited to, a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired or wireless Network, a private Network, or any combination of virtual private networks. In some embodiments, data exchanged over a network is represented using techniques and/or formats including Hypertext Mark-up Language (HTML), Extensible Markup Language (XML), and the like. All or some of the links may also be encrypted using conventional encryption techniques such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication techniques may also be used in place of, or in addition to, the data communication techniques described above.
Optionally, the server 120 undertakes primary task processing work, and the terminal 110 undertakes secondary task processing work; or, the server 120 undertakes the secondary task processing work, and the terminal 110 undertakes the primary task processing work; alternatively, the server 120 or the terminal 110 may respectively undertake the task processing work separately.
Exemplarily, the terminal 110 sends a task processing instruction to the server 120, and the server 120 receives the task processing instruction sent by the terminal 110 to obtain a task to be processed; the server 120 splits the task to be processed to obtain a subtask set; distributing the subtasks in the subtask set to the data cluster so that the data cluster processes the subtasks to obtain a subtask processing result corresponding to the subtasks; and obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.
Referring to fig. 2, fig. 2 is a flowchart illustrating a task processing method according to an exemplary embodiment of the present application. The task processing method can be applied to the implementation environment shown in fig. 1 and is specifically performed by the server 120 in that implementation environment. It should be understood that the method may also be applied to other exemplary implementation environments and be performed by devices in those environments; this embodiment does not limit the implementation environment to which the method is applied.
The task processing method proposed in the embodiments of the present application is described in detail below with the server as the specific execution subject.
As shown in fig. 2, in an exemplary embodiment, the task processing method at least includes steps S210 to S240, which are described in detail as follows:
Step S210: acquiring the task to be processed.
The task to be processed refers to the task data to be processed currently.
It is understood that there may be multiple tasks at the server, and one task may be randomly determined from the multiple tasks as a task to be processed. Optionally, the task with the earliest submission time may be determined from the multiple tasks as the task to be processed according to the time sequence of submission of the tasks; optionally, the task with the highest priority may be determined from the multiple tasks as the task to be processed according to the priority ranking order of the tasks in the multiple tasks, which is not limited in this application.
For example, referring to fig. 3, fig. 3 is a schematic diagram of obtaining a task to be processed in an exemplary embodiment of the present application. As shown in fig. 3, the server provides a user interaction interface, and the user interaction interface obtains the task data input by the user by detecting the user's task configuration operations. For example, a file upload component may be arranged on the user interaction interface; the user uploads a file containing task data by triggering the file upload component, and the task data is obtained from the uploaded file, such as a program code file containing the task content or a configuration file containing the task processing flow. As another example, a task data configuration component may be arranged on the user interaction interface, and the task data is obtained by detecting the user's interaction with it; the task data configuration component includes a task type configuration component, a task priority configuration component, and the like, with which the user selects the type of the task and sets its priority. It can be understood that the present application does not limit the manner of acquiring task data; for example, the task data may also be stored in a database and obtained by querying the database. Further, the obtained task data is placed into a task queue, and the server selects the task data to be processed currently from the N pieces of task data contained in the task queue, thereby obtaining the task to be processed.
Illustratively, the server may actively acquire the task data when it detects that a preset event is triggered, for example acquiring the task data from the database when the current time reaches a preset time; the server may also obtain the task data corresponding to a task processing request when such a request is received, which is not limited in the present application.
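As a purely illustrative sketch of the queueing behaviour described above (written in Python; the field names and the queue structure are assumptions for illustration, not taken from the application), the task to be processed could be selected from the task queue by priority and then by submission time:

    def next_pending_task(tasks):
        """Pick the task to process next: highest priority first, then earliest submission."""
        if not tasks:
            return None
        return min(tasks, key=lambda t: (-t["priority"], t["submitted_at"]))

    task_queue = [
        {"name": "traffic-analysis", "priority": 2, "submitted_at": "2022-04-01T08:00:00"},
        {"name": "attack-detection", "priority": 5, "submitted_at": "2022-04-01T09:30:00"},
    ]
    print(next_pending_task(task_queue)["name"])  # -> attack-detection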
Step S220: splitting the task to be processed to obtain a subtask set.
A complete task to be processed is refined into a plurality of subtasks through task splitting operation, and the subtasks form a subtask set.
For example, the task to be processed may be split according to the resource conditions of the data clusters, for instance according to the current data structure information of each data cluster. If the task to be processed is to count the number of pieces of first-type data and of second-type data, and the data structure information of data cluster A shows that the first-type data exists in data cluster A while the data structure information of data cluster B shows that the second-type data exists in data cluster B, the task to be processed is refined into a first subtask and a second subtask, where the first subtask counts the number of pieces of first-type data and the second subtask counts the number of pieces of second-type data.
For example, the task to be processed may be split according to workflow information of the task to be processed, where the workflow information refers to a processing rule between each step in the task to be processed. The workflow information may be obtained according to a workflow input operation of a user, or may be obtained by analyzing task data by the server. Each step can be refined into one sub-task, and multiple steps can also be refined into one sub-task according to the dependency relationship among the steps.
For example, the task to be processed may be split according to the resource condition of the data cluster and the workflow information of the task to be processed, which is not limited in this application.
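A minimal sketch of such splitting, assuming a simple counting task and a hypothetical catalogue of which data types each cluster holds (Python; all names are illustrative, not part of the application):

    def split_task(task, cluster_catalog):
        """Split a counting task into one subtask per data type, using each cluster's
        data structure information to record which clusters hold that type locally."""
        subtasks = []
        for data_type in task["data_types"]:
            holders = [cid for cid, types in cluster_catalog.items() if data_type in types]
            subtasks.append({
                "parent": task["name"],
                "action": "count",
                "data_type": data_type,
                "candidate_clusters": holders,  # clusters that can run this subtask locally
            })
        return subtasks

    catalog = {"cluster_a": {"first_type"}, "cluster_b": {"second_type"}}
    task = {"name": "count-records", "data_types": ["first_type", "second_type"]}
    for subtask in split_task(task, catalog):
        print(subtask)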
Step S230: distributing the subtasks in the subtask set to the data cluster so that the data cluster processes the subtasks to obtain a subtask processing result corresponding to the subtasks.
The data clusters store the data needed for processing the task to be processed, and this data may be distributed over multiple data clusters, so the task to be processed can only be completed by combining the data held by the individual data clusters. Therefore, in order to avoid each data cluster having to upload the data required by the task to be processed to a central location, the task to be processed is refined into a plurality of subtasks, and the subtasks are issued to their corresponding data clusters for local computation.
It can be understood that multiple subtasks in a subtask set may be allocated to the same data cluster; in that case the processing rules configured for those subtasks, that is, the processing logic information between the subtasks, also needs to be sent to the data cluster, so that the data cluster processes the subtasks according to the processing rules between them.
For example, the data required for processing a subtask may be acquired, and whether each data cluster meets the subtask's processing condition is determined from that data; if a data cluster holds the data required for processing the subtask, the data cluster meets the processing condition of the subtask, and the subtask is allocated to it.
For example, several data clusters may meet the processing condition of the same subtask; these data clusters may be sorted, the data cluster finally matched to the subtask is selected according to the sorting result, and the subtask is sent to that data cluster.
Illustratively, based on the above embodiment, there may be multiple tasks to be processed, and the server may process them in parallel to obtain a subtask set for each task to be processed. The subtasks in one subtask set belong to the same task to be processed, and a task identifier may be added to each subtask set according to the task to be processed it belongs to, so as to distinguish the subtask sets of different tasks to be processed. The subtasks in the subtask sets are then distributed according to their priorities, so that the server can handle several tasks to be processed at the same time. The higher-priority subtasks of each task to be processed can be processed first in priority order; compared with processing each task to be processed one after another, the overall processing time for the subtasks of multiple tasks to be processed is shorter, which improves the overall task processing efficiency.
Further, the data cluster processes the received subtasks to obtain a subtask processing result. Optionally, the data cluster may refine the subtask again, so that the refined subtask is respectively allocated to each computing device in the data cluster for processing, and then the processing results after processing by each computing device are summarized to obtain the subtask processing results.
Step S240: obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.
The target task processing result is a task processing result corresponding to the task to be processed, and is obtained according to a subtask processing result corresponding to each subtask of the task to be processed.
Illustratively, when it is detected that all the subtasks corresponding to the task to be processed have obtained the subtask processing results, all the subtask processing results are aggregated, the aggregated subtask processing results are calculated to obtain the target task processing result, and the target task processing result is stored.
In the related art, when cross-region data are processed, the data of each cluster must first be gathered into one cluster, which causes a large amount of data transmission between clusters and increases both the time and the monetary cost of data transmission. In the embodiments of the present application, by contrast, the task to be processed is split into subtasks that are issued to the data clusters for local computation, so the data of the individual data clusters does not need to be centralized; the target task processing result corresponding to the task to be processed is then obtained based on the subtask processing results fed back by the data clusters, so each data cluster only needs to upload its subtask processing result, and these results are summarized into the target task processing result, which reduces the data transmission overhead of each data cluster and improves task completion efficiency.
Referring to fig. 4, in an exemplary embodiment, the number of the data clusters is multiple, and the task processing method includes steps S410 to S460, which are described in detail as follows:
Step S410: acquiring the task to be processed.
The specific implementation of step S410 is the same as step S210 in fig. 2, and is not described herein again.
Step S420: splitting the task to be processed to obtain a subtask set.
The specific implementation of step S420 is the same as step S220 in fig. 2, and is not described herein again.
Step S430: acquiring task parameters of subtasks in the subtask set, wherein the task parameters are data required for processing the subtasks.
The task parameter refers to data required for processing the sub-task, for example, if the sub-task counts the number of the first type of data, the task parameter of the sub-task is the first type of data.
It can be understood that the task parameters of a subtask include direct data and indirect data. Direct data is data that the server can obtain directly; indirect data is data that the server cannot obtain directly. For example, the data the server can obtain directly may be data in the server's local database, data input by the user, or data computed from these, that is, data that can be obtained without performing a data acquisition operation on another data cluster; the data the server cannot obtain directly may be data in other data clusters, or data computed from the data in other data clusters, that is, data that can only be obtained by performing a data acquisition operation on another data cluster.
Illustratively, each subtask is provided with a corresponding task parameter table; the task parameters of each subtask are added to its task parameter table, and each task parameter in the table is marked so that the mark indicates whether that parameter is direct data or indirect data.
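A hypothetical sketch of such a task parameter table, marking each parameter as direct or indirect data (Python; the structures and names are assumptions for illustration):

    def build_parameter_table(subtask, local_datasets):
        """Tag each task parameter as 'direct' (already available at the server) or
        'indirect' (only obtainable from a remote data cluster)."""
        return [
            {"parameter": p, "origin": "direct" if p in local_datasets else "indirect"}
            for p in subtask["params"]
        ]

    subtask = {"name": "count-first-type", "params": ["first_type_data", "report_template"]}
    local = {"report_template"}  # data the server already holds
    print(build_parameter_table(subtask, local))
    # [{'parameter': 'first_type_data', 'origin': 'indirect'},
    #  {'parameter': 'report_template', 'origin': 'direct'}]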
Step S440: matching the plurality of data clusters based on the task parameters of the subtasks to obtain the data clusters to be allocated corresponding to each subtask.
Matching the plurality of data clusters based on the task parameters of the subtask serves to detect whether a data cluster meets the processing condition of the subtask: if it does, the data cluster is taken as a data cluster to be allocated for the subtask; if it does not, the data cluster is skipped and the next data cluster is examined.
Optionally, in the foregoing exemplary embodiment, the method in this embodiment of the application may further implement step S440 as follows, and details are as follows: acquiring metadata of each data cluster; detecting whether each data cluster meets the processing condition of the subtask or not based on the task parameters of the subtask and the metadata of each data cluster; and taking the data clusters which accord with the processing conditions of the subtasks as the data clusters to be distributed corresponding to the subtasks.
Metadata is data that describes other data ("data about data"), mainly information describing the properties of the data; it supports functions such as indicating storage locations, recording historical data, searching for resources, and keeping file records.
For example, the server stores the metadata information of each data cluster that is in communication connection with it; each such data cluster may periodically report its current metadata to the server, and the server updates the stored metadata information of each data cluster according to the reported metadata. The server can then obtain the metadata of each data cluster from the locally stored metadata information.
For example, when distributing the subtasks, the server may send a metadata acquisition request to each data cluster in communication connection with the server, so that each data cluster reports its current metadata to the server according to the metadata acquisition request, and the server obtains metadata fed back by each data cluster.
Further, indirect data corresponding to the subtask is obtained according to task parameters of the subtask, metadata of each data cluster is matched according to data information of the indirect data, if all indirect data of the subtask are stored in the data cluster, it is indicated that the data cluster meets processing conditions of the subtask, and the data cluster meeting the processing conditions of the subtask is used as a data cluster to be distributed corresponding to the subtask.
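Continuing the same assumed structures, a sketch of matching the clusters' metadata against a subtask's indirect data to find the data clusters to be allocated might look like this (illustrative only):

    def candidate_clusters(parameter_table, cluster_metadata):
        """Return the clusters whose metadata shows they hold all indirect parameters
        of the subtask, i.e. the clusters that meet its processing condition."""
        needed = {row["parameter"] for row in parameter_table if row["origin"] == "indirect"}
        return [cid for cid, meta in cluster_metadata.items()
                if needed <= set(meta["datasets"])]

    metadata = {
        "cluster_a": {"datasets": ["first_type_data", "access_logs"]},
        "cluster_b": {"datasets": ["second_type_data"]},
    }
    table = [{"parameter": "first_type_data", "origin": "indirect"}]
    print(candidate_clusters(table, metadata))  # ['cluster_a']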
Step S450: distributing the subtasks to the data clusters to be distributed corresponding to the subtasks so that these data clusters process the subtasks to obtain the subtask processing results corresponding to the subtasks.
For example, the server generates a task processing request according to the subtask to issue the task processing request to the to-be-distributed data cluster corresponding to the subtask, and the to-be-distributed data cluster processes the subtask according to the task processing request. The task processing request may include at least one sub-task and a processing rule between the sub-tasks, and the task processing request may also include information such as a priority of the sub-task and a processing time of the sub-task, which is not limited in this application.
Optionally, in the foregoing exemplary embodiment, the number of the data clusters to be allocated is multiple, and the embodiment of the present application may further implement step S450 by using the following method, where details are as follows: respectively acquiring current data processing capacity values of a plurality of data clusters to be distributed; sequencing a plurality of data clusters to be distributed based on the current data processing capacity value to obtain a sequencing result; and determining a target data cluster to be distributed based on the sequencing result, and distributing the subtasks to the target data cluster to be distributed.
The current data processing capability value refers to the ability of a data cluster to be allocated to process tasks. For the same subtask, the higher the current data processing capability value of a data cluster to be allocated, the shorter the time it needs to complete the subtask; conversely, the lower the value, the longer the time it needs.
Illustratively, the current data processing capability value of each data cluster to be allocated is calculated from information such as its resource usage, its data transmission time, and its data transmission cost. The resource usage of a data cluster refers to its amount of idle resources; for the same subtask, the fewer idle resources a data cluster has, the longer it needs to complete the subtask, and the more idle resources it has, the shorter the time. The data transmission time of a data cluster refers to the time required for one data transmission between the data cluster and the server, and the data transmission cost refers to the cost of one such transmission.
In the embodiment of the application, information such as the resource usage, data transmission time, and data transmission cost of each data cluster to be allocated can be combined in a weighted calculation to obtain its current data processing capability value. The weights used in the calculation may be preset, or may be determined dynamically according to the subtask, for example according to information such as the subtask's specified completion time and priority.
After the current data processing capability value of each data cluster to be allocated is obtained, the data clusters to be allocated are sorted based on these values to obtain a sorting result, the target data cluster to be allocated is determined based on the sorting result, and the subtask is allocated to it. For example, the data cluster to be allocated with the highest current data processing capability value in the sorting result may be taken as the target data cluster to be allocated.
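A minimal sketch of such a weighted capability value and the selection of the target data cluster to be allocated (Python; the weight names and example figures are assumptions, not values given in the application):

    def capability_score(cluster, weights):
        """Weighted combination of idle resources, transmission time and transmission cost:
        more idle resources raise the score, longer or costlier transfers lower it."""
        return (weights["idle"] * cluster["idle_resources"]
                - weights["time"] * cluster["transfer_time_s"]
                - weights["cost"] * cluster["transfer_cost"])

    def pick_target_cluster(candidates, weights):
        """Take the cluster to be allocated with the highest capability value."""
        return max(candidates, key=lambda c: capability_score(c, weights))

    weights = {"idle": 1.0, "time": 0.5, "cost": 0.2}  # could also be derived per subtask
    candidates = [
        {"id": "cluster_a", "idle_resources": 40, "transfer_time_s": 12, "transfer_cost": 3},
        {"id": "cluster_c", "idle_resources": 25, "transfer_time_s": 4, "transfer_cost": 1},
    ]
    print(pick_target_cluster(candidates, weights)["id"])  # -> cluster_a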
For example, a subtask matching value may also be calculated from the priority of the subtask, the amount of computation of the subtask, and the current data processing capability value of each data cluster to be allocated, and the data cluster to be allocated with the highest matching value for the subtask is taken as the target data cluster to be allocated. For example, the higher the priority of the subtask, the more urgent it is and the more idle resources the data cluster allocated to it needs; conversely, a lower-priority, less urgent subtask can be allocated to a data cluster with fewer idle resources. The amount of computation of a subtask refers to the computing resources required to complete it; for the same data cluster, the larger the amount of computation, the longer the data cluster needs to complete the subtask, and the smaller the amount of computation, the shorter the time.
In the embodiment of the application, the priority of the subtask, the amount of computation of the subtask, and the current data processing capability value of each data cluster to be allocated can be combined in a weighted calculation to obtain the subtask matching value of each data cluster to be allocated. The weights used in the calculation may be preset, or may be determined dynamically according to the subtask, for example according to information such as its specified completion time and priority.
By comprehensively considering the information of each subtask and the information of each data cluster, each subtask is guaranteed to be allocated to a proper data cluster, so that the completion efficiency of each task to be processed is guaranteed.
Step S460: obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster to be allocated.
The specific implementation of step S460 is the same as step S240 in fig. 2, and is not described herein again.
Referring to fig. 5, in an exemplary embodiment, the number of the data clusters is multiple, and the task processing method includes steps S510 to S580, which are described in detail as follows:
step S510, receiving a registration request sent by the data cluster, and sending an identity certificate corresponding to the data cluster based on the registration request.
The registration request may carry Central Processing Unit (CPU) information, memory information, and the like of the data cluster, and the data cluster sends resource information of the data cluster to the server through the registration request; the identity certificate is used for uniquely identifying identity information of the data cluster.
For example, when the data cluster communicates with the server for the first time, the data cluster sends the registration request to the server, or the data cluster needs to periodically send the registration request to the server, which is not limited in this application.
In the embodiment of the application, after receiving a registration request of a data cluster, a server generates a unique identity certificate for the data cluster, and sends the identity certificate to the data cluster.
The encrypted identity certificate may be sent to the data cluster, and the encryption method used for encryption includes, but is not limited to, a symmetric encryption method, an asymmetric encryption method, and the like, which is not limited in this application. It can be understood that the encrypted identity certificate can be decrypted only by a decryption method corresponding to the encryption method, so that the identity certificate is obtained by decryption, and the security of information transmission is improved.
Step S520, receiving a communication connection request sent by the data cluster, where the communication connection request includes an identity certificate to be authenticated corresponding to the data cluster, and authenticating the identity certificate to be authenticated.
The communication connection request sent by the data cluster is used for requesting the server side to establish communication connection, so that the data cluster and the server side perform data transmission after the communication connection is established. Therefore, the communication connection request includes the identity certificate to be authenticated, so that the server authenticates the identity certificate to be authenticated.
Step S530, judging whether the authentication passes, if so, establishing communication connection with the data cluster, and executing step S540; if the authentication fails, the process ends.
Exemplarily, the server side authenticates an identity certificate to be authenticated of the data cluster, and if the authentication is passed, the server side indicates that the data cluster can be trusted, and establishes communication connection with the data cluster; if the authentication is not passed, the data cluster is not trusted, and the communication connection with the data cluster is refused to be established.
Optionally, the server calculates the identity certificate according to a preset algorithm to determine whether a result obtained by the calculation meets a preset rule, and further determines whether the data cluster corresponding to the identity certificate to be authenticated is trusted, where the preset algorithm is an algorithm agreed by the server and the trusted data cluster; the identity certificate of the trusted data cluster may also be stored in the database of the server, and whether the data cluster corresponding to the identity certificate to be authenticated is trusted is determined by querying the database, which is not limited in the present application.
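As one possible sketch of the registration and authentication flow described above, assuming an HMAC-based certificate purely for illustration (the application does not specify a particular algorithm):

    import hashlib
    import hmac
    import secrets

    SERVER_KEY = secrets.token_bytes(32)  # held only by the control platform
    registered = {}                       # cluster_id -> registration record

    def register(cluster_id, cpu_info, memory_info):
        """Handle a registration request: record the cluster's resource information and
        issue an identity certificate bound to the cluster id."""
        cert = hmac.new(SERVER_KEY, cluster_id.encode(), hashlib.sha256).hexdigest()
        registered[cluster_id] = {"cpu": cpu_info, "memory": memory_info, "cert": cert}
        return cert

    def authenticate(cluster_id, cert_to_check):
        """Authenticate the certificate carried in a communication connection request."""
        expected = hmac.new(SERVER_KEY, cluster_id.encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, cert_to_check)

    cert = register("cluster_a", cpu_info="64 cores", memory_info="256 GB")
    print(authenticate("cluster_a", cert))      # True  -> establish the connection
    print(authenticate("cluster_a", "forged"))  # False -> refuse the connection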
Step S540: the server side performs metadata synchronization with each data cluster.
For example, each data cluster in communication connection with the server may periodically report its current metadata to the server, so that the server updates the stored metadata information of each data cluster according to the reported metadata; the server can then obtain the metadata of each data cluster from the locally stored metadata information. Alternatively, when distributing the subtasks, the server may send a metadata acquisition request to each data cluster in communication connection with it, each data cluster reports its current metadata according to the request, and the server obtains the metadata fed back by each data cluster; this is not limited by the application.
Step S550: acquiring the task to be processed.
The specific implementation of step S550 is the same as step S210 in fig. 2, and is not described herein again.
Step S560: splitting the task to be processed to obtain a subtask set, and allocating the subtasks in the subtask set to the data cluster so that the data cluster processes the subtasks to obtain the subtask processing results corresponding to the subtasks.
The specific implementation of step S560 is the same as step S220 and step S230 in fig. 2, and is not described herein again.
Step S570: acquiring the subtasks corresponding to the subtask processing results fed back by the data clusters, and modifying the task states of these subtasks to the completed state.
Each subtask in the server's subtask set corresponds to a task state, including but not limited to an unallocated state, an uncompleted state, and a completed state. The unallocated state indicates that the subtask has not yet been allocated to a data cluster; the uncompleted state indicates that the subtask has been allocated to a data cluster but the data cluster has not yet fed back a subtask processing result; and the completed state indicates that the subtask has been allocated to a data cluster and the data cluster has fed back a subtask processing result.
Step S580: determining whether the task states of all subtasks in the subtask set are the completed state; if yes, aggregating the subtask processing results fed back by each data cluster to obtain the target task processing result; if not, continuing with step S560.
That is, it is determined whether all subtasks of the task to be processed have been completed. If so, the subtask processing results fed back by the data clusters are aggregated to obtain the target task processing result. For example, each task to be processed is provided with a corresponding set of subtask processing results, and the subtask processing results fed back by the data clusters are added to this set; if a calculation needs to be performed over several subtask processing results in the set, the corresponding calculation is carried out to obtain the calculated target task processing result. Optionally, the calculation may be performed each time a subtask processing result fed back by a data cluster is received, or once after all subtasks have been processed, which is not limited in the present application.
If any subtask is not yet in the completed state, step S560 continues to be executed until all subtasks are in the completed state and the target task processing result is obtained, after which the target task processing result can be stored or displayed to the user.
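A small sketch of the task-state bookkeeping and the aggregation step, assuming for illustration that the target result is a simple sum of the sub-results (Python; names are hypothetical):

    def record_result(states, results, subtask_id, result):
        """Mark a subtask as completed once its processing result has been fed back."""
        results[subtask_id] = result
        states[subtask_id] = "completed"

    def try_aggregate(states, results):
        """If every subtask is in the completed state, aggregate the sub-results into the
        target task processing result (here simply summed); otherwise keep waiting."""
        if all(state == "completed" for state in states.values()):
            return sum(results.values())
        return None

    states = {"st-1": "uncompleted", "st-2": "uncompleted"}
    results = {}
    record_result(states, results, "st-1", 120)
    print(try_aggregate(states, results))  # None, st-2 is still pending
    record_result(states, results, "st-2", 45)
    print(try_aggregate(states, results))  # 165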
Referring to fig. 6, in an exemplary embodiment, the task processing method includes steps S610 to S680, which are described in detail as follows:
Step S610: acquiring the task to be processed.
The specific implementation of step S610 is the same as step S210 in fig. 2, and is not described herein again.
Step S620: splitting the task to be processed to obtain a subtask set.
The specific implementation of step S620 is the same as step S220 in fig. 2, and is not described herein again.
Step S630: distributing the subtasks in the subtask set to the data cluster so that the data cluster processes the subtasks to obtain a subtask processing result corresponding to the subtasks.
The specific implementation of step S630 is the same as step S230 in fig. 2, and is not described herein again.
Step S640: acquiring failure information sent by a data cluster, wherein the failure information carries the processing information of a subtask whose processing has failed in the data cluster.
The failure information may be log information in the processing process of the subtask to reflect a specific processing condition of the subtask in the data cluster.
Illustratively, the data cluster may send a heartbeat data packet to the server at a preset interval, and the heartbeat data packet may carry information such as the completion status of each subtask in the data cluster, where the completion status includes, but is not limited to, the subtask's completion progress information, failure information, and the like, so that the server can monitor the processing of each subtask in real time.
Alternatively, the data cluster may send the heartbeat data packet to the intermediate proxy node at the preset interval, carrying information such as the completion status of each subtask in the data cluster, so that the intermediate proxy node monitors the processing of each subtask in the local data cluster in real time and sends failure information to the server only when it detects a subtask whose processing has failed, which reduces the signaling interaction between the data cluster and the server and reduces the data transmission cost.
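A hypothetical sketch of such a heartbeat packet and of an intermediate proxy node forwarding only the failed subtasks (Python; field names are illustrative assumptions):

    import json
    import time

    def build_heartbeat(cluster_id, subtasks):
        """Heartbeat payload a data cluster could report at a fixed interval."""
        return {
            "cluster_id": cluster_id,
            "timestamp": time.time(),
            "subtasks": [
                {"id": st["id"], "progress": st["progress"], "failure": st.get("failure")}
                for st in subtasks
            ],
        }

    def forward_failures(heartbeat):
        """What an intermediate proxy node might pass on to the control platform:
        only the subtasks whose processing has failed."""
        return [st for st in heartbeat["subtasks"] if st["failure"]]

    hb = build_heartbeat("cluster_a", [
        {"id": "st-1", "progress": 0.8},
        {"id": "st-2", "progress": 0.3, "failure": "timed out"},
    ])
    print(json.dumps(forward_failures(hb), indent=2))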
And step S650, determining the reason information causing the fault triggering of the subtask processing according to the fault information.
The reason information refers to a condition causing a fault in the sub-task processing process.
For example, the fault information may be subjected to keyword recognition to obtain keyword information corresponding to the fault information, so as to perform fault classification on the fault information according to the keyword information, and determine the cause information causing the fault information to be generated by triggering the subtask processing fault.
Optionally, including but not limited to the following reason information:
(1) the task processing times out. In the execution process of each subtask, the processing time of each subtask is fixed, in order to ensure that the subtask cannot be executed indefinitely due to an abnormal reason, so that a user cannot acquire the abnormal state of the task in time, timeout time is configured for each task, and if the subtask cannot be completed within a specified time, the task is considered to be failed. There are many reasons for the timeout of job operation, which may be that a certain subtask cannot be finished after a delay due to unstable software of the computing device in the data cluster, or that a certain subtask takes longer time to complete than usual due to insufficient computing resources of the computing device in the data cluster. When the task processing is overtime, the fault information contains obvious characteristic words, such as built timed out, Marking the built as failed and the like;
(2) the operating environment is abnormal. The normal processing of a sub-task generally has certain requirements on the environment in which the sub-task operates. For example, a MapReduce job depends on the execution environment of Hadoop, so a Hadoop program must be installed on the execution node of the job. Generally, the online of a subtask requires the configuration of each running node of the subtask, including installing some programs, adding environment variables, adding local dependency libraries, applying an access right of an operation node to some databases to the DBA, and the like. However, an exception may occur in the environment of executing the node for some reasons, for example, the operation and maintenance personnel add a new running node but miss some configurations, or some dependent files are deleted by mistake due to program bug, and at this time, the operation may be performed in error. For the case of a local dependency library missing or an environment variable missing, usually, the operation cannot find the specified file or program path, and thus an error occurs, and the failure information includes obvious feature words, such as a cableaccessibility file, a command not found, and the like. For example, for the lack of cluster access authority or database access authority, the failure information includes obvious feature words, such as not permitted, access dense, and the like.
(3) Program bug. A newly launched subtask may contain bugs because of insufficient testing and may therefore fail during processing. A subtask that has been running online may also fail after its program is modified to meet changed requirements and a bug is introduced, so a subtask that originally ran normally starts to fail. In this case the subtask usually aborts because the program throws a runtime exception, and the exception stack trace is generally printed in the fault information.
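The keyword-based classification described above can be sketched as follows. The keyword lists and category names are assumptions for illustration only; the embodiment merely requires that the fault information be matched against characteristic phrases to decide the reason information.

```python
# Hypothetical characteristic phrases per reason category; the embodiment does
# not fix these lists, they only illustrate keyword-based fault classification.
REASON_KEYWORDS = {
    "TASK_TIMEOUT": ["timed out", "Marking the task as failed"],
    "ENV_ABNORMAL": ["cannot access file", "command not found",
                     "not permitted", "access denied"],
    "PROGRAM_BUG": ["Exception", "Traceback", "Caused by:"],
}

def classify_fault(fault_info: str) -> str:
    """Return the reason information derived from the fault information by
    matching characteristic keywords; fall back to UNKNOWN when nothing hits."""
    text = fault_info.lower()
    for reason, keywords in REASON_KEYWORDS.items():
        if any(kw.lower() in text for kw in keywords):
            return reason
    return "UNKNOWN"

# Example: a timeout-style message maps to TASK_TIMEOUT.
print(classify_fault("Task attempt_001 timed out after 600 secs"))
```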
And step S660, acquiring, based on the reason information, the standby data cluster corresponding to the failed subtask.
The method of step S230 in fig. 2 may be used to select initial standby data clusters for the failed subtask, and the initial standby data clusters are then screened according to the reason information to obtain the standby data cluster corresponding to the failed subtask. For example, if the reason information indicates that the subtask processing timed out because the computing devices in the data cluster lacked sufficient computing resources, a data cluster whose computing resources satisfy the failed subtask is acquired as the standby data cluster; if the reason information indicates that the subtask processing failed because of an error in the operating environment of the computing devices in the data cluster, a data cluster whose operating environment satisfies the failed subtask is acquired as the standby data cluster.
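A possible realization of this cause-aware screening is sketched below. The Cluster fields and the requirement checks are assumptions, since the embodiment only states that candidate clusters are filtered against the reason information.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    free_cpu_cores: int          # assumed resource metadata
    free_memory_gb: int
    installed_components: set    # e.g. {"hadoop", "spark"}

def pick_standby_cluster(candidates, reason, subtask_req):
    """Screen the initially matched clusters using the reason information."""
    if reason == "TASK_TIMEOUT":
        # Failure was caused by insufficient computing resources: require the
        # standby cluster to have enough free CPU and memory for the subtask.
        ok = [c for c in candidates
              if c.free_cpu_cores >= subtask_req["cpu_cores"]
              and c.free_memory_gb >= subtask_req["memory_gb"]]
    elif reason == "ENV_ABNORMAL":
        # Failure was caused by the operating environment: require the standby
        # cluster to provide every component the subtask depends on.
        ok = [c for c in candidates
              if subtask_req["components"] <= c.installed_components]
    else:
        ok = list(candidates)
    return ok[0] if ok else None
```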
And step S670, distributing the failed subtask to the standby data cluster so that the standby data cluster processes the failed subtask again.
By distributing the failed subtask to the standby data cluster for reprocessing, abnormal subtasks within the task to be processed are handled in a timely manner, which safeguards the overall processing progress of the task to be processed.
And step S680, obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.
The specific implementation of step S680 is the same as step S240 in fig. 2, and is not described herein again.
Referring to fig. 7, fig. 7 is a schematic diagram illustrating a task processing system according to an exemplary embodiment of the present application. The task processing system comprises a task processing control platform and a plurality of data clusters, and the task processing control platform is in communication connection with the data clusters through intermediate proxy nodes. Wherein: the task processing control platform is used for acquiring a task to be processed, splitting the task to be processed to obtain a subtask set, and respectively allocating the subtasks in the subtask set to the data clusters; the data cluster is used for processing the subtasks, obtaining subtask processing results corresponding to the subtasks, and sending the subtask processing results to the task processing control platform; the task processing control platform is further used for obtaining a target task processing result corresponding to the task to be processed based on the subtask processing results fed back by the data clusters.
As shown in fig. 7, the task processing control platform is configured in the server and includes a user interaction layer and a core layer, where the user interaction layer is used to provide a user interaction interface for the user and the core layer is used to execute the task processing method. The core layer comprises a task scheduling module, a metadata management module, a security authentication module and a resource registration module. The task scheduling module is used for splitting the task to be processed, distributing the subtasks, monitoring the subtasks, obtaining the target task processing result, and the like, so that the subtasks of the task to be processed are executed correctly according to a preset sequence and flow and task execution abnormalities are found in time; the metadata management module is used for storing metadata of each data cluster, such as the data directory and data details of each data cluster; the security authentication module is used for performing identity authentication on each data cluster to ensure the credibility and security of the communication channel; the resource registration module is used for storing resource information of each data cluster, such as CPU information, memory information, and the like.
In the embodiment of the application, a data cluster is in communication connection with the task processing control platform through an intermediate proxy node, and the data cluster comprises computing resources and storage resources. The intermediate proxy node may be implemented as an agent program and comprises a big data interaction layer, a workflow engine and a control interaction layer, where the big data interaction layer is used for communicating with the local data cluster, the control interaction layer is used for communicating with the task processing control platform, and the workflow engine processes the received subtasks, for example by allocating the subtasks to specific computing devices in the local data cluster, further splitting the subtasks, and monitoring the execution of the subtasks. Introducing the intermediate proxy nodes further reduces data transmission between the data clusters and the task processing control platform and facilitates management of each data cluster; because data transmission between the data clusters and the task processing control platform is consolidated and handled through the intermediate proxy nodes, the coupling between the data clusters is also reduced.
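The three-layer split of the intermediate proxy node might be organized as in the skeleton below. Class and method names are hypothetical; the embodiment only prescribes the responsibilities of each layer.

```python
class ControlInteractionLayer:
    """Talks to the task processing control platform: receives subtasks and
    reports subtask processing results and fault information."""
    def receive_subtask(self): ...
    def report(self, message): ...

class BigDataInteractionLayer:
    """Talks to the local data cluster, e.g. submits jobs to its compute engine."""
    def submit(self, subtask, device): ...
    def poll_status(self, subtask): ...

class WorkflowEngine:
    """Dispatches received subtasks to concrete computing devices in the local
    cluster, may split them further, and monitors their execution."""
    def __init__(self, control, big_data):
        self.control, self.big_data = control, big_data

    def handle(self, subtask):
        device = self.choose_device(subtask)        # local placement decision
        self.big_data.submit(subtask, device)
        status = self.big_data.poll_status(subtask)
        self.control.report({"subtask": subtask, "status": status})

    def choose_device(self, subtask):
        raise NotImplementedError                   # cluster-specific policy
```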
In some embodiments, the task processing control platform may further include a big data interaction layer to communicate with the local data cluster through the big data interaction layer.
Fig. 8 is a block diagram of a task processing device shown in an exemplary embodiment of the present application. The task processing device may be applied to the implementation environment shown in fig. 1. The task processing apparatus may also be applied to other exemplary implementation environments, and is specifically configured in other devices, and this embodiment does not limit the implementation environment to which the apparatus is applied.
As shown in fig. 8, the exemplary task processing device 800 includes: a task obtaining module 810, a task splitting module 820, a task allocating module 830, and a result obtaining module 840. Specifically:
The task obtaining module 810 is configured to obtain a task to be processed.
And the task splitting module 820 is configured to split the task to be processed to obtain a subtask set.
The task allocation module 830 is configured to allocate the subtasks in the subtask set to the data cluster, so that the data cluster processes the subtasks to obtain a subtask processing result corresponding to the subtasks.
The result obtaining module 840 is configured to obtain a target task processing result corresponding to the task to be processed based on the sub-task processing result fed back by the data cluster.
On the basis of the above exemplary embodiment, the task allocation module 830 further includes a task parameter obtaining unit, a data cluster matching unit, and an allocation unit. Specifically:
The task parameter obtaining unit is configured to obtain task parameters of the subtasks in the subtask set, wherein the task parameters are the data required for processing the subtasks.
And the data cluster matching unit is configured to match the plurality of data clusters based on the task parameters of the subtasks to obtain the data clusters to be distributed corresponding to the subtasks.
And the distribution unit is configured to distribute the subtasks to the data clusters to be distributed corresponding to the subtasks.
On the basis of the above exemplary embodiment, the data cluster matching unit further includes a metadata obtaining unit, a data detecting unit, and a to-be-distributed data cluster obtaining unit. Specifically:
a metadata acquisition unit configured to acquire metadata of each data cluster.
And the data detection unit is configured to detect whether each data cluster meets the processing condition of the subtask or not based on the task parameter of the subtask and the metadata of each data cluster.
And the to-be-distributed data cluster obtaining unit is configured to take a data cluster that meets the processing condition of the subtask as the data cluster to be distributed corresponding to the subtask.
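A minimal sketch of this metadata-based matching follows, under the assumption that the task parameters name the data tables a subtask needs and that each cluster's metadata lists the tables it holds; both shapes are illustrative only.

```python
def clusters_to_distribute(subtask_params, cluster_metadata):
    """Return the data clusters that satisfy the subtask's processing condition.

    subtask_params:   e.g. {"required_tables": {"orders", "users"}}
    cluster_metadata: e.g. {"cluster-a": {"tables": {"orders", "users", "logs"}}}
    """
    required = subtask_params["required_tables"]
    return [name for name, meta in cluster_metadata.items()
            if required <= meta["tables"]]
```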
On the basis of the above exemplary embodiment, the number of data clusters to be allocated is plural; the allocation unit further comprises a current data processing capability value acquisition unit, a sorting unit and a target allocation unit. Specifically:
and the current data processing capacity value acquisition unit is configured to respectively acquire current data processing capacity values of a plurality of data clusters to be distributed.
And the sequencing unit is configured to sequence the plurality of data clusters to be distributed based on the current data processing capacity value to obtain a sequencing result.
And the target distribution unit is configured to determine a target data cluster to be distributed based on the sorting result and distribute the subtask to the target data cluster to be distributed.
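The sorting performed by these units could, for instance, rank the candidate clusters by their current data processing capability value and take the best one. The scoring input below is assumed, since the embodiment does not fix how the capability value is computed.

```python
def pick_target_cluster(candidates, capability_value):
    """candidates: names of clusters that satisfy the processing condition.
    capability_value: callable returning the current data processing capability
    of a cluster (for example derived from free CPU, memory and queue length)."""
    ranking = sorted(candidates, key=capability_value, reverse=True)
    return ranking[0] if ranking else None

# Usage sketch with a hypothetical capability function based on free cores.
free_cores = {"cluster-a": 12, "cluster-b": 48}
target = pick_target_cluster(["cluster-a", "cluster-b"],
                             capability_value=lambda c: free_cores[c])
```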
On the basis of the above exemplary embodiment, the task processing device 800 further includes a registration request receiving unit, an identity certificate transmitting unit, a connection request receiving unit, and a communication connection establishing unit. Specifically:
and the registration request receiving unit is configured to receive a registration request sent by the data cluster.
And the identity certificate sending unit is configured to send the identity certificate corresponding to the data cluster based on the registration request.
The connection request receiving unit is configured to receive a communication connection request sent by a data cluster, and the communication connection request includes an identity certificate to be authenticated corresponding to the data cluster.
And the communication connection establishing unit is configured to authenticate the identity certificate to be authenticated, and establish communication connection with the data cluster if the authentication is passed.
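The registration and authentication flow of these units can be sketched as follows. Representing the identity certificate as a random token is an assumption made purely to keep the example self-contained; the embodiment does not specify the certificate format.

```python
import secrets

class RegistrationService:
    """Issues an identity certificate on registration and later verifies it
    before establishing a communication connection with a data cluster."""

    def __init__(self):
        self.issued = {}  # cluster_id -> certificate

    def handle_registration(self, cluster_id: str) -> str:
        cert = secrets.token_hex(16)           # assumed certificate format
        self.issued[cluster_id] = cert
        return cert                            # sent back to the data cluster

    def handle_connection_request(self, cluster_id: str, cert: str) -> bool:
        # Authenticate the certificate carried in the connection request; the
        # communication connection is established only when this returns True.
        expected = self.issued.get(cluster_id, "")
        return secrets.compare_digest(expected, cert)
```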
On the basis of the above exemplary embodiment, the task processing device 800 further includes a failure information obtaining unit, a reason information confirming unit, a standby data cluster obtaining unit, and a failure processing unit. Specifically:
and the fault information acquisition unit is configured to acquire fault information sent by the data cluster, wherein the fault information carries processing information of a subtask with a processing fault in the data cluster.
And the reason information confirming unit is configured to determine the reason information which causes the subtask processing fault to trigger the generation of the fault information according to the fault information.
And the standby data cluster acquisition unit is configured to acquire the standby data cluster corresponding to the sub-task with the fault based on the reason information.
And the fault processing unit is configured to distribute the sub-tasks with faults to the standby data clusters so that the standby data clusters can reprocess the sub-tasks with faults.
On the basis of the above exemplary embodiment, the number of data clusters is plural; the result obtaining module 840 further includes a state modifying unit and a target task processing result obtaining unit. Specifically:
And the state modifying unit is configured to obtain the subtasks corresponding to the subtask processing results fed back by the data clusters and modify the task states of those subtasks to the completion state.
And the target task processing result acquisition unit is configured to converge the subtask processing results fed back by each data cluster to obtain a target task processing result if the task states of all the subtasks in the subtask set are detected to be complete states.
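These two units amount to keeping a per-subtask state table and converging the partial results once every state is complete. A minimal sketch follows, with an assumed merge strategy of simply collecting the partial results; a real deployment would merge according to the task semantics.

```python
class ResultCollector:
    """Tracks subtask completion and converges partial results into the
    target task processing result once the whole subtask set is done."""

    def __init__(self, subtask_ids):
        self.states = {sid: "PENDING" for sid in subtask_ids}
        self.partials = {}

    def on_subtask_result(self, subtask_id, result):
        self.partials[subtask_id] = result
        self.states[subtask_id] = "COMPLETED"    # state modifying unit
        if all(s == "COMPLETED" for s in self.states.values()):
            return self.converge()               # target task result obtained
        return None

    def converge(self):
        # Assumed merge strategy: gather partial results in subtask-id order.
        return [self.partials[sid] for sid in sorted(self.partials)]
```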
It should be noted that the task processing apparatus provided in the foregoing embodiment and the task processing method provided in the foregoing embodiment belong to the same concept; the specific manner in which each module and unit performs its operations has been described in detail in the method embodiment and is not repeated here. In practical applications, the task processing apparatus provided in the above embodiment may allocate the above functions to different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above, which is not limited herein.
Fig. 9 illustrates a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.
It should be noted that the computer system 900 of the electronic device shown in fig. 9 is only an example and should not impose any limitation on the functions or the scope of use of the embodiments of the present application.
As shown in fig. 9, the electronic device 900 is embodied in the form of a general purpose computing device. Components of electronic device 900 may include, but are not limited to: the at least one processing unit 910, the at least one storage unit 920, a bus 930 connecting different system components (including the storage unit 920 and the processing unit 910), and a display unit 940.
Where the storage unit stores program code, the program code may be executed by the processing unit 910 to cause the processing unit 910 to perform the steps according to various exemplary embodiments of the present disclosure described in the above-mentioned "exemplary methods" section of this specification.
The storage unit 920 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)921 and/or a cache memory unit 922, and may further include a read only memory unit (ROM) 923.
Storage unit 920 may also include a program/utility 924 having a set (at least one) of program modules 925, such program modules 925 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 930 can be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 900 may also communicate with one or more external devices 970 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 900 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 950. Also, the electronic device 900 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via the network adapter 960. As shown, the network adapter 960 communicates with the other modules of the electronic device 900 via the bus 930. It should be appreciated that although not shown, other hardware and/or application modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as a computer application program. For example, embodiments of the present application include a computer program product comprising a computer program carried on a computer readable medium, the computer program containing program code for performing the method illustrated by the flowchart. When executed by the processing unit 910, the computer program performs the various functions defined in the system of the present application.
It should be noted that the computer readable media shown in the embodiments of the present application may be computer readable signal media or computer readable storage media or any combination of the two. The computer readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with a computer program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The units described in the embodiments of the present application may be implemented by an application program or by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
Another aspect of the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the foregoing task processing method. The computer-readable storage medium may be included in the electronic device described in the above embodiment, or may exist separately without being incorporated in the electronic device.
Another aspect of the application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the task processing method provided in the above embodiments.
The above description is only a preferred exemplary embodiment of the present application, and is not intended to limit the embodiments of the present application, and one of ordinary skill in the art can easily make various changes and modifications according to the main concept and spirit of the present application, so that the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for processing a task, the method comprising:
acquiring a task to be processed;
splitting the task to be processed to obtain a subtask set;
distributing subtasks in the subtask set to a data cluster so that the data cluster processes the subtasks to obtain subtask processing results corresponding to the subtasks;
and obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.
2. The method of claim 1, wherein the number of data clusters is plural; the allocating the subtasks in the subtask set to the data cluster includes:
acquiring task parameters of subtasks in the subtask set; wherein the task parameters are data required for processing the subtasks;
matching the plurality of data clusters based on the task parameters of the subtasks to obtain the data clusters to be distributed corresponding to each subtask;
and distributing the subtasks to the data clusters to be distributed corresponding to the subtasks.
3. The method according to claim 2, wherein the matching a plurality of data clusters based on the task parameters of the subtasks to obtain the data cluster to be allocated corresponding to each subtask comprises:
acquiring metadata of each data cluster;
detecting whether each data cluster meets the processing condition of the subtask or not based on the task parameters of the subtask and the metadata of each data cluster;
and taking the data cluster which meets the processing condition of the subtask as the data cluster to be distributed corresponding to the subtask.
4. The method according to claim 2, wherein the number of the data clusters to be allocated is plural; the allocating the subtask to the data cluster to be allocated corresponding to the subtask includes:
respectively obtaining current data processing capacity values of a plurality of data clusters to be distributed;
sorting the plurality of data clusters to be distributed based on the current data processing capacity value to obtain a sorting result;
and determining a target data cluster to be distributed based on the sequencing result, and distributing the subtask to the target data cluster to be distributed.
5. The method of any of claims 1-4, wherein prior to said assigning subtasks of the set of subtasks to the data cluster, the method further comprises:
receiving a registration request sent by the data cluster;
based on the registration request, sending an identity certificate corresponding to the data cluster;
receiving a communication connection request sent by the data cluster, wherein the communication connection request comprises an identity certificate to be authenticated corresponding to the data cluster;
and authenticating the identity certificate to be authenticated, and if the authentication is passed, establishing communication connection with the data cluster.
6. The method of any of claims 1-4, wherein after said assigning subtasks of the set of subtasks to the data cluster, the method further comprises:
acquiring fault information sent by the data cluster, wherein the fault information carries processing information of a subtask with a processing fault in the data cluster;
determining reason information causing the subtask processing fault to trigger generation of the fault information according to the fault information;
acquiring a standby data cluster corresponding to the subtask with the fault based on the reason information;
and distributing the sub task with the fault to a standby data cluster so as to enable the standby data cluster to reprocess the sub task with the fault.
7. The method of any of claims 1 to 4, wherein the number of data clusters is plural; the obtaining of the target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster includes:
acquiring subtasks corresponding to the subtask processing results fed back by the data cluster, and modifying the task states of the subtasks into completion states;
and if the task states of all the subtasks in the subtask set are detected to be the completion states, converging the subtask processing results fed back by each data cluster to obtain the target task processing result.
8. A task processing apparatus, characterized in that the apparatus comprises:
the task acquisition module is configured to acquire a task to be processed;
the task splitting module is configured to split the tasks to be processed to obtain a subtask set;
the task allocation module is configured to allocate the subtasks in the subtask set to a data cluster so that the data cluster processes the subtasks to obtain a subtask processing result corresponding to the subtasks;
and the result acquisition module is configured to acquire a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.
9. A task processing system is characterized by comprising a task processing control platform and a plurality of data clusters, wherein the task processing control platform is in communication connection with the data clusters through an intermediate proxy node; wherein:
the task processing control platform is used for acquiring a task to be processed, splitting the task to be processed to obtain a subtask set, and respectively allocating the subtasks in the subtask set to the data cluster;
the data cluster is used for processing the subtasks, obtaining subtask processing results corresponding to the subtasks, and sending the subtask processing results to the task processing control platform;
and the task processing control platform is also used for obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.
10. An electronic device, comprising:
a processor; and
a memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement a task processing method as recited in any one of claims 1 to 7.
CN202210352913.4A 2022-04-01 2022-04-01 Task processing method, device, system and equipment Active CN114780214B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210352913.4A CN114780214B (en) 2022-04-01 2022-04-01 Task processing method, device, system and equipment

Publications (2)

Publication Number Publication Date
CN114780214A true CN114780214A (en) 2022-07-22
CN114780214B CN114780214B (en) 2024-01-09

Family

ID=82427474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210352913.4A Active CN114780214B (en) 2022-04-01 2022-04-01 Task processing method, device, system and equipment

Country Status (1)

Country Link
CN (1) CN114780214B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115361382A (en) * 2022-08-10 2022-11-18 北京友友天宇系统技术有限公司 Data processing method, device, equipment and storage medium based on data group
CN115378937A (en) * 2022-08-10 2022-11-22 北京友友天宇系统技术有限公司 Distributed concurrency method, device and equipment for tasks and readable storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2013200568A1 (en) * 2006-07-31 2013-02-21 Accenture Global Services Limited Work allocation model
US20150215173A1 (en) * 2014-01-30 2015-07-30 Xerox Corporation Methods and systems for scheduling a task
US20170134304A1 (en) * 2015-11-09 2017-05-11 Wistron Corporation Resource planning method, system, and apparatus for cluster computing architecture
WO2018121738A1 (en) * 2016-12-30 2018-07-05 北京奇虎科技有限公司 Method and apparatus for processing streaming data task
CN110516738A (en) * 2019-08-23 2019-11-29 佳都新太科技股份有限公司 It is a kind of distributed to compare clustering method, device, electronic equipment and storage medium
CN110650059A (en) * 2019-10-12 2020-01-03 上海陆家嘴国际金融资产交易市场股份有限公司 Fault cluster detection method, device, computer equipment and storage medium
WO2022007781A1 (en) * 2020-07-08 2022-01-13 中兴通讯股份有限公司 Task processing method, edge computing device, computer device, and medium
CN112231098A (en) * 2020-09-29 2021-01-15 北京三快在线科技有限公司 Task processing method, device, equipment and storage medium
CN113360577A (en) * 2021-06-18 2021-09-07 苏州浪潮智能科技有限公司 MPP database data processing method, device, equipment and storage medium
CN113656160A (en) * 2021-08-17 2021-11-16 中国电信股份有限公司 Data computer room fault processing method and device, medium and electronic equipment
CN114401532A (en) * 2022-01-24 2022-04-26 天津大学 Intra-network pooled resource allocation optimization method based on contribution perception in computational power network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YU ZHANG et al.: "Considering inter-task resource constraints in task allocation", Autonomous Agents and Multi-Agent Systems, page 389 *
ZHANG Chi: "Research on Energy Consumption Optimization Scheduling in Cloud Data Centers", China Doctoral Dissertations Full-text Database, Information Science and Technology, no. 3, pages 137-2 *
HUANG Lida et al.: "Schedulability Analysis of Real-Time Tasks with Event-Triggered Criticality Level Promotion", Journal of Computer Research and Development, vol. 54, no. 1, pages 184-191 *


Also Published As

Publication number Publication date
CN114780214B (en) 2024-01-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant