CN114780214B - Task processing method, device, system and equipment

Info

Publication number: CN114780214B
Application number: CN202210352913.4A
Authority: CN
Inventors: 王学敏; 孟维业; 孙智超; 王涛
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2022-04-01
Filing date: 2022-04-01
Publication date: 2024-01-09
Anticipated expiration: 2042-04-01
Also published as: CN114780214A

Abstract

Embodiments of the present application disclose a task processing method, apparatus, system and electronic device. In the method, a task to be processed is split to obtain a subtask set; the subtasks in the subtask set are distributed to data clusters, so that each data cluster processes its subtasks to obtain the corresponding subtask processing results, and the data of the data clusters therefore does not need to be centralized; a target task processing result corresponding to the task to be processed is then obtained based on the subtask processing results fed back by the data clusters. Because the data clusters only need to upload subtask processing results, which are then summarized into the target task processing result, the cost of data transmission for the data clusters is reduced and task completion efficiency is improved.

Description

Task processing method, device, system and equipment

Technical Field

The present disclosure relates to the field of big data technology, and in particular, to a task processing method, apparatus, system, computer readable storage medium, electronic device, and computer program product.

Background

Big data processing technology is now widely known and applied, and in recent years application scenarios have emerged in which big data distributed across clusters in different regions needs to be processed. Many companies or organizations purchase cloud servers in different regions, according to their service types or target regions, to build clusters that handle the corresponding services; these clusters generate a large amount of data each day, which needs to be processed in a unified manner.

In the related art, a large amount of data distributed across clusters in different regions is mainly handled by centralized processing, that is, the data in the clusters of the different regions is transmitted to one cluster and then analyzed. However, such processing is not always possible: besides the time and cost of transmitting large amounts of raw data, some countries or regions prohibit the transmission of raw user data to other regions in order to protect data privacy.

Therefore, considering the above problems, how to more reasonably process big data in different areas is a problem to be solved.

Disclosure of Invention

To solve the above technical problems, embodiments of the present application provide a task processing method, apparatus, system, computer readable storage medium, electronic device, and computer program product, so as to more reasonably process tasks related to large data clusters in different regions.

According to an aspect of the embodiments of the present application, there is provided a task processing method, including: acquiring a task to be processed; task splitting is carried out on the task to be processed to obtain a subtask set; distributing the subtasks in the subtask set to the data cluster so that the data cluster processes the subtasks to obtain subtask processing results corresponding to the subtasks; and obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.

In some embodiments, the number of data clusters is a plurality; assigning subtasks in the set of subtasks to the data cluster, comprising: acquiring task parameters of sub-tasks in a sub-task set; wherein the task parameters are data required for processing the subtasks; matching the plurality of data clusters based on the task parameters of the subtasks to obtain data clusters to be allocated corresponding to the subtasks; and distributing the subtasks to the data clusters to be distributed corresponding to the subtasks.

In some embodiments, matching a plurality of data clusters based on task parameters of the subtasks to obtain data clusters to be allocated corresponding to each subtask includes: acquiring metadata of each data cluster; based on the task parameters of the subtasks and the metadata of each data cluster, detecting whether each data cluster accords with the processing conditions of the subtasks; and taking the data cluster meeting the processing conditions of the subtasks as the data cluster to be allocated corresponding to the subtasks.

In some embodiments, there are a plurality of data clusters to be allocated; assigning the subtasks to the data clusters to be allocated corresponding to the subtasks comprises: respectively obtaining current data processing capability values of the plurality of data clusters to be allocated; sorting the plurality of data clusters to be allocated based on the current data processing capability values to obtain a sorting result; and determining a target data cluster to be allocated based on the sorting result, and assigning the subtasks to the target data cluster to be allocated.

In some embodiments, prior to assigning the subtasks in the set of subtasks to the data cluster, the method further comprises: receiving a registration request sent by a data cluster; based on the registration request, sending an identity certificate corresponding to the data cluster; receiving a communication connection request sent by a data cluster, wherein the communication connection request comprises an identity certificate to be authenticated corresponding to the data cluster; and authenticating the identity certificate to be authenticated, and if the authentication passes, establishing communication connection with the data cluster.
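The registration and authentication exchange above can be sketched as follows. This is a minimal illustration only: the text does not specify the format of the identity certificate, so an HMAC-based token stands in for it here, and the names `Platform`, `register` and `connect` are hypothetical.

```python
import hashlib
import hmac
import secrets


class Platform:
    """Toy control platform that issues and verifies cluster credentials."""

    def __init__(self):
        # Assumption: the platform keeps one secret key and derives each
        # credential from it; a real deployment might use X.509 certificates.
        self._key = secrets.token_bytes(32)
        self.connected = set()

    def _credential_for(self, cluster_id):
        return hmac.new(self._key, cluster_id.encode(), hashlib.sha256).hexdigest()

    def register(self, cluster_id):
        """Handle a registration request by returning a credential."""
        return self._credential_for(cluster_id)

    def connect(self, cluster_id, credential):
        """Authenticate the presented credential; connect only if it matches."""
        expected = self._credential_for(cluster_id)
        if hmac.compare_digest(expected, credential):
            self.connected.add(cluster_id)
            return True
        return False


platform = Platform()
credential = platform.register("cluster-a")
accepted = platform.connect("cluster-a", credential)
rejected = platform.connect("cluster-b", credential)
```

A credential issued to one cluster is rejected when presented under another cluster's identifier, which is the property the authentication step relies on.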

In some embodiments, after assigning the subtasks in the set of subtasks to the data cluster, the method further comprises: acquiring fault information sent by a data cluster, wherein the fault information carries processing information of the subtask whose processing failed in the data cluster; determining, according to the processing information, reason information indicating why the subtask processing fault that produced the fault information was triggered; acquiring, based on the reason information, a standby data cluster corresponding to the failed subtask; and assigning the failed subtask to the standby data cluster, so that the standby data cluster reprocesses the failed subtask.
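One possible shape of this fault-handling step is sketched below. The reason codes, the `error` and `free_memory_gb` fields, and the selection rule are all assumptions made for illustration; the text only states that a reason is derived from the processing information and that a standby cluster is chosen based on it.

```python
def classify_fault(processing_info):
    """Map reported processing information to a coarse reason (hypothetical codes)."""
    error = processing_info.get("error")
    if error == "out_of_memory":
        return "insufficient_resources"
    if error == "dataset_not_found":
        return "data_missing"
    return "unknown"


def pick_standby(reason, failed_cluster, required_data, clusters):
    """Return the first cluster able to take over the failed subtask, or None."""
    for name in sorted(clusters):
        info = clusters[name]
        if name == failed_cluster:
            continue  # do not hand the subtask back to the cluster that failed
        if required_data not in info["data"]:
            continue  # the standby cluster must hold the data the subtask needs
        if (reason == "insufficient_resources"
                and info["free_memory_gb"] <= clusters[failed_cluster]["free_memory_gb"]):
            continue  # assumption: a resource fault needs a roomier cluster
        return name
    return None


clusters = {
    "A": {"data": {"logs"}, "free_memory_gb": 4},
    "B": {"data": {"logs"}, "free_memory_gb": 2},
    "C": {"data": {"logs"}, "free_memory_gb": 16},
    "D": {"data": {"metrics"}, "free_memory_gb": 64},
}
reason = classify_fault({"subtask": "s1", "error": "out_of_memory"})
standby = pick_standby(reason, "A", "logs", clusters)
```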

In some embodiments, the number of data clusters is a plurality; obtaining the target task processing result corresponding to the task to be processed based on the subtask processing results fed back by the data clusters comprises: acquiring the subtask corresponding to each subtask processing result fed back by the data clusters, and changing the task state of that subtask to a completed state; and if the task states of all the subtasks in the subtask set are detected to be the completed state, aggregating the subtask processing results fed back by the data clusters to obtain the target task processing result.

According to an aspect of the embodiments of the present application, there is provided a task processing device including: the task acquisition module is configured to acquire a task to be processed; the task splitting module is configured to split the task to be processed to obtain a subtask set; the task allocation module is configured to allocate the subtasks in the subtask set to the data cluster so that the data cluster processes the subtasks to obtain subtask processing results corresponding to the subtasks; the result acquisition module is configured to acquire a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.

According to an aspect of an embodiment of the present application, there is provided a task processing system, including a task processing control platform and a plurality of data clusters, where the task processing control platform is communicatively connected to the data clusters through an intermediate proxy node; wherein: the task processing control platform is used for acquiring a task to be processed, splitting the task to be processed to obtain a subtask set, and respectively distributing the subtasks in the subtask set to the data cluster; the data cluster is used for processing the subtasks to obtain subtask processing results corresponding to the subtasks, and the subtask processing results are sent to the task processing control platform; the task processing control platform is also used for obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.

According to an aspect of an embodiment of the present application, there is provided an electronic device including: a memory storing computer readable instructions; and a processor that reads the computer readable instructions stored in the memory to perform the steps of the task processing method described above.

According to an aspect of the embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the steps of the task processing method described above.

In the technical solution provided by the embodiments of the present application, a task to be processed is acquired; the task to be processed is split to obtain a subtask set; and the subtasks in the subtask set are distributed to the data clusters, so that each data cluster processes its subtasks to obtain the corresponding subtask processing results, and the data of the data clusters therefore does not need to be centralized. A target task processing result corresponding to the task to be processed is then obtained based on the subtask processing results fed back by the data clusters. Because the data clusters only need to upload subtask processing results, which are then summarized into the target task processing result, the cost of data transmission for the data clusters is reduced and task completion efficiency is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:

FIG. 1 is a schematic illustration of an implementation environment to which the present application relates;

FIG. 2 is a flow chart of a task processing method shown in an exemplary embodiment of the present application;

FIG. 3 is a schematic diagram of acquiring a task to be processed in an exemplary embodiment of the present application;

FIG. 4 is a flow chart of a task processing method shown in another exemplary embodiment of the present application;

FIG. 5 is a flow chart of a task processing method shown in another exemplary embodiment of the present application;

FIG. 6 is a flow chart of a task processing method shown in another exemplary embodiment of the present application;

FIG. 7 is a schematic diagram of a task processing system shown in an exemplary embodiment of the present application;

FIG. 8 is a block diagram of a task processing device shown in an exemplary embodiment of the present application;

FIG. 9 is a schematic diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.

Detailed Description

Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.

The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow diagrams depicted in the figures are exemplary only; they need not include all of the elements and operations/steps, nor must the operations/steps be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual order of execution may change according to the actual situation.

Reference to "a plurality" in this application means two or more. "And/or" describes an association relationship between associated objects and covers three cases; for example, "A and/or B" may represent: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship.

In recent years, big data processing has become well known, and a new application scenario has emerged, namely cross-region big data processing. Traditional big data processing targets data in a single cluster, whereas cross-region big data processing handles big data distributed across clusters in different regions. With the development of the Internet and globalization, many companies or organizations build server clusters in different regions to process the corresponding services, according to their own service types or the regions they serve. The business operations in these clusters generate a large amount of data every day, which may come from different sources such as terminal devices and sensors, and analyzing this data is an operation that needs to be performed every day. For example, a company that needs to detect network attacks by analyzing the network traffic in each cluster must analyze the log data of all clusters together to obtain more accurate results.

Existing big data processing systems basically adopt a centralized processing mode, that is, when cross-region data is processed, the data of each cluster must first be centralized into one cluster. Such an operation necessarily results in a large amount of data transfer between clusters, with the consequent time and cost. Even if time and cost are set aside, many countries or regions now prohibit such cross-region data transfer in order to protect data privacy.

Based on this, in order to ensure efficiency of processing tasks involving data clusters of a plurality of regions, embodiments of the present application provide a task processing method, apparatus, system, computer-readable storage medium, electronic device, and computer program product.

The task processing method provided in the embodiment of the present application is described below, where fig. 1 is a schematic diagram of an implementation environment related to the task processing method in the present application. As shown in fig. 1, the implementation environment includes a terminal 110 and a server 120, where the terminal 110 and the server 120 may be directly or indirectly connected through a wired or wireless communication manner, which is not limited herein.

The terminal 110 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc. Terminal 110 may refer broadly to one of a plurality of terminals, with the present embodiment being illustrated only by terminal 110. Those skilled in the art will recognize that the number of terminals may be greater or lesser. For example, the number of the terminals may be only one, or the number of the terminals may be tens or hundreds, or more, where the implementation environment of the task processing method further includes other terminals. The number of terminals and the device type are not limited in the embodiment of the present application.

The server 120 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDNs), and big data and artificial intelligence platforms. The server 120 is configured to provide background services for an application running on the terminal 110.

Optionally, the wireless network or wired network described above uses standard communication techniques and/or protocols. The network is typically the Internet, but may be any network, including, but not limited to, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a mobile, wired or wireless network, a private network, a virtual private network, or any combination thereof. In some embodiments, data exchanged over the network is represented using techniques and/or formats including HyperText Markup Language (HTML), Extensible Markup Language (XML), and the like. All or some of the links may also be encrypted using conventional encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), virtual private network (VPN), Internet Protocol Security (IPsec), and the like. In other embodiments, custom and/or dedicated data communication techniques may also be used in place of or in addition to the data communication techniques described above.

Optionally, the server 120 performs primary task processing, and the terminal 110 performs secondary task processing; alternatively, the server 120 performs the secondary task processing, and the terminal 110 performs the primary task processing; alternatively, the server 120 or the terminal 110 may each independently undertake task processing.

Illustratively, the terminal 110 sends a task processing instruction to the server 120, and the server 120 receives the task processing instruction sent by the terminal 110 and obtains a task to be processed; the server 120 splits the task to be processed to obtain a subtask set; distributing the subtasks in the subtask set to the data cluster so that the data cluster processes the subtasks to obtain subtask processing results corresponding to the subtasks; and obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.

Referring to FIG. 2, FIG. 2 is a flowchart illustrating a task processing method according to an exemplary embodiment of the present application. The task processing method can be applied to the implementation environment shown in FIG. 1 and is performed by the server 120 in that implementation environment. It should be understood that the method may also be applied to other exemplary implementation environments and performed by devices in those environments; this embodiment does not limit the implementation environment to which the method applies.

The task processing method provided in the embodiment of the present application is described in detail below with the server side as a specific processing main body.

As shown in fig. 2, in an exemplary embodiment, the task processing method at least includes steps S210 to S240, which are described in detail below:

Step S210, acquiring a task to be processed.

The task to be processed refers to task data to be processed currently.

It is understood that there may be multiple tasks on the server, and one task may be randomly selected from the multiple tasks as the task to be processed. Optionally, the task with the earliest submission time is selected from the multiple tasks as the task to be processed, according to the order in which the tasks were submitted; optionally, the task with the highest priority is selected from the multiple tasks as the task to be processed, according to the priority order of the tasks. This is not limited in this application.
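The two ordered selection policies above can be illustrated with a short sketch. The `Task` fields and the convention that a larger number means a higher priority are assumptions for the example; the text does not fix either.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    submitted_at: float  # submission time, e.g. seconds since some epoch
    priority: int        # assumption: larger value means higher priority


def pick_earliest(tasks):
    """Select the task with the earliest submission time."""
    return min(tasks, key=lambda t: t.submitted_at)


def pick_highest_priority(tasks):
    """Select the highest-priority task; ties go to the earlier submission."""
    return min(tasks, key=lambda t: (-t.priority, t.submitted_at))


tasks = [
    Task("t1", submitted_at=30.0, priority=1),
    Task("t2", submitted_at=10.0, priority=2),
    Task("t3", submitted_at=20.0, priority=5),
]
```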

For example, referring to FIG. 3, FIG. 3 is a schematic diagram of acquiring a task to be processed according to an exemplary embodiment of the present application. As shown in FIG. 3, the server side is provided with a user interaction interface, which obtains task data input by a user by detecting the user's task configuration operations. For example, a file uploading component is provided on the user interaction interface; the user uploads a file containing the task data by triggering the file uploading component, and the task data is obtained from the uploaded file, such as a program code file containing the task content or a configuration file containing the task processing flow. As another example, a task data configuration component may be provided on the user interaction interface, and the task data is obtained by detecting interaction between the user and the task data configuration component. The task data configuration component includes a task type configuration component, a task priority configuration component, and the like; the user selects the task type through the task type configuration component and sets the task priority through the task priority configuration component. It can be understood that the manner of acquiring the task data is not limited in this application; for example, the task data may also be stored in a database and obtained by querying the database. Further, the obtained task data is placed in a task queue, and the server selects the task data to be processed currently from the N pieces of task data contained in the task queue, so as to obtain the task to be processed.

The server side may actively acquire task data when detecting that a preset event is triggered, for example, when detecting that the current time reaches the preset time, the server side may acquire task data from the database; the server may also obtain the task data corresponding to the task processing request when receiving the task processing request, which is not limited in this application.

Step S220, splitting the task to be processed to obtain a subtask set.

The task splitting operation is used for refining a complete task to be processed into a plurality of subtasks, and the subtasks form a subtask set.

For example, the task to be processed may be split according to the resource condition of the data clusters, such as the current data structure information of each data cluster. Suppose the task to be processed is to count the number of first-type data and second-type data. If the data structure information of data cluster A shows that the first-type data is in data cluster A, and the data structure information of data cluster B shows that the second-type data is in data cluster B, then the task to be processed needs to be subdivided into a first subtask and a second subtask, where the first subtask is to count the number of first-type data and the second subtask is to count the number of second-type data.
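A minimal sketch of this kind of split, using the counting example above, is shown below. The dictionary layout of the data structure information and the function name `split_by_data_location` are assumptions for illustration.

```python
def split_by_data_location(required_types, cluster_schemas):
    """Build one counting subtask per required data type.

    cluster_schemas maps a cluster name to the set of data types it holds
    (a stand-in for the data structure information in the text).
    """
    subtasks = []
    for data_type in required_types:
        holders = sorted(
            name for name, types in cluster_schemas.items() if data_type in types
        )
        if not holders:
            raise ValueError(f"no data cluster holds {data_type!r}")
        subtasks.append(
            {"op": "count", "data_type": data_type, "candidates": holders}
        )
    return subtasks


schemas = {"A": {"type1"}, "B": {"type2"}}
subtasks = split_by_data_location(["type1", "type2"], schemas)
```

Here the first subtask can only run on cluster A and the second only on cluster B, mirroring the first and second subtasks in the example.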

For example, the task to be processed may be split according to workflow information of the task to be processed, where the workflow information refers to processing rules between steps in the task to be processed. The workflow information may be obtained according to a workflow input operation of a user, or may be obtained by analyzing task data by a server. Each step may be subdivided into one subtask, or a plurality of steps may be subdivided into one subtask according to the dependency relationship between the respective steps.
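As an illustration of grouping steps by their dependencies, the sketch below puts steps that are linked through dependencies into one subtask (in an executable order) and leaves unrelated steps as separate subtasks. This grouping rule is only one possible reading of the text, and `split_by_workflow` is a hypothetical name; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def split_by_workflow(deps):
    """deps maps each step to the set of steps it depends on."""
    order = list(TopologicalSorter(deps).static_order())

    # Union-find over steps: steps joined by a dependency share one root.
    parent = {step: step for step in order}

    def find(step):
        while parent[step] != step:
            parent[step] = parent[parent[step]]
            step = parent[step]
        return step

    for step, prerequisites in deps.items():
        for prerequisite in prerequisites:
            parent[find(step)] = find(prerequisite)

    # Collect steps per group, keeping the topological order inside a group.
    groups = {}
    for step in order:
        groups.setdefault(find(step), []).append(step)
    return list(groups.values())


deps = {"clean": {"load"}, "count": {"clean"}, "report": set()}
groups = split_by_workflow(deps)
```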

For example, the task to be processed may be split by combining the resource condition of the data cluster and the workflow information of the task to be processed, which is not limited in this application.

Step S230, distributing the subtasks in the subtask set to the data cluster so that the data cluster processes the subtasks to obtain subtask processing results corresponding to the subtasks.

The data clusters store the data required for processing the task to be processed. This data may be stored across a plurality of data clusters, so that the task can be processed only by combining the required data held in those clusters. Therefore, to avoid having the data clusters upload the required data to one place, the task to be processed is refined into a plurality of subtasks, and each subtask is issued to its corresponding data cluster for local computation.

It will be appreciated that multiple subtasks in a subtask set may be allocated to the same data cluster, and then processing rules configured corresponding to the multiple subtasks need to be sent to the data cluster at the same time, that is, processing logic information between each subtask is sent, so that the data cluster processes the multiple subtasks according to the processing rules between the multiple subtasks.

For example, the data required for processing a subtask may be obtained, and whether each data cluster meets the processing condition of the subtask is determined according to that data. If a data cluster holds the data required for processing the subtask, the data cluster meets the processing condition of the subtask, and the subtask is allocated to that data cluster.

For example, there may be multiple data clusters that meet the subtask processing conditions of the same subtask, and the multiple data clusters may be ordered to select a data cluster that is ultimately matched by the subtask according to the ordering result, so as to send the subtask to the data cluster that is ultimately matched by the subtask.
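The ordering step can be sketched as follows. The text does not say how the ordering value is computed or which position in the ordering is chosen; this example assumes a numeric capability value where larger means more spare capacity, and picks the top-ranked cluster. `choose_target_cluster` is a hypothetical name.

```python
def choose_target_cluster(capability_by_cluster):
    """Rank candidate clusters, highest capability first, and pick the first.

    Ties are broken by cluster name so that the result is deterministic.
    """
    ranking = sorted(
        capability_by_cluster,
        key=lambda name: (-capability_by_cluster[name], name),
    )
    return ranking[0], ranking


target, ranking = choose_target_cluster({"A": 0.4, "B": 0.9, "C": 0.9})
```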

Based on the above embodiment, the number of the tasks to be processed may be multiple, and the server may process the multiple tasks to be processed in parallel, that is, obtain subtask sets of multiple tasks to be processed, where the subtasks in each subtask set belong to the same task to be processed, and may add task identifiers to each subtask set according to different tasks to be processed to which the subtask set belongs, so as to further distinguish the subtask sets of different tasks to be processed. And then, distributing each subtask according to the priorities of the subtasks in the plurality of subtask sets at the same time, so that the server side can process a plurality of tasks to be processed at the same time. The subtasks with higher priority in each task to be processed can be processed in sequence according to the priority sequence of the subtasks in each task to be processed, so that the overall processing time for processing the subtasks of the plurality of tasks to be processed is shorter than the processing time for processing each task to be processed in sequence, and the overall task processing efficiency is improved.
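A simple way to picture priority-based distribution across several subtask sets is a single priority queue keyed by subtask priority, with each entry tagged by its task identifier. The sketch below assumes larger numbers mean higher priority; `dispatch_order` is a hypothetical helper.

```python
import heapq
from itertools import count


def dispatch_order(subtask_sets):
    """Yield (task_id, subtask_name) pairs, highest priority first.

    subtask_sets maps a task identifier to a list of (priority, name) pairs.
    """
    heap = []
    sequence = count()
    for task_id, subtasks in subtask_sets.items():
        for priority, name in subtasks:
            # heapq is a min-heap, so the priority is negated; the sequence
            # number keeps insertion order among equal priorities.
            heapq.heappush(heap, (-priority, next(sequence), task_id, name))
    while heap:
        _, _, task_id, name = heapq.heappop(heap)
        yield task_id, name


subtask_sets = {"T1": [(1, "a"), (3, "b")], "T2": [(2, "c"), (3, "d")]}
order = list(dispatch_order(subtask_sets))
```

Subtasks from different tasks interleave in the resulting order, while the task identifier still records which task each subtask belongs to.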

Further, the data cluster processes the received subtasks to obtain subtask processing results. Optionally, the data cluster may refine the subtasks again, so as to respectively allocate the refined subtasks to each computing device in the data cluster for processing, and then summarize the processing results processed by each computing device, so as to obtain the subtask processing results.
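The optional refinement inside a data cluster can be sketched as follows, with a thread pool standing in for the cluster's computing devices and a counting subtask as the workload. Both choices, and the function names, are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def count_on_device(shard, data_type):
    """Work done by one computing device on its local shard of records."""
    return sum(1 for record in shard if record["type"] == data_type)


def run_subtask_in_cluster(shards, data_type):
    """Refine a counting subtask across devices, then summarize the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partial_counts = list(
            pool.map(count_on_device, shards, [data_type] * len(shards))
        )
    # Only this summarized value is the subtask processing result.
    return sum(partial_counts)


shards = [[{"type": "x"}, {"type": "y"}], [{"type": "x"}], []]
result = run_subtask_in_cluster(shards, "x")
```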

Step S240, obtaining a target task processing result corresponding to the task to be processed based on the subtask processing results fed back by the data cluster.

The target task processing result is a task processing result corresponding to the task to be processed, and is obtained according to the subtask processing result corresponding to each subtask of the task to be processed.

When it is detected that subtask processing results have been obtained for all subtasks corresponding to the task to be processed, the subtask processing results are aggregated, the target task processing result is computed from the aggregated results, and the target task processing result is stored.

In the related art, when cross-region data is processed, the data of each cluster needs to be concentrated into one cluster, so that a large amount of data is transmitted among the clusters, which increases the time and cost of data transmission. In this embodiment, by contrast, the target task processing result corresponding to the task to be processed is obtained based on the subtask processing results fed back by the data clusters, so that the data clusters only need to upload subtask processing results, which are then summarized into the target task processing result, thereby reducing the cost of data transmission for the data clusters and improving task completion efficiency.
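The completion check and aggregation can be sketched as a small tracker that marks each subtask as completed when its result arrives and aggregates only once every subtask is complete. The state names and the use of a sum as the aggregation are assumptions; the text does not specify the calculation.

```python
class TaskTracker:
    """Tracks subtask states for one task to be processed."""

    def __init__(self, subtask_ids):
        self.state = {subtask_id: "pending" for subtask_id in subtask_ids}
        self.results = {}

    def on_result(self, subtask_id, result):
        """Record a fed-back result; return the target result once all are in."""
        self.results[subtask_id] = result
        self.state[subtask_id] = "completed"
        if all(state == "completed" for state in self.state.values()):
            return self.aggregate()
        return None

    def aggregate(self):
        # Assumption: subtask results are counts, so aggregation is a sum.
        return sum(self.results.values())


tracker = TaskTracker(["s1", "s2"])
first = tracker.on_result("s1", 10)
second = tracker.on_result("s2", 5)
```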

Referring to fig. 4, in an exemplary embodiment, the number of data clusters is plural, and the task processing method includes steps S410 to S460, which are described in detail below:

Step S410, obtaining a task to be processed.

The specific embodiment of step S410 is identical to step S210 in fig. 2, and will not be described herein.

Step S420, splitting the task to be processed to obtain a subtask set.

The specific embodiment of step S420 is identical to step S220 in fig. 2, and will not be described herein.

Step S430, task parameters of sub-tasks in the sub-task set are obtained; wherein the task parameters are data required for processing the subtasks.

The task parameter refers to data needed when processing the subtask, for example, if the subtask is counting the number of the first type of data, the task parameter of the subtask is the first type of data.

It may be understood that the task parameters of the subtasks include direct data and indirect data, where the direct data refers to data that can be directly acquired by the server, and the indirect data refers to data that cannot be directly acquired by the server. For example, the data that the server may directly obtain may be data in a local database of the server, or may be data input by a user, or may be data obtained by calculating data in the local database of the server or data input by the user, or the like, that is, data that may be obtained without performing a data obtaining operation on other data clusters; the data that the server side cannot directly acquire may be data in other data clusters, or may be data obtained by calculating the data in other data clusters, that is, data that can be obtained only by executing data acquisition operation on other data clusters.

For example, a task parameter table may be set for each subtask, the task parameters of each subtask are added to the corresponding task parameter table, and each task parameter in the table is marked, the mark indicating whether the task parameter is direct data or indirect data.
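As an illustrative, non-limiting sketch, a task parameter table with direct/indirect marks could be represented as follows. The names `ParamKind`, `TaskParam`, `TaskParamTable`, and the example parameter names are hypothetical and are not defined in this application.

```python
from dataclasses import dataclass, field
from enum import Enum


class ParamKind(Enum):
    # Mark attached to every task parameter (names are assumptions).
    DIRECT = "direct"      # obtainable by the server without querying other data clusters
    INDIRECT = "indirect"  # obtainable only by a data acquisition operation on a data cluster


@dataclass
class TaskParam:
    name: str
    kind: ParamKind


@dataclass
class TaskParamTable:
    """One table per subtask, holding its marked task parameters."""
    subtask_id: str
    params: list[TaskParam] = field(default_factory=list)

    def add(self, name: str, kind: ParamKind) -> None:
        self.params.append(TaskParam(name, kind))

    def indirect_names(self) -> set[str]:
        # The indirect parameters are the ones later matched against cluster metadata.
        return {p.name for p in self.params if p.kind is ParamKind.INDIRECT}


# Hypothetical subtask: count the number of items of the first type of data.
table = TaskParamTable("count_first_type")
table.add("user_supplied_date_range", ParamKind.DIRECT)
table.add("first_type_data", ParamKind.INDIRECT)
```

In this sketch only the parameters marked as indirect data take part in the later matching of data clusters, since direct data is already available to the server.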

Step S440, matching the plurality of data clusters based on the task parameters of the subtasks to obtain the data clusters to be allocated corresponding to each subtask.

The purpose of matching the plurality of data clusters based on the task parameters of a subtask is to detect whether each data cluster meets the processing conditions of the subtask. If a data cluster meets the processing conditions of the subtask, the data cluster is used as a data cluster to be allocated for the subtask; if it does not, the data cluster is skipped and the next data cluster is checked.

Optionally, in the above exemplary embodiment, step S440 may be implemented as follows: acquiring metadata of each data cluster; detecting, based on the task parameters of the subtask and the metadata of each data cluster, whether each data cluster meets the processing conditions of the subtask; and taking each data cluster that meets the processing conditions of the subtask as a data cluster to be allocated corresponding to the subtask.

Metadata is data that describes data (data about data). It mainly describes data attributes (properties) and supports functions such as indicating storage location, historical data, resource searching, and file recording.

For example, the server side stores metadata information of each data cluster in communication connection with the server side, and the data clusters in communication connection with the server side periodically report current metadata to the server side, so that the server side updates the metadata information of each stored data cluster according to the metadata reported by each data cluster. Then, the server side can obtain metadata of each data cluster according to the metadata information of each data cluster stored locally.

As another example, when the subtasks are distributed, the server may send a metadata acquisition request to each data cluster communicatively connected to it, so that each data cluster reports its own current metadata to the server according to the metadata acquisition request, and the server obtains the metadata fed back by each data cluster.

Further, indirect data corresponding to the subtask is obtained according to the task parameters of the subtask, metadata of each data cluster are matched according to data information of the indirect data, if all the indirect data of the subtask are stored in the data clusters, the data clusters are indicated to accord with processing conditions of the subtask, and the data clusters which accord with the processing conditions of the subtask are used as data clusters to be distributed corresponding to the subtask.
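The matching described above can be sketched as a subset test, under the assumption that the metadata of each data cluster can be reduced to the set of dataset names stored in that cluster. The function name `match_clusters` and the cluster and dataset names are hypothetical.

```python
def match_clusters(indirect_data: set[str],
                   cluster_metadata: dict[str, set[str]]) -> list[str]:
    """Return the ids of the data clusters that store ALL indirect data of a subtask.

    cluster_metadata maps a cluster id to the dataset names recorded in the
    metadata of that cluster (a simplifying assumption for this sketch).
    """
    return [
        cluster_id
        for cluster_id, datasets in cluster_metadata.items()
        if indirect_data <= datasets  # subset test: every required dataset is present
    ]


# Hypothetical metadata held by the server for three data clusters.
metadata = {
    "cluster_east": {"orders", "users"},
    "cluster_west": {"orders"},
    "cluster_north": {"users", "orders", "logs"},
}
candidates = match_clusters({"orders", "users"}, metadata)
```

A data cluster that is missing any piece of indirect data is skipped, which corresponds to moving on to the next data cluster.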

Step S450, allocating the subtasks to the data clusters to be allocated corresponding to the subtasks, so that the data clusters to be allocated process the subtasks to obtain subtask processing results corresponding to the subtasks.

For example, the server side generates a task processing request according to the subtask, so as to issue the task processing request to a data cluster to be allocated corresponding to the subtask, and then the data cluster to be allocated processes the subtask according to the task processing request. The task processing request may include at least one subtask and a processing rule between subtasks, and the task processing request may further include information such as a priority of the subtask, a processing time of the subtask, and the like, which is not limited in this application.

Optionally, in the above exemplary embodiment, there are a plurality of data clusters to be allocated, and step S450 may be implemented as follows: obtaining the current data processing capability value of each of the plurality of data clusters to be allocated; sorting the plurality of data clusters to be allocated based on the current data processing capability values to obtain a sorting result; and determining a target data cluster to be allocated based on the sorting result, and allocating the subtask to the target data cluster to be allocated.

The current data processing capability value refers to the capability of the data cluster to be allocated to process the task, and for the same subtask, the higher the current data processing capability value of the data cluster to be allocated is, the shorter the time required for the data cluster to be allocated to complete the subtask is, otherwise, the lower the current data processing capability value of the data cluster to be allocated is, the longer the time required for the data cluster to be allocated to complete the subtask is.

For example, the current data processing capability value of each data cluster to be allocated may be calculated from information such as the resource usage of the data cluster, its data transmission time, and its data transmission cost. The resource usage of a data cluster is its number of idle resources: for the same subtask, the fewer idle resources the data cluster has, the longer it needs to complete the subtask, and conversely, the more idle resources it has, the shorter the time it needs. The data transmission time of a data cluster is the time required for one data transmission between the data cluster and the server. The data transmission cost of a data cluster is the cost required for one data transmission between the data cluster and the server.

In the embodiment of the present application, a weighted calculation may be performed on information such as the resource usage, the data transmission time, and the data transmission cost of each data cluster, so as to obtain the current data processing capability value of each data cluster to be allocated. The weights used in the weighted calculation may be preset, or may be determined dynamically according to the subtask, for example according to information such as the specified completion time and the priority of the subtask.

After the current data processing capability value of each data cluster to be allocated is obtained, the plurality of data clusters to be allocated are sorted based on the current data processing capability values to obtain a sorting result, a target data cluster to be allocated is determined based on the sorting result, and the subtask is allocated to the target data cluster to be allocated. For example, the data cluster to be allocated with the highest current data processing capability value in the sorting result may be used as the target data cluster to be allocated.
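The weighted calculation and sorting can be sketched as follows. This is only one possible reading: the application does not fix a formula, so the normalisation of every input to the range 0..1, the treatment of transmission time and cost as penalties, and the default weights are all assumptions, as are the names `ClusterStatus`, `capability_value`, and `rank_clusters`.

```python
from dataclasses import dataclass


@dataclass
class ClusterStatus:
    cluster_id: str
    idle_resources: float  # share of idle resources, normalised to 0..1 (assumption)
    transfer_time: float   # time of one transfer to the server, normalised to 0..1
    transfer_cost: float   # cost of one transfer to the server, normalised to 0..1


def capability_value(s: ClusterStatus,
                     w_idle: float = 0.5,
                     w_time: float = 0.3,
                     w_cost: float = 0.2) -> float:
    # More idle resources raise the value; longer or costlier transfers lower it.
    return (w_idle * s.idle_resources
            + w_time * (1.0 - s.transfer_time)
            + w_cost * (1.0 - s.transfer_cost))


def rank_clusters(candidates: list[ClusterStatus]) -> list[ClusterStatus]:
    # Sorting result: highest current data processing capability value first.
    return sorted(candidates, key=capability_value, reverse=True)


statuses = [
    ClusterStatus("c", idle_resources=0.9, transfer_time=0.9, transfer_cost=0.9),
    ClusterStatus("a", idle_resources=0.8, transfer_time=0.2, transfer_cost=0.5),
    ClusterStatus("b", idle_resources=0.3, transfer_time=0.1, transfer_cost=0.1),
]
ranked = rank_clusters(statuses)
target = ranked[0]  # target data cluster to be allocated
```

With these hypothetical numbers, cluster "c" has the most idle resources but is ranked last because its transfers are slow and expensive, which illustrates why the three factors are weighed together.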

For example, a subtask matching value may be calculated from the priority of the subtask, the calculation amount of the subtask, and the current data processing capability value of each data cluster to be allocated, and the data cluster to be allocated with the highest subtask matching value for the subtask is taken as the target data cluster to be allocated. The priority of a subtask refers to its urgency: the higher the priority, the more urgent the subtask, and the subtask needs to be allocated to a data cluster with more idle resources; conversely, the lower the priority, the less urgent the subtask, and the subtask may be allocated to a data cluster with fewer idle resources. The calculation amount of a subtask refers to the computing resources required to complete it: for the same data cluster, the larger the calculation amount of the subtask, the longer the data cluster needs to complete it, and the smaller the calculation amount, the shorter the time needed.

In the embodiment of the present application, the priority of the subtask, the calculation amount of the subtask, and the current data processing capability value of each data cluster to be allocated may be weighted, so as to obtain the subtask matching value of each data cluster to be allocated. The weights used in the weighted calculation may be preset, or may be determined dynamically according to the subtask, for example according to information such as the specified completion time and the priority of the subtask.
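A possible form of the subtask matching value is sketched below. The application only states that the three quantities are weighted. Because a plain weighted sum would add the same priority and calculation amount terms to every candidate data cluster, this sketch assumes that the priority scales a reward for idle resources and that the calculation amount scales a penalty for busy clusters. The formula, the weights, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    cluster_id: str
    capability: float  # current data processing capability value, 0..1 (assumption)
    idle: float        # share of idle resources, 0..1 (assumption)


def matching_value(priority: float, compute_amount: float, c: Candidate,
                   w_cap: float = 0.5, w_prio: float = 0.3, w_amt: float = 0.2) -> float:
    """priority and compute_amount are assumed to be normalised to 0..1."""
    return (w_cap * c.capability
            + w_prio * priority * c.idle                  # urgent subtasks favour idle clusters
            - w_amt * compute_amount * (1.0 - c.idle))    # heavy subtasks avoid busy clusters


def best_cluster(priority: float, compute_amount: float,
                 candidates: list[Candidate]) -> str:
    return max(candidates,
               key=lambda c: matching_value(priority, compute_amount, c)).cluster_id


candidates = [
    Candidate("x", capability=0.7, idle=0.3),
    Candidate("y", capability=0.6, idle=0.9),
]
urgent_choice = best_cluster(priority=1.0, compute_amount=0.5, candidates=candidates)
relaxed_choice = best_cluster(priority=0.0, compute_amount=0.0, candidates=candidates)
```

Under these assumptions an urgent subtask is steered to the cluster with more idle resources, while a low-priority, light subtask simply follows the capability value.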

By comprehensively considering the information of each subtask and the information of each data cluster, each subtask is ensured to be distributed to a proper data cluster, so that the completion efficiency of each task to be processed is ensured.

Step S460, obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster to be distributed.

The specific embodiment of step S460 is identical to step S240 in fig. 2, and will not be described herein.

Referring to fig. 5, in an exemplary embodiment, there are a plurality of data clusters, and the task processing method includes steps S510 to S580, which are described in detail below:

Step S510, receiving a registration request sent by the data cluster, and sending an identity certificate corresponding to the data cluster based on the registration request.

The registration request can carry central processing unit (Central Processing Unit, CPU) information, memory information and the like of the data cluster, and the data cluster sends resource information of the data cluster to the server through the registration request; the identity certificate is used to uniquely identify the identity information of the data cluster.

The data cluster may send a registration request to the server when it communicates with the server for the first time, or may send registration requests to the server periodically, which is not limited in this application.

In this embodiment of the present application, after receiving a registration request of a data cluster, a server generates a unique identity certificate for the data cluster, and sends the identity certificate to the data cluster.

The identity certificate may be encrypted before being sent to the data cluster. The encryption method includes, but is not limited to, symmetric encryption and asymmetric encryption, which is not limited in this application. It can be understood that the data cluster must use the decryption method corresponding to the encryption method to recover the identity certificate, which improves the security of information transmission.

Step S520, a communication connection request sent by the data cluster is received, where the communication connection request includes an identity certificate to be authenticated corresponding to the data cluster, and the identity certificate to be authenticated is authenticated.

The communication connection request sent by the data cluster is used for requesting to establish communication connection to the server side, so that after the communication connection is established, the data cluster and the server side perform data transmission. Therefore, the communication connection request includes the identity certificate to be authenticated, so that the server side authenticates the identity certificate to be authenticated.

Step S530, judging whether the authentication is passed, if so, establishing communication connection with the data cluster, and executing step S540; if the authentication is not passed, the process is ended.

The server side authenticates the identity certificate to be authenticated of the data cluster. If the authentication passes, the data cluster can be trusted, and a communication connection is established with the data cluster; if the authentication does not pass, the data cluster is not trusted, and the server side refuses to establish a communication connection with it.

Optionally, the server performs a calculation on the identity certificate to be authenticated according to a preset algorithm and determines whether the calculated result meets a preset rule, thereby determining whether the data cluster corresponding to the identity certificate to be authenticated is trusted, where the preset algorithm is agreed in advance between the server and the trusted data clusters. Alternatively, the database of the server may store the identity certificates of trusted data clusters, and the server queries the database to determine whether the data cluster corresponding to the identity certificate to be authenticated is trusted. This is not limited in this application.
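One way to realise "a preset algorithm whose result must meet a preset rule" is a keyed hash, sketched below with the Python standard library. This is an assumption made for illustration: the application does not name HMAC, and the certificate here is a simple identifier-plus-tag string rather than, for example, an X.509 certificate. `SERVER_SECRET`, `issue_certificate`, and `authenticate` are hypothetical names.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; in practice it would be stored securely.
SERVER_SECRET = secrets.token_bytes(32)


def _tag(cluster_id: str) -> str:
    # Preset algorithm (assumption): HMAC-SHA256 over the cluster identifier.
    return hmac.new(SERVER_SECRET, cluster_id.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_certificate(cluster_id: str) -> str:
    """Step S510 sketch: build an identity certificate for a registering data cluster."""
    return f"{cluster_id}.{_tag(cluster_id)}"


def authenticate(certificate: str) -> bool:
    """Steps S520/S530 sketch: recompute the tag and compare in constant time."""
    cluster_id, sep, tag = certificate.rpartition(".")
    if not sep:
        return False  # malformed certificate, no separator found
    # Preset rule (assumption): the presented tag must equal the recomputed tag.
    return hmac.compare_digest(tag.encode("utf-8"), _tag(cluster_id).encode("utf-8"))


cert = issue_certificate("cluster_east")
```

The database variant described above would instead store each issued certificate and look up the presented one; the keyed-hash variant avoids the lookup at the price of managing the secret.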

Step S540, the server performs metadata synchronization with each data cluster.

For example, the data clusters communicatively connected to the server may periodically report their current metadata to the server, so that the server updates its stored metadata information of each data cluster according to the metadata reported by each data cluster. The server can then obtain the metadata of each data cluster from the locally stored metadata information. Alternatively, when the subtasks are distributed, the server may send a metadata acquisition request to each data cluster communicatively connected to it, so that each data cluster reports its own current metadata to the server according to the metadata acquisition request, and the server obtains the metadata fed back by each data cluster.

Step S550, obtaining a task to be processed.

The specific embodiment of step S550 is identical to step S210 in fig. 2, and will not be described herein.

Step S560, task splitting is carried out on the task to be processed to obtain a subtask set, and the subtasks in the subtask set are distributed to the data clusters, so that the data clusters process the subtasks to obtain the subtask processing results corresponding to the subtasks.

The specific embodiment of step S560 is identical to step S220 and step S230 in fig. 2, and will not be described herein.

Step S570, determining the subtask corresponding to the subtask processing result fed back by the data cluster, and modifying the task state of that subtask to the completed state.

Each subtask in the subtask set of the server corresponds to a task state, and the task states include, but are not limited to, an unassigned state, an unfinished state, and a completed state. The unassigned state indicates that the subtask has not been assigned to a data cluster; the unfinished state indicates that the subtask has been assigned to a data cluster but the data cluster has not yet fed back a subtask processing result; and the completed state indicates that the subtask has been assigned to a data cluster and the data cluster has fed back the subtask processing result.

Step S580, judging whether the task states of all the subtasks in the subtask set are the completed state; if so, aggregating the subtask processing results fed back by each data cluster to obtain the target task processing result; if not, returning to step S560.

In this step, it is judged whether all subtasks of the task to be processed have been processed. If so, the subtask processing results fed back by each data cluster are aggregated to obtain the target task processing result. For example, each task to be processed is provided with a corresponding subtask processing result set, and the subtask processing results fed back by each data cluster are added to that set; if a calculation is needed among the subtask processing results in the set, the corresponding calculation is performed to obtain the target task processing result. Optionally, the calculation may be performed each time a subtask processing result fed back by a data cluster is received, or once after all the subtasks have been processed, which is not limited in this application.

If any subtask is not in the completed state, step S560 continues to be executed until all subtasks are in the completed state and the target task processing result is obtained, so that the target task processing result can be stored or displayed to the user.
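The state bookkeeping of steps S570 and S580 can be sketched as follows. The class `SubtaskTracker`, its method names, and the use of summation as the default aggregation are assumptions; the application leaves the calculation among subtask processing results open.

```python
from enum import Enum
from typing import Callable, Iterable


class TaskState(Enum):
    UNASSIGNED = "unassigned"  # not yet assigned to a data cluster
    UNFINISHED = "unfinished"  # assigned, but no subtask processing result fed back yet
    COMPLETED = "completed"    # assigned, and the result has been fed back


class SubtaskTracker:
    def __init__(self, subtask_ids: Iterable[str]) -> None:
        self.states = {sid: TaskState.UNASSIGNED for sid in subtask_ids}
        self.results: dict[str, float] = {}  # the subtask processing result set

    def mark_assigned(self, subtask_id: str) -> None:
        self.states[subtask_id] = TaskState.UNFINISHED

    def record_result(self, subtask_id: str, result: float) -> None:
        # Step S570: store the fed-back result and mark the subtask as completed.
        self.results[subtask_id] = result
        self.states[subtask_id] = TaskState.COMPLETED

    def all_completed(self) -> bool:
        # Step S580: are the task states of all subtasks the completed state?
        return all(state is TaskState.COMPLETED for state in self.states.values())

    def aggregate(self, combine: Callable[[Iterable[float]], float] = sum) -> float:
        if not self.all_completed():
            raise RuntimeError("some subtasks are not in the completed state")
        return combine(self.results.values())


tracker = SubtaskTracker(["count_east", "count_west"])
tracker.mark_assigned("count_east")
tracker.mark_assigned("count_west")
tracker.record_result("count_east", 3)
done_after_first = tracker.all_completed()
tracker.record_result("count_west", 4)
total = tracker.aggregate()
```

This sketch aggregates once after all subtasks are completed; the incremental variant mentioned above would instead update a running value inside `record_result`.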

Referring to fig. 6, in an exemplary embodiment, the task processing method includes steps S610 to S680, which are described in detail as follows:

Step S610, obtaining a task to be processed.

The specific embodiment of step S610 is identical to step S210 in fig. 2, and will not be described herein.

Step S620, splitting the task to be processed to obtain a subtask set.

The specific embodiment of step S620 is identical to step S220 in fig. 2, and will not be described herein.

Step S630, the subtasks in the subtask set are distributed to the data clusters, so that the data clusters process the subtasks to obtain subtask processing results corresponding to the subtasks.

The specific embodiment of step S630 is identical to step S230 in fig. 2, and will not be described herein.

Step S640, obtaining fault information sent by the data cluster, where the fault information carries processing information of a subtask that has a processing fault in the data cluster.

The fault information may be log information in the processing procedure of the subtask, so as to reflect the specific processing situation of the subtask in the data cluster.

The data cluster may send a heartbeat data packet to the server at preset time intervals, and information such as a completion status of each subtask in the data cluster may be carried in the heartbeat data packet, where the completion status of the subtask includes, but is not limited to, completion progress information of the subtask, failure information of the subtask, and the like, so that the server may monitor a processing condition of each subtask in real time.

Alternatively, the data cluster may send a heartbeat data packet to the intermediate proxy node at preset time intervals, and information such as the completion status of each subtask in the data cluster may be carried in the heartbeat data packet, so that the intermediate proxy node of the data cluster monitors the processing condition of each subtask in the local data cluster in real time. The intermediate proxy node then sends fault information to the server only when it detects a subtask with a processing fault, which reduces signaling interaction between the data cluster and the server and reduces the data transmission cost.
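The filtering performed by the intermediate proxy node can be sketched as follows. The structure of a heartbeat data packet is not specified in this application, so the `SubtaskStatus` fields and the function `faults_to_forward` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubtaskStatus:
    """Completion status of one subtask as carried in a heartbeat (assumed layout)."""
    subtask_id: str
    progress: float                  # completion progress, 0..1
    fault_log: Optional[str] = None  # log excerpt if a processing fault occurred


def faults_to_forward(heartbeat: list[SubtaskStatus]) -> list[dict]:
    """Keep only faulty subtasks; healthy progress reports stay at the proxy node."""
    return [
        {"subtask_id": status.subtask_id, "fault_info": status.fault_log}
        for status in heartbeat
        if status.fault_log is not None
    ]


heartbeat = [
    SubtaskStatus("s1", progress=0.4),
    SubtaskStatus("s2", progress=0.1, fault_log="sh: hadoop: command not found"),
    SubtaskStatus("s3", progress=1.0),
]
forwarded = faults_to_forward(heartbeat)
```

Only one of the three status records crosses the link to the server in this example, which is the reduction in signaling interaction that the intermediate proxy node is meant to provide.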

Step S650, determining, according to the fault information, the reason information that caused the subtask processing fault and thereby triggered generation of the fault information.

The reason information refers to the condition that caused the fault during the subtask processing.

For example, keyword recognition can be performed on the fault information to obtain the keyword information corresponding to the fault information, the fault information is then classified according to the keyword information, and the reason information that caused the subtask processing fault is determined from the classification.

Optionally, the reason information includes, but is not limited to, the following:

(1) The task processing times out. In the execution process of each subtask, the processing time of each subtask is limited, so that a subtask cannot run indefinitely because of an abnormality while the user is unable to learn of the abnormal state in time: a timeout may be configured for each task, and if the subtask cannot be completed within the specified time, the subtask is considered to have failed. A timeout may occur because unstable software on a computing device in the data cluster prevents a subtask from ever finishing, or because insufficient computing resources on the computing devices in the data cluster make a subtask take longer than usual to complete. When the task processing times out, the fault information contains obvious feature words, such as built time out, marking the build as failed and the like;

(2) The operating environment is abnormal. Whether a subtask can be processed normally generally depends on the environment in which it runs. For example, a MapReduce job depends on the Hadoop execution environment, so Hadoop must be installed on the execution node of the job. Usually, bringing a subtask online requires configuring each running node designated for the subtask, including installing programs, adding environment variables, adding local dependency libraries, applying to the DBA for the running node's access rights to certain databases, and the like. However, the environment of an execution node may become abnormal for various reasons, for example because operation and maintenance personnel add a running node but omit some configuration, or because a program bug deletes some dependency files by mistake, and the job then fails. If a local dependency library or an environment variable is missing, the job cannot find the specified file or program path and an error occurs; the fault information then contains obvious feature words, such as problem accessing file, command not found, etc. If the cluster access right or the database access right is missing, the fault information contains obvious feature words, such as non-authorized, access authorized, and the like.

(3) Program bug. A newly launched subtask may contain a bug because of insufficient testing, and may fail during processing. A subtask that is already running online may be modified because requirements change, and a bug may be introduced so that a subtask that originally ran normally also fails. In this case, the subtask is typically aborted because the program throws a runtime exception, and the fault information typically contains the printed exception stack trace.
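The keyword-based fault classification can be sketched as follows. The keyword lists for timeouts and environment errors reuse the feature words quoted above. The keywords for program bugs ("traceback", "exception") are an assumption, since the text only says that a stack trace is printed. Real logs will differ, and `FAULT_KEYWORDS` and `classify_fault` are hypothetical names.

```python
# Maps each category of reason information to feature words searched for in the log.
# Categories are checked in this order; the first match wins.
FAULT_KEYWORDS: dict[str, tuple[str, ...]] = {
    "timeout": ("time out", "marking the build as failed"),
    "environment": ("problem accessing file", "command not found",
                    "non-authorized", "access authorized"),
    "program_bug": ("traceback", "exception"),  # assumed markers of a stack trace
}


def classify_fault(fault_info: str) -> str:
    """Return the category of reason information, or 'unknown' if nothing matches."""
    text = fault_info.lower()  # case-insensitive keyword recognition
    for cause, keywords in FAULT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return cause
    return "unknown"


env_cause = classify_fault("sh: hadoop: command not found")
timeout_cause = classify_fault("Build time out. Marking the build as failed")
bug_cause = classify_fault("java.lang.NullPointerException at Job.run(Job.java:42)")
```

Plain substring matching is the simplest form of keyword recognition; a deployment would likely need a richer keyword set per category and a fallback for logs that match nothing.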

Step S660, obtaining the standby data cluster corresponding to the sub-task with the fault based on the reason information.

The method of step S230 in fig. 2 may be adopted to select an initial backup data cluster corresponding to the sub-task with a fault, and then filter the initial backup data cluster according to the cause information to obtain the backup data cluster corresponding to the sub-task with a fault. For example, if the reason information indicates that the subtask processing is overtime due to insufficient computing resources of the computing devices in the data cluster, the data cluster with computing resources meeting the subtask with the fault is obtained as a standby data cluster; for example, if the cause information is a result indicating that the subtask processing fails due to an operating environment error of the computing device in the data cluster, the data cluster in which the operating environment satisfies the faulty subtask is acquired as the standby data cluster.
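The filtering of the initial standby data clusters by reason information can be sketched as follows. The `Cluster` fields, the cause labels, and the choice to exclude the data cluster on which the fault occurred are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    cluster_id: str
    idle_cpus: int                                   # simplified measure of computing resources
    installed: set[str] = field(default_factory=set)  # simplified operating environment


def select_standby(initial: list[Cluster], cause: str, failed_cluster_id: str,
                   required_cpus: int = 0,
                   required_env: frozenset = frozenset()) -> list[Cluster]:
    """Filter the initial standby data clusters according to the reason information."""
    # Assumption: the cluster on which the fault occurred is not reused.
    pool = [c for c in initial if c.cluster_id != failed_cluster_id]
    if cause == "timeout":
        # Timeout from insufficient computing resources: require enough idle resources.
        return [c for c in pool if c.idle_cpus >= required_cpus]
    if cause == "environment":
        # Operating environment error: require every needed program or library.
        return [c for c in pool if required_env <= c.installed]
    return pool  # no cause-specific filter is described for other causes


initial = [
    Cluster("east", idle_cpus=2, installed={"hadoop"}),
    Cluster("west", idle_cpus=16, installed=set()),
    Cluster("north", idle_cpus=8, installed={"hadoop", "spark"}),
]
for_timeout = select_standby(initial, "timeout", "east", required_cpus=8)
for_env = select_standby(initial, "environment", "west",
                         required_env=frozenset({"hadoop"}))
```

The surviving clusters could then be ranked by the current data processing capability value before the faulty subtask is reassigned.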

Step S670, assigning the subtasks with faults to the standby data clusters, so that the standby data clusters reprocess the subtasks with faults.

And distributing the subtasks with faults to the standby data clusters so that the standby data clusters reprocess the subtasks with faults, and further timely processing the subtasks with anomalies in the tasks to be processed, thereby ensuring the processing progress of the tasks to be processed.

Step S680, obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.

The specific embodiment of step S680 is identical to step S240 in fig. 2, and will not be described herein.

Referring to fig. 7, fig. 7 is a schematic diagram of a task processing system according to an exemplary embodiment of the present application. The task processing system comprises a task processing control platform and a plurality of data clusters, wherein the task processing control platform is in communication connection with the data clusters through an intermediate proxy node. Wherein: the task processing control platform is used for acquiring a task to be processed, splitting the task to be processed to obtain a subtask set, and respectively distributing the subtasks in the subtask set to the data cluster; the data cluster is used for processing the subtasks to obtain subtask processing results corresponding to the subtasks, and the subtask processing results are sent to the task processing control platform; the task processing control platform is also used for obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.

As shown in fig. 7, the task processing control platform is configured in a server and includes a user interaction layer and a core layer, where the user interaction layer is used for providing a user interaction interface for a user; the core layer is used for executing the task processing method. The core layer comprises a task scheduling module, a metadata management module, a security authentication module and a resource registration module, wherein the task scheduling module is used for task splitting of a task to be processed, subtask allocation, subtask monitoring, target task processing result acquisition and the like, so that the subtask of the task to be processed is ensured to be executed correctly according to a preset sequence and flow, and abnormality of task execution is found in time; the metadata management module is used for storing metadata of each data cluster so as to store a data catalog and data details of each data cluster; the safety authentication module is used for carrying out identity authentication on each data cluster and ensuring the credibility and safety of a communication channel; the resource registration module is used for storing resource information of each data cluster, such as CPU information, memory information and the like of each data cluster.

In the embodiment of the application, the data cluster is in communication connection with the task processing control platform through the intermediate proxy node, and the data cluster comprises computing resources and storage resources. The intermediate proxy node may be an agent and comprises a big data interaction layer, a workflow engine and a control interaction layer, wherein the big data interaction layer is used for communicating with the local data cluster; the control interaction layer is used for communicating with the task processing control platform; and the workflow engine processes the received subtasks, for example by assigning the subtasks to specific computing devices in the local data cluster, further splitting the subtasks, and monitoring the execution of the subtasks. Introducing the intermediate proxy node facilitates management of each data cluster: because the intermediate proxy node consolidates the data transmission between the data clusters and the task processing control platform, that data transmission can be further reduced, and the coupling between the data clusters is reduced.

In some implementations, the task processing control platform can also include a big data interaction layer to communicate with the local data clusters through the big data interaction layer.

FIG. 8 is a block diagram of a task processing device as shown in an exemplary embodiment of the present application. The task processing device can be applied to the implementation environment shown in fig. 1. The task processing device may also be adapted to other exemplary implementation environments and be specifically configured in other devices, and the embodiment is not limited to the implementation environment to which the device is adapted.

As shown in fig. 8, the exemplary task processing device 800 includes: a task acquisition module 810, a task splitting module 820, a task allocation module 830, and a result acquisition module 840. Specifically:

The task acquisition module 810 is configured to acquire a task to be processed.

The task splitting module 820 is configured to split tasks to be processed to obtain a subtask set.

The task allocation module 830 is configured to allocate the subtasks in the subtask set to the data cluster, so that the data cluster processes the subtasks to obtain a subtask processing result corresponding to the subtasks.

The result obtaining module 840 is configured to obtain a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.
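The cooperation of the four modules can be sketched end to end as follows. The example task (counting values above a threshold), the in-memory `CLUSTER_DATA` standing in for remote data clusters, and all function names are hypothetical; a real deployment would send each subtask over the network instead of calling a local function.

```python
# Hypothetical local data held by three data clusters; it never leaves the "cluster".
CLUSTER_DATA = {"east": [1, 5, 9], "west": [2, 4], "north": [7]}


def acquire_task() -> dict:
    # Role of the task acquisition module 810: obtain the task to be processed.
    return {"op": "count_greater_than", "threshold": 3}


def split_task(task: dict, cluster_ids) -> list:
    # Role of the task splitting module 820: one subtask per data cluster holding data.
    return [{"cluster": cluster_id, **task} for cluster_id in cluster_ids]


def run_on_cluster(subtask: dict) -> int:
    # Stand-in for the task allocation module 830 dispatching to a data cluster,
    # which processes the subtask locally and returns only the (small) result.
    data = CLUSTER_DATA[subtask["cluster"]]
    return sum(1 for value in data if value > subtask["threshold"])


def collect(results: list) -> int:
    # Role of the result acquisition module 840: summarise the subtask results.
    return sum(results)


def process(task: dict) -> int:
    subtasks = split_task(task, CLUSTER_DATA)
    results = [run_on_cluster(subtask) for subtask in subtasks]
    return collect(results)


target_result = process(acquire_task())
```

Each data cluster uploads a single integer rather than its raw data, which is the reduction in data transmission that the method aims for.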

Based on the above exemplary embodiment, the task allocation module 830 further includes a task parameter obtaining unit, a data cluster matching unit, and an allocation unit. Specifically:

The task parameter obtaining unit is configured to acquire task parameters of the subtasks in the subtask set; wherein the task parameters are data required for processing the subtasks.

The data cluster matching unit is configured to match the plurality of data clusters based on the task parameters of the subtasks to obtain the data clusters to be allocated corresponding to the subtasks.

The allocation unit is configured to allocate the subtasks to the data clusters to be allocated corresponding to the subtasks.

On the basis of the above exemplary embodiment, the data cluster matching unit further includes a metadata acquisition unit, a data detection unit, and a to-be-allocated data cluster acquisition unit. Specifically:

The metadata acquisition unit is configured to acquire metadata of each data cluster.

The data detection unit is configured to detect whether each data cluster meets the processing conditions of the subtask based on the task parameters of the subtask and the metadata of each data cluster.

The to-be-allocated data cluster acquisition unit is configured to take the data clusters meeting the processing conditions of the subtasks as to-be-allocated data clusters corresponding to the subtasks.

On the basis of the above exemplary embodiment, there are a plurality of data clusters to be allocated; the allocation unit further comprises a current data processing capability value acquisition unit, a sorting unit and a target allocation unit. Specifically:

The current data processing capability value acquisition unit is configured to acquire the current data processing capability value of each of the plurality of data clusters to be allocated.

The sorting unit is configured to sort the plurality of data clusters to be allocated based on the current data processing capability values to obtain a sorting result.

The target allocation unit is configured to determine a target data cluster to be allocated based on the sorting result and allocate the subtask to the target data cluster to be allocated.

On the basis of the above exemplary embodiment, the task processing device 800 further includes a registration request receiving unit, an identity certificate sending unit, a connection request receiving unit, and a communication connection establishment unit. Specifically:

The registration request receiving unit is configured to receive a registration request sent by the data cluster.

The identity certificate sending unit is configured to send, based on the registration request, the identity certificate corresponding to the data cluster.

The connection request receiving unit is configured to receive a communication connection request sent by the data cluster, wherein the communication connection request comprises an identity certificate to be authenticated corresponding to the data cluster.

The communication connection establishment unit is configured to authenticate the identity certificate to be authenticated, and if the authentication passes, the communication connection establishment unit establishes communication connection with the data cluster.
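The register-then-authenticate exchange can be sketched as follows. The embodiment does not specify the format of the identity certificate, so this example substitutes an HMAC-SHA256 tag over the cluster ID as a stand-in credential; the `ControlPlatform` class and its method names are hypothetical, and a real deployment would more likely use X.509 certificates over TLS.

```python
import hashlib
import hmac
import secrets


class ControlPlatform:
    """Toy stand-in for the platform side of registration and authentication."""

    def __init__(self):
        self._key = secrets.token_bytes(32)  # signing key known only to the platform
        self.connected = set()

    def _sign(self, cluster_id):
        return hmac.new(self._key, cluster_id.encode(), hashlib.sha256).hexdigest()

    def register(self, cluster_id):
        # Handle a registration request: issue a credential bound to the cluster ID.
        return self._sign(cluster_id)

    def connect(self, cluster_id, credential):
        # Handle a connection request: authenticate the presented credential
        # and establish the connection only if it verifies.
        if hmac.compare_digest(self._sign(cluster_id), credential):
            self.connected.add(cluster_id)
            return True
        return False


platform = ControlPlatform()
credential = platform.register("cluster-east")
print(platform.connect("cluster-east", credential))  # genuine credential
print(platform.connect("cluster-west", credential))  # credential issued to another cluster
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading characters of a forged credential were correct.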

On the basis of the above exemplary embodiment, the task processing device 800 further includes a fault information acquisition unit, a cause information confirmation unit, a standby data cluster acquisition unit, and a fault processing unit. Specifically:

The fault information acquisition unit is configured to acquire fault information sent by the data cluster, where the fault information carries processing information of the subtask whose processing failed in the data cluster.

The cause information confirmation unit is configured to determine, according to the fault information, the cause information of the subtask processing fault that triggered generation of the fault information.

The standby data cluster acquisition unit is configured to acquire, based on the cause information, the standby data cluster corresponding to the faulty subtask.

The fault processing unit is configured to allocate the faulty subtask to the standby data cluster so that the standby data cluster reprocesses the faulty subtask.
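The fault-handling path (fault information, cause, standby cluster, reallocation) can be sketched as below. The error codes, the cause categories, the cluster status fields, and the selection rule (most free memory for resource faults, otherwise the first healthy cluster by name) are hypothetical; the embodiment only states that the standby cluster is obtained based on the cause information.

```python
from dataclasses import dataclass


@dataclass
class FaultInfo:
    subtask_id: str
    cluster: str      # cluster on which the subtask failed
    error_code: str   # hypothetical code carried in the fault information


# Hypothetical mapping from error codes to cause categories.
CAUSES = {
    "E_NODE_DOWN": "cluster_unavailable",
    "E_OOM": "insufficient_resources",
}


def determine_cause(fault):
    return CAUSES.get(fault.error_code, "unknown")


def select_standby(fault, cause, clusters):
    # Never send the subtask back to the cluster that just failed it.
    others = {name: c for name, c in clusters.items()
              if name != fault.cluster and c["healthy"]}
    if cause == "insufficient_resources":
        # Assumed rule: prefer the cluster with the most free memory.
        ranked = sorted(others, key=lambda n: others[n]["free_memory_gb"], reverse=True)
    else:
        ranked = sorted(others)  # any healthy cluster; name order keeps it deterministic
    return ranked[0] if ranked else None


def handle_fault(fault, clusters, assignments):
    cause = determine_cause(fault)
    standby = select_standby(fault, cause, clusters)
    if standby is not None:
        assignments[fault.subtask_id] = standby  # reallocate for reprocessing
    return cause, standby


clusters = {
    "a": {"healthy": False, "free_memory_gb": 0},
    "b": {"healthy": True, "free_memory_gb": 16},
    "c": {"healthy": True, "free_memory_gb": 64},
}
assignments = {"t1": "a"}
print(handle_fault(FaultInfo("t1", "a", "E_OOM"), clusters, assignments))
print(assignments)
```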

On the basis of the above exemplary embodiment, there are a plurality of data clusters, and the result acquisition module 840 further includes a state modification unit and a target task processing result acquisition unit. Specifically:

The state modification unit is configured to acquire the subtask corresponding to the subtask processing result fed back by a data cluster and to modify the task state of that subtask to the completion state.

The target task processing result acquisition unit is configured to aggregate the subtask processing results fed back by each data cluster to obtain a target task processing result if the task states of all the subtasks in the subtask set are detected to be the completion states.
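A minimal sketch of the state tracking and aggregation performed by these two units is given below. The `ResultCollector` class and the state names are hypothetical, and summing partial counts is only one assumed form of aggregation; the embodiment does not prescribe how the subtask processing results are combined.

```python
class ResultCollector:
    """Tracks subtask states and aggregates results once every subtask is done."""

    def __init__(self, subtask_ids):
        self.state = {sid: "pending" for sid in subtask_ids}
        self.results = {}

    def on_result(self, subtask_id, result):
        # Record the fed-back result and mark its subtask as completed.
        self.results[subtask_id] = result
        self.state[subtask_id] = "completed"
        # Aggregate only when every subtask in the set is completed.
        if all(s == "completed" for s in self.state.values()):
            return self.aggregate()
        return None

    def aggregate(self):
        # Assumed aggregation: each result is a partial count, so sum them.
        return sum(self.results.values())


collector = ResultCollector(["s1", "s2", "s3"])
first = collector.on_result("s1", 10)
second = collector.on_result("s2", 12)
final = collector.on_result("s3", 20)
print(first, second, final)
```

Because only the partial results travel to the collector, the raw data never has to leave the data clusters, which is the transmission saving the method aims for.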

It should be noted that the task processing device provided in the foregoing embodiment and the task processing method provided in the foregoing embodiment belong to the same concept. The specific manner in which each module and unit performs its operations has been described in detail in the method embodiment and is not repeated here. In practical applications, the functions of the task processing device provided in the above embodiment may be distributed among different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above, which is not limited herein.

It should be noted that the computer system 900 of the electronic device shown in FIG. 9 is only an example and should not impose any limitation on the functions or the scope of application of the embodiments of the present application.

As shown in FIG. 9, the electronic device 900 takes the form of a general-purpose computing device. Components of the electronic device 900 may include, but are not limited to: at least one processing unit 910, at least one storage unit 920, a bus 930 connecting the different system components (including the storage unit 920 and the processing unit 910), and a display unit 940.

The storage unit 920 stores program code that can be executed by the processing unit 910, so that the processing unit 910 performs the steps according to the various exemplary embodiments of the present disclosure described in the above "exemplary methods" section of this specification.

The storage unit 920 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 921 and/or cache memory 922, and may further include Read Only Memory (ROM) 923.

The storage unit 920 may also include a program/utility 924 having a set (at least one) of program modules 925, such program modules 925 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

The bus 930 may represent one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.

The electronic device 900 may also communicate with one or more external devices 970 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 900, and/or any device (e.g., router, modem, etc.) that enables the electronic device 900 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 950. Also, electronic device 900 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 960. As shown, the network adapter 960 communicates with other modules of the electronic device 900 over the bus 930. It should be appreciated that although not shown, other hardware and/or application modules may be used in connection with electronic device 900, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer programs. For example, embodiments of the present application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program comprising program code for performing the method shown in the flowchart. When executed by the processing unit 910, the computer program performs the various functions defined in the system of the present application.

It should be noted that the computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, with a computer-readable computer program embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. A computer program embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

The units referred to in the embodiments of the present application may be implemented by an application program or by hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.

Another aspect of the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the task processing method described above. The computer-readable storage medium may be included in the electronic device described in the above embodiment, or may exist alone without being incorporated in the electronic device.

Another aspect of the present application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the task processing methods provided in the respective embodiments described above.

The foregoing is merely a preferred exemplary embodiment of the present application and is not intended to limit the embodiments of the present application, and those skilled in the art may make various changes and modifications according to the main concept and spirit of the present application, so that the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method of task processing, the method comprising:

acquiring a task to be processed;

carrying out task splitting on the task to be processed to obtain a subtask set;

distributing the subtasks in the subtask set to a data cluster so that the data cluster processes the subtasks to obtain a subtask processing result corresponding to the subtasks;

based on the subtask processing result fed back by the data cluster, obtaining a target task processing result corresponding to the task to be processed;

the number of the data clusters is a plurality; the distributing the subtasks in the subtask set to a data cluster includes:

acquiring task parameters of the subtasks in the subtask set; wherein the task parameters are data required for processing the subtasks;

matching a plurality of data clusters based on the task parameters of the subtasks to obtain data clusters to be allocated corresponding to the subtasks;

distributing the subtasks to the data clusters to be allocated corresponding to the subtasks;

wherein the number of the data clusters to be allocated is a plurality; the distributing the subtasks to the data clusters to be allocated corresponding to the subtasks includes:

respectively obtaining current data processing capability values of the plurality of data clusters to be allocated;

sorting the plurality of data clusters to be allocated based on the current data processing capability values to obtain a sorting result;

determining a target data cluster to be allocated based on the sorting result, and allocating the subtasks to the target data cluster to be allocated;

the respectively obtaining the current data processing capability values of the plurality of data clusters to be allocated includes:

performing a weighted calculation on the resource usage, the data transmission time, and the data transmission cost corresponding to each of the plurality of data clusters to be allocated, so as to obtain the current data processing capability value corresponding to each of the plurality of data clusters to be allocated; wherein the weights of the weighted calculation are dynamically determined according to the task parameters of the subtasks.

2. The method according to claim 1, wherein the matching the plurality of data clusters based on the task parameters of the subtasks to obtain the data clusters to be allocated corresponding to the respective subtasks includes:

acquiring metadata of each data cluster;

detecting, based on the task parameters of the subtask and the metadata of each data cluster, whether each data cluster meets the processing condition of the subtask;

and taking the data cluster that meets the processing condition of the subtask as the data cluster to be allocated corresponding to the subtask.

3. The method according to claim 1 or 2, wherein before the distributing the subtasks in the subtask set to a data cluster, the method further comprises:

receiving a registration request sent by the data cluster;

based on the registration request, sending an identity certificate corresponding to the data cluster;

receiving a communication connection request sent by the data cluster, wherein the communication connection request comprises an identity certificate to be authenticated corresponding to the data cluster;

and authenticating the identity certificate to be authenticated, and if the authentication is passed, establishing communication connection with the data cluster.

4. The method according to claim 1 or 2, wherein after the distributing the subtasks in the subtask set to a data cluster, the method further comprises:

acquiring fault information sent by the data cluster, wherein the fault information carries processing information of the subtask whose processing failed in the data cluster;

determining cause information for triggering the subtask processing fault to generate the fault information according to the fault information;

acquiring a standby data cluster corresponding to the subtask with the fault based on the reason information;

and distributing the subtask with the fault to the standby data cluster so that the standby data cluster reprocesses the subtask with the fault.

5. The method according to claim 1 or 2, wherein the number of data clusters is a plurality; the obtaining, based on the subtask processing result fed back by the data cluster, a target task processing result corresponding to the task to be processed includes:

acquiring a subtask corresponding to a subtask processing result fed back by the data cluster, so as to modify the task state of the subtask into a completion state;

and if the task states of all the subtasks in the subtask set are detected to be the completion state, aggregating the subtask processing results fed back by all the data clusters to obtain the target task processing result.

6. A task processing device, the device comprising:

the task acquisition module is configured to acquire a task to be processed;

the task splitting module is configured to split the task to be processed to obtain a subtask set;

the task allocation module is configured to allocate the subtasks in the subtask set to a data cluster so that the data cluster processes the subtasks to obtain subtask processing results corresponding to the subtasks; the number of the data clusters is a plurality; the allocating the subtasks in the subtask set to a data cluster includes: acquiring task parameters of the subtasks in the subtask set, wherein the task parameters are data required for processing the subtasks; matching the plurality of data clusters based on the task parameters of the subtasks to obtain data clusters to be allocated corresponding to the subtasks; and allocating the subtasks to the data clusters to be allocated corresponding to the subtasks;

performing a weighted calculation on the resource usage, the data transmission time, and the data transmission cost corresponding to each of the plurality of data clusters to be allocated, so as to obtain the current data processing capability value corresponding to each of the plurality of data clusters to be allocated; wherein the weights of the weighted calculation are dynamically determined according to the task parameters of the subtasks;

the result acquisition module is configured to obtain a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster.

7. A task processing system, comprising a task processing control platform and a plurality of data clusters, wherein the task processing control platform is communicatively connected to the data clusters through an intermediate proxy node; wherein:

the task processing control platform is used for acquiring a task to be processed, splitting the task to be processed to obtain a subtask set, and respectively distributing subtasks in the subtask set to the data cluster;

the data cluster is used for processing the subtasks to obtain subtask processing results corresponding to the subtasks, and sending the subtask processing results to the task processing control platform;

the task processing control platform is also used for obtaining a target task processing result corresponding to the task to be processed based on the subtask processing result fed back by the data cluster;

8. An electronic device, comprising:

a processor; and

a memory for storing one or more programs that, when executed by the processor, cause the processor to implement the task processing method of any one of claims 1 to 5.