CN107959705B - Distribution method of streaming computing task and control server - Google Patents


Info

Publication number
CN107959705B
Authority
CN
China
Legal status
Active
Application number
CN201610908946.7A
Other languages
Chinese (zh)
Other versions
CN107959705A (en)
Inventor
张钊
李名浩
胡四海
陈友林
王光炼
Current Assignee
Zhejiang Tmall Technology Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201610908946.7A (CN107959705B)
Priority to TW106127334A (TWI755417B)
Priority to PCT/CN2017/105360 (WO2018072618A1)
Publication of CN107959705A
Application granted
Publication of CN107959705B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H04L43/16 Threshold monitoring
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H04L67/1034 Reaction to server failures by a load balancer
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]


Abstract

The present application provides a method for allocating streaming computing tasks and a control server. The method is applied to a control server connected to streaming computing center server clusters and streaming computing unit server clusters, and comprises: allocating a streaming computing task to a target streaming computing center server cluster or a target streaming computing unit server cluster; and determining whether the target cluster is abnormal and, if so, allocating the not-yet-executed part of the streaming computing task to a candidate streaming computing center server cluster. With the embodiments of the present application, when a streaming computing center server cluster or streaming computing unit server cluster becomes abnormal, the unfinished part of a task can continue to execute on another, normal streaming computing center server cluster, so that the streaming computing task keeps running smoothly.

Description

Distribution method of streaming computing task and control server
Technical Field
The present application relates to the field of streaming computing technologies, and in particular to a method for allocating streaming computing tasks and a control server, a method for executing streaming computing tasks and a streaming computing center server cluster, a streaming computing system, and a remote multi-active system.
Background
In streaming computing, the arrival time and order of data cannot be determined in advance and the data cannot all be stored, so servers do not persist the streaming data; instead, they compute on it in memory in real time as it arrives. With the rapid development of streaming computing in the era of Internet big data, requirements on the real-time performance, quality, service stability, and availability of streaming data keep rising, which poses a challenge to traditional distributed web service systems. Because a streaming computing system reads and computes over very large volumes of data in real time, it is difficult to distribute streaming computing tasks across multiple sites: for example, deduplicated statistics computed at different sites must be merged remotely in real time, data consistency across sites must be guaranteed, and the regions where data sources are located cannot be controlled. It is therefore necessary to enable multi-region cooperation and real-time disaster tolerance for streaming computing.
In the prior art, streaming tasks are usually protected by cold backup in a different region: an idle server is deployed in another region, and when the service in one region becomes unavailable, the streaming computing tasks are temporarily restored on that idle server. However, the idle server sits unused most of the time, which wastes substantial system resources. Alternatively, servers may be deployed in a single machine room or in several machine rooms in the same region, with the data of all the machine rooms stored in a single storage system. In that case, however, the streaming computing system becomes unavailable if the network in that region fails (for example, through an accident such as a fiber-optic cable being cut by construction machinery), if the region's storage system becomes unavailable, or if the region's machine resources reach their expansion limit and cannot grow further; smooth allocation and subsequent execution of streaming computing tasks therefore cannot be guaranteed.
Disclosure of Invention
Accordingly, the present application provides a method for allocating streaming computing tasks and a method for executing streaming computing tasks. A control server allocates streaming computing tasks uniformly to streaming computing center server clusters and streaming computing unit server clusters deployed at multiple sites, which execute different streaming computing tasks. Preset computing resources are reserved in the streaming computing center server clusters, data is synchronized among their central storage clusters, and the data in the unit storage cluster of each streaming computing unit server cluster is synchronized to the central storage clusters. As a result, when a streaming computing unit server cluster or a streaming computing center server cluster becomes abnormal, the not-yet-executed part of the streaming computing task it was running can be reallocated to a streaming computing center server cluster at another site. The task can thus be quickly recovered and continue to run normally at a different site without configuring idle servers, which also saves system resources.
The application also provides a control server, a streaming computing center server cluster, and a streaming computing system, to ensure that the above methods can be implemented and applied in practice.
To solve the above problems, the present application discloses a method for allocating streaming computing tasks, applied to a control server connected to streaming computing center server clusters and streaming computing unit server clusters, where a preset proportion of computing resources is reserved in each streaming computing center server cluster; the method comprises:
in response to receiving a streaming computing task, allocating the streaming computing task to a target streaming computing center server cluster or a target streaming computing unit server cluster; and
while the target streaming computing center server cluster or target streaming computing unit server cluster is executing the streaming computing task, determining whether that target cluster is abnormal and, if so, allocating the not-yet-executed part of the streaming computing task to a candidate streaming computing center server cluster.
The method further comprises:
the control server periodically sending heartbeat messages to the streaming computing center server clusters and the streaming computing unit server clusters respectively, the heartbeat messages being used to detect whether the control server can communicate with each streaming computing center server cluster and with each streaming computing unit server cluster;
correspondingly, determining whether the target streaming computing center server cluster or the target streaming computing unit server cluster is abnormal specifically comprises:
determining whether the target streaming computing center server cluster or the target streaming computing unit server cluster has failed to return a heartbeat response within a preset feedback time.
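The heartbeat-based abnormality check can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: `HeartbeatMonitor`, its methods, and seconds as the unit of the preset feedback time are all hypothetical, and the current time is passed in explicitly so the example is deterministic (a real control server might pass `time.monotonic()`).

```python
from typing import Dict, Optional


class HeartbeatMonitor:
    """Hypothetical control-server helper that flags clusters missing heartbeat replies."""

    def __init__(self, feedback_timeout: float) -> None:
        self.feedback_timeout = feedback_timeout  # the "preset feedback time" (seconds assumed)
        self.pending: Dict[str, float] = {}       # cluster id -> send time of oldest unanswered heartbeat

    def send_heartbeat(self, cluster_id: str, now: float) -> None:
        # Keep the oldest unanswered timestamp, so a silent cluster is not
        # "forgiven" just because another periodic heartbeat was sent.
        self.pending.setdefault(cluster_id, now)

    def on_response(self, cluster_id: str) -> None:
        # A heartbeat response arrived, so the cluster is reachable.
        self.pending.pop(cluster_id, None)

    def is_abnormal(self, cluster_id: str, now: float) -> bool:
        # Abnormal: a heartbeat has stayed unanswered longer than the feedback time.
        sent: Optional[float] = self.pending.get(cluster_id)
        return sent is not None and now - sent > self.feedback_timeout


monitor = HeartbeatMonitor(feedback_timeout=3.0)
monitor.send_heartbeat("center-A", now=0.0)
monitor.send_heartbeat("unit-1", now=0.0)
monitor.on_response("center-A")                  # center-A answers, unit-1 stays silent
print(monitor.is_abnormal("center-A", now=5.0))  # False
print(monitor.is_abnormal("unit-1", now=2.0))    # False: still within 3 s
print(monitor.is_abnormal("unit-1", now=5.0))    # True
```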
Allocating the not-yet-executed part of the streaming computing task to a candidate streaming computing center server cluster comprises:
the control server obtaining the load of each streaming computing center server cluster in real time; and
the control server allocating, according to the loads, the not-yet-executed part of the streaming computing task to the streaming computing center server cluster with the lowest current load.
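Choosing the candidate with the smallest current load reduces to a minimum search. In this sketch the function name `pick_least_loaded` and the representation of load as a single number per cluster are assumptions; the source does not say how load is measured.

```python
from typing import Dict


def pick_least_loaded(center_loads: Dict[str, float]) -> str:
    """Return the id of the streaming computing center cluster with the lowest load.

    `center_loads` maps cluster id -> current load; the load metric (here a
    utilisation fraction) is an assumption, the source does not define it.
    """
    if not center_loads:
        raise ValueError("no streaming computing center cluster to choose from")
    # Sort key (load, id): lowest load wins, ties broken by id for determinism.
    return min(center_loads, key=lambda cid: (center_loads[cid], cid))


loads = {"center-A": 0.72, "center-B": 0.41, "center-C": 0.55}
print(pick_least_loaded(loads))  # center-B
```

Breaking ties by cluster id is a design choice of this sketch so that repeated reallocations are reproducible; the source does not specify tie handling.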
Each streaming computing center server cluster is provided with a central storage cluster; the central storage clusters of the streaming computing center server clusters synchronize intermediate state data and intermediate result data with one another, and each streaming computing unit server cluster synchronizes its intermediate state data and intermediate result data to the central storage cluster of every streaming computing center server cluster. The method further comprises:
the control server storing the execution state and the configuration information of each streaming computing task in a control database, where the execution state indicates the part of each streaming computing task that has already been executed on its streaming computing center server cluster or streaming computing unit server cluster, and the configuration information indicates the correspondence between each streaming computing task and the streaming computing center server cluster, or streaming computing unit server cluster, that executes it;
correspondingly, allocating the not-yet-executed part of the streaming computing task to the streaming computing center server cluster with the lowest current load comprises:
the control server determining, from the execution state and the configuration information stored in the control database, which parts of the streaming computing tasks have not yet been executed; and
the control server allocating those unexecuted parts to the streaming computing center server cluster with the lowest current load.
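The control database lets the control server work out what remains to be done. The sketch below assumes, purely for illustration, that each task is split into named parts and that the execution state is the set of parts already executed; `TaskRecord`, `unexecuted_parts`, and `reallocate` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TaskRecord:
    """One row of a hypothetical control database."""
    task_id: str
    cluster_id: str  # configuration information: cluster currently executing the task
    parts: List[str]  # units of work the task is split into (assumed model)
    executed: Set[str] = field(default_factory=set)  # execution state: parts already done


def unexecuted_parts(db: List[TaskRecord], failed_cluster: str) -> Dict[str, List[str]]:
    """For each task configured on `failed_cluster`, list the parts not yet executed."""
    remaining: Dict[str, List[str]] = {}
    for rec in db:
        if rec.cluster_id != failed_cluster:
            continue
        todo = [p for p in rec.parts if p not in rec.executed]
        if todo:  # fully finished tasks need no reallocation
            remaining[rec.task_id] = todo
    return remaining


def reallocate(db: List[TaskRecord], failed_cluster: str, target: str) -> Dict[str, List[str]]:
    """Point unfinished tasks at `target` and return what the target must still run."""
    moved = unexecuted_parts(db, failed_cluster)
    for rec in db:
        if rec.task_id in moved:
            rec.cluster_id = target  # update the configuration information
    return moved


db = [
    TaskRecord("t1", "unit-1", ["p0", "p1", "p2"], {"p0"}),
    TaskRecord("t2", "unit-1", ["p0"], {"p0"}),
    TaskRecord("t3", "center-A", ["p0", "p1"]),
]
print(reallocate(db, failed_cluster="unit-1", target="center-B"))
# {'t1': ['p1', 'p2']}
```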
The present application further provides a method for executing streaming computing tasks, applied to any streaming computing center server cluster (the current streaming computing center server cluster) of a streaming computing system, in which cluster preset computing resources are reserved. The streaming computing system comprises streaming computing center server clusters, streaming computing unit server clusters, and a control server; each streaming computing center server cluster is provided with a central storage cluster, intermediate state data and intermediate result data are synchronized among the central storage clusters, and the unit storage cluster of each streaming computing unit server cluster synchronizes its intermediate state data and intermediate result data to the central storage clusters. The method comprises:
in response to the control server reallocating the not-yet-executed part of a streaming computing task after another streaming computing center server cluster or streaming computing unit server cluster in the streaming computing system becomes abnormal, the current streaming computing center server cluster obtaining, from the central storage cluster, the intermediate state data and intermediate result data required to execute the unexecuted part; and
the current streaming computing center server cluster executing the unexecuted part using the preset computing resources, the intermediate state data, and the intermediate result data.
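On the receiving side, resuming means loading the checkpoint from the central storage cluster and processing only what is left. In this hypothetical sketch the task is a running count per key, the intermediate state data is the number of input events already consumed, and the intermediate result data is the partial count; the reserved computing resources are not modelled, and `CentralStore` and `resume_task` are illustrative names.

```python
from typing import Any, Dict, List

# Hypothetical layout of a central storage cluster: task id -> checkpoint, where
#   "state"  = intermediate state data (here: number of input events consumed)
#   "result" = intermediate result data (here: running count per key)
CentralStore = Dict[str, Dict[str, Any]]


def resume_task(store: CentralStore, task_id: str, events: List[str]) -> Dict[str, int]:
    """Continue a counting task from the checkpoint another cluster synced to storage."""
    checkpoint = store[task_id]
    offset = int(checkpoint["state"])                    # where the failed cluster stopped
    counts: Dict[str, int] = dict(checkpoint["result"])  # copy; do not mutate the stored result
    for event in events[offset:]:                        # process only the unexecuted part
        counts[event] = counts.get(event, 0) + 1
    # Write fresh intermediate data back so yet another cluster could take over later.
    store[task_id] = {"state": len(events), "result": dict(counts)}
    return counts


store: CentralStore = {"t1": {"state": 3, "result": {"a": 2, "b": 1}}}
print(resume_task(store, "t1", ["a", "b", "a", "c", "a"]))
# {'a': 3, 'b': 1, 'c': 1}
```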
The method further comprises:
in response to the periodic heartbeat messages sent by the control server, the current streaming computing center server cluster periodically returning heartbeat responses to the control server, the heartbeat messages being used to detect whether the control server and the current streaming computing center server cluster can communicate.
The method further comprises:
the current streaming computing center server cluster detecting whether the number of consecutive failures to return a heartbeat response to the control server exceeds a preset count threshold and, if so, stopping execution of the unexecuted part.
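The cluster-side stop rule is a consecutive-failure counter. One plausible motivation, which is our interpretation rather than something the source states, is that a cluster which cannot reach the control server should stop work that the control server may already be reassigning elsewhere. `HeartbeatFence` is a hypothetical name, and "exceeds" is read as strictly greater than the threshold.

```python
class HeartbeatFence:
    """Hypothetical cluster-side guard that stops work after repeated heartbeat failures."""

    def __init__(self, max_consecutive_failures: int) -> None:
        self.max_consecutive_failures = max_consecutive_failures  # the preset count threshold
        self.consecutive_failures = 0
        self.running = True

    def report(self, response_delivered: bool) -> bool:
        """Record one heartbeat-response attempt; return whether the task may keep running."""
        if response_delivered:
            self.consecutive_failures = 0  # any successful response resets the streak
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures > self.max_consecutive_failures:
                # Threshold exceeded: stop the reassigned task. The guard stays
                # stopped; resuming is left to the control server (assumption).
                self.running = False
        return self.running


fence = HeartbeatFence(max_consecutive_failures=2)
outcomes = [fence.report(ok) for ok in [False, False, True, False, False, False]]
print(outcomes)  # [True, True, True, True, True, False]
```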
The application also provides a control server connected to streaming computing center server clusters and streaming computing unit server clusters, where a preset proportion of computing resources is reserved in each streaming computing center server cluster; the control server comprises:
a first allocation unit, configured to, in response to receiving a streaming computing task, allocate the streaming computing task to a target streaming computing center server cluster or a target streaming computing unit server cluster;
a determining unit, configured to determine, while the target streaming computing center server cluster or target streaming computing unit server cluster is executing the streaming computing task, whether that target cluster is abnormal; and
a second allocation unit, configured to allocate the not-yet-executed part of the streaming computing task to a candidate streaming computing center server cluster if the result of the determining unit is yes.
The control server further comprises:
a sending unit, configured to periodically send heartbeat messages to the streaming computing center server clusters and the streaming computing unit server clusters respectively, the heartbeat messages being used to detect whether the control server can communicate with each streaming computing center server cluster and with each streaming computing unit server cluster;
correspondingly, the determining unit is specifically configured to determine whether the target streaming computing center server cluster or the target streaming computing unit server cluster has failed to return a heartbeat response within a preset feedback time.
The second allocation unit comprises:
a load acquisition subunit, configured to obtain the loads of the streaming computing center server clusters and the streaming computing unit server clusters in real time; and
a first allocation subunit, configured to allocate, according to the load of each streaming computing center server cluster, the not-yet-executed part of the streaming computing task to the streaming computing center server cluster with the lowest current load.
Each streaming computing center server cluster is provided with a central storage cluster; the central storage clusters of the streaming computing center server clusters synchronize intermediate state data and intermediate result data with one another, and each streaming computing unit server cluster synchronizes its intermediate state data and intermediate result data to the central storage cluster of every streaming computing center server cluster. The control server further comprises:
a storage unit, configured to store the execution state and the configuration information of each streaming computing task in the control database, where the execution state indicates the part of each streaming computing task that has already been executed on its streaming computing center server cluster or streaming computing unit server cluster, and the configuration information indicates the correspondence between each streaming computing task and the streaming computing center server cluster, or streaming computing unit server cluster, that executes it;
the first allocation subunit comprises:
a computation subunit, configured to determine, from the execution state and the configuration information stored in the control database, the parts of the streaming computing tasks that have not yet been executed; and
a second allocation subunit, configured to allocate the unexecuted parts to the streaming computing center server cluster with the lowest current load.
The application also provides a streaming computing center server cluster in which preset computing resources are reserved. The streaming computing center server cluster is connected to a control server, which is also connected to streaming computing unit server clusters; the streaming computing center server cluster is provided with a central storage cluster, intermediate state data and intermediate result data are synchronized among the central storage clusters, and the unit storage cluster of each streaming computing unit server cluster synchronizes its intermediate state data and intermediate result data to the central storage clusters. The streaming computing center server cluster comprises:
a data acquisition unit, configured to, in response to the control server reallocating the not-yet-executed part of a streaming computing task after another streaming computing center server cluster or streaming computing unit server cluster in the streaming computing system becomes abnormal, obtain from the central storage cluster the intermediate state data and intermediate result data required to execute the unexecuted part; and
a task execution unit, configured to execute the unexecuted part using the preset computing resources, the intermediate state data, and the intermediate result data.
The streaming computing center server cluster further comprises:
a feedback unit, configured to periodically return heartbeat responses to the control server in response to the periodic heartbeat messages sent by the control server, the heartbeat messages being used to detect whether the control server and the current streaming computing center server cluster can communicate.
The streaming computing center server cluster further comprises:
a detection unit, configured to detect whether the number of consecutive failures to return a heartbeat response to the control server exceeds a preset count threshold; and
a stopping unit, configured to stop execution of the unexecuted part if the result of the detection unit is yes.
The present application also provides a streaming computing system, comprising: streaming computing center server clusters, streaming computing unit server clusters, and a control server; and
a central storage cluster corresponding to each streaming computing center server cluster, a control database corresponding to the control server, and a unit storage cluster corresponding to each streaming computing unit server cluster.
The application also provides a remote multi-active system, comprising a first streaming computing center server cluster, a plurality of streaming computing unit server clusters, and a control server, where the first streaming computing center server cluster is the streaming computing center server cluster described above and the control server is the control server described above; and
the plurality of streaming computing unit server clusters are deployed at a plurality of second geographic locations respectively, while the first streaming computing center server cluster is deployed at a first geographic location, each second geographic location being different from the first geographic location. The remote multi-active system may further comprise a second streaming computing center server cluster deployed at a geographic location different from that of the first streaming computing center server cluster.
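The placement constraints of the remote multi-active system can be expressed as a simple layout check. This reflects one reading of the text above: center clusters sit at mutually distinct locations and no unit cluster shares a location with a center cluster. `ClusterSite`, `layout_problems`, and the region labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClusterSite:
    cluster_id: str
    kind: str      # "center" or "unit"
    location: str  # geographic location label (hypothetical)


def layout_problems(sites: List[ClusterSite]) -> List[str]:
    """Check a deployment against one reading of the remote multi-active layout."""
    problems: List[str] = []
    centers = [s for s in sites if s.kind == "center"]
    center_locations = {s.location for s in centers}
    if not centers:
        problems.append("no streaming computing center cluster")
    if len(center_locations) < len(centers):
        problems.append("two center clusters share a geographic location")
    for s in sites:
        if s.kind == "unit" and s.location in center_locations:
            problems.append(f"unit cluster {s.cluster_id} is co-located with a center cluster")
    return problems


sites = [
    ClusterSite("center-1", "center", "region-1"),
    ClusterSite("center-2", "center", "region-2"),
    ClusterSite("unit-1", "unit", "region-3"),
    ClusterSite("unit-2", "unit", "region-4"),
]
print(layout_problems(sites))  # []
```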
The application also provides a remote multi-active system, comprising:
a first streaming computing center server, at least configured to provide computing resources externally and comprising a first central storage unit;
a second streaming computing center server, at least configured to provide computing resources externally and comprising a second central storage unit;
where the first and second streaming computing center servers perform load balancing based on a unified load balancing policy, and the first and second central storage units serve as hot standbys for each other;
and where, for a first streaming computing task running on the first streaming computing center server, when the first streaming computing center server can no longer provide computing resources externally, the task's execution on the first streaming computing center server is terminated and the first streaming computing task continues to run on the second streaming computing center server based on the intermediate state data and intermediate result data in the second central storage unit.
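The hot-standby failover between the two center servers can be sketched as follows. Assumptions: mirroring to the peer happens synchronously on every checkpoint (a real deployment would more likely replicate asynchronously), and `CenterServer` and `fail_over` are hypothetical names. The point illustrated is that the second server resumes from its own storage unit, without reading anything from the failed first server.

```python
from typing import Any, Dict, Optional


class CenterServer:
    """Hypothetical streaming computing center server with its own center storage unit."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.storage: Dict[str, Dict[str, Any]] = {}  # the center storage unit
        self.peer: Optional["CenterServer"] = None     # mutually hot-standby partner
        self.available = True

    def checkpoint(self, task_id: str, state: Any, result: Any) -> None:
        # Write intermediate data locally and mirror it to the peer's storage unit
        # (synchronous mirroring is a simplifying assumption).
        snapshot = {"state": state, "result": result}
        self.storage[task_id] = dict(snapshot)
        if self.peer is not None:
            self.peer.storage[task_id] = dict(snapshot)


def fail_over(task_id: str, primary: CenterServer, standby: CenterServer) -> Dict[str, Any]:
    """Resume `task_id` on the standby once the primary stops providing resources."""
    if primary.available:
        raise ValueError("primary still provides computing resources; no failover")
    # Resume purely from the standby's own storage unit: the primary is not consulted.
    checkpoint = standby.storage[task_id]
    return {"runs_on": standby.name, **checkpoint}


a, b = CenterServer("center-1"), CenterServer("center-2")
a.peer, b.peer = b, a
a.checkpoint("job-42", state=120, result={"clicks": 7})
a.available = False  # center-1 can no longer provide computing resources
print(fail_over("job-42", a, b))
# {'runs_on': 'center-2', 'state': 120, 'result': {'clicks': 7}}
```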
Compared with the prior art, the present application has the following advantages:
In the embodiments of the present application, a single control server uniformly allocates the tasks executed by the streaming computing center server clusters and streaming computing unit server clusters deployed at multiple sites, achieving unified scheduling and allocation of streaming computing tasks. Because data is synchronized in real time among the central storage clusters, the clusters deployed at these sites can simultaneously compute different parts of the same streaming computing task, or different streaming computing tasks. When a streaming computing center server cluster or streaming computing unit server cluster at one site becomes abnormal, the streaming computing task it was executing can be quickly recovered on a streaming computing center server cluster at another site. System resources therefore do not sit idle in normal operation, and the streaming computing task is remote multi-active: it can be quickly recovered elsewhere when a local failure occurs, giving the streaming computing service high availability.
Of course, a product implementing the present application does not necessarily need to achieve all of the above advantages at the same time.
Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is an architecture diagram of a scenario in which the present application is applied in practice;
FIG. 2 is a flowchart of an embodiment of the method for allocating streaming computing tasks of the present application;
FIG. 3 is a flowchart of an embodiment of the method for executing streaming computing tasks of the present application;
FIG. 4 is a flowchart of a specific example of the present application;
FIG. 5 is a structural block diagram of an embodiment of the control server of the present application;
FIG. 6 is a structural block diagram of an embodiment of the streaming computing center server cluster of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the protection scope of the present application.
To help those skilled in the art understand the technical terms used in the present application, these terms are explained below.
A server cluster is a group of one or more servers that together provide the same service and appear to a client as a single server. A server cluster can use multiple computers for parallel computation to achieve high computing speed, and can also use multiple computers as backups so that the cluster as a whole keeps running normally if any one computer fails.
The streaming computing center server cluster refers to a server cluster for executing a streaming computing task, and the server cluster needs to reserve preset computing resources and store intermediate result data and intermediate state data generated in the process of executing the streaming computing task into a center storage cluster.
The streaming computing unit server cluster also refers to a server cluster for executing a streaming computing task, and stores intermediate result data and intermediate state data generated in the process of executing the streaming computing task into the unit storage cluster, except that preset computing resources may not be reserved in the server clusters.
The storage cluster is a storage pool which can provide a uniform access interface and a management interface for the server cluster by aggregating the storage space in one or more storage devices, and the server cluster can transparently access and utilize the disks on all the storage devices through the uniform access interface, so that the storage cluster can fully exert the performance and the utilization rate of the disks of the storage devices.
A central storage cluster provides storage space for a streaming computing center server cluster; a unit storage cluster provides storage space for a streaming computing unit server cluster.
Referring to fig. 1, a scenario architecture diagram of a practical application of the streaming computing task distribution method of the present application is shown. The streaming computing system shown in fig. 1 may be configured with one control server 101, m streaming computing center server clusters 102, and n streaming computing unit server clusters 103, where m and n are each integers greater than 1. Preferably, two streaming computing center server clusters 102 are configured. On this basis, when one of the streaming computing center server clusters 102 or streaming computing unit server clusters 103 in the system becomes abnormal, the control server 101 can detect the abnormality and reallocate the tasks that the abnormal cluster has not finished executing to another normal streaming computing center server cluster 102, which serves as the candidate streaming computing center server cluster. It should be noted that, because the streaming computing unit server clusters 103 do not reserve computing resources, when redistributing unexecuted tasks the control server 101 selects only normal streaming computing center server clusters 102 as candidates and never selects a streaming computing unit server cluster 103.
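The candidate rule above (only normal streaming computing center server clusters may take over unexecuted tasks, and unit clusters are never chosen) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Cluster` record, its `kind` and `healthy` fields, and the cluster names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    kind: str      # "center" or "unit" (hypothetical labels)
    healthy: bool  # outcome of the control server's heartbeat check


def candidate_clusters(clusters, failed_name):
    """Clusters eligible to take over unexecuted work from `failed_name`.

    Only healthy streaming computing center server clusters qualify;
    unit clusters are skipped because they reserve no spare resources.
    """
    return [
        c for c in clusters
        if c.kind == "center" and c.healthy and c.name != failed_name
    ]


clusters = [
    Cluster("center-1", "center", True),
    Cluster("center-2", "center", True),
    Cluster("unit-1", "unit", False),  # the failed cluster
    Cluster("unit-2", "unit", True),
]
print([c.name for c in candidate_clusters(clusters, "unit-1")])
# prints ['center-1', 'center-2']
```

Note that the healthy unit cluster `unit-2` is excluded even though it is reachable, which is the point of the rule.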
Furthermore, in fig. 1, to ensure that a streaming computing task can continue without interruption when it is switched between different streaming computing center server clusters 102, or from a streaming computing unit server cluster 103 to a streaming computing center server cluster 102, the central storage clusters 104 connected to the streaming computing center server clusters 102 must synchronize intermediate state data and intermediate result data with one another in real time. The unit storage clusters 105 connected to the streaming computing unit server clusters 103 need to synchronize their intermediate state data and intermediate result data to the central storage clusters 104. They may synchronize only to the central storage clusters 104 and not among themselves, which avoids the resources that synchronization between unit storage clusters 105 would otherwise consume. The control server 101 is also connected to a control database, which may store the configuration information produced by the control server 101 when it assigns tasks and the execution state produced while the tasks are executed. The execution state may indicate the portion of each streaming computing task that has already been executed on the corresponding streaming computing center server cluster or streaming computing unit server cluster; the configuration information may indicate the correspondence between each streaming computing task and the streaming computing center server cluster, or streaming computing unit server cluster, that executes it.
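A minimal sketch of the synchronization topology described above, assuming each storage cluster is identified by a plain string name. The function `replication_targets` and the names it uses are illustrative only; the text specifies who synchronizes with whom, not the replication mechanism.

```python
def replication_targets(central_stores, unit_stores):
    """Map each storage cluster to the storage clusters it pushes data to.

    Central storage clusters synchronize with every other central storage
    cluster; unit storage clusters push only to central storage clusters
    and never to one another.
    """
    targets = {c: [o for o in central_stores if o != c] for c in central_stores}
    for u in unit_stores:
        targets[u] = list(central_stores)
    return targets


topology = replication_targets(["central-1", "central-2"], ["unit-1", "unit-2"])
# {'central-1': ['central-2'], 'central-2': ['central-1'],
#  'unit-1': ['central-1', 'central-2'], 'unit-2': ['central-1', 'central-2']}
```

With m central and n unit storage clusters, this topology has m(m - 1) + n·m directed links; letting unit storage clusters also synchronize with each other would add another n(n - 1).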
It is to be understood that the streaming computing center server clusters 102 may all be deployed in the same first geographic location or, preferably, in different first geographic locations. A first geographic location may be a city, such as a municipality directly under the central government, a provincial capital, a prefecture-level city or a county-level city, e.g., Beijing, Hangzhou or Nanjing. For example, one streaming computing center server cluster may be deployed in Hangzhou and another also in Hangzhou; alternatively, one may be deployed in Hangzhou and the other in a different location, such as Nanjing or Shanghai. The streaming computing unit server clusters 103 may likewise be deployed in different second geographic locations, such as municipalities, provincial capitals, prefecture-level cities or county-level cities, e.g., Suzhou, Xiamen or Shenzhen. Here, a first geographic location is a location where a streaming computing center server cluster 102 is deployed, and a second geographic location is a location where a streaming computing unit server cluster 103 is deployed. In practical applications, regardless of where the streaming computing center server clusters and streaming computing unit server clusters are deployed, the control server 101 allocates streaming computing tasks to all of them.
Having introduced the application scenario, refer to fig. 2, which shows the flow of an embodiment of a method for allocating streaming computing tasks based on the application scenario of fig. 1. This embodiment is applied to the control server in fig. 1 and may include the following steps:
Step 201: The control server periodically sends heartbeat messages to the streaming computing center server clusters and the streaming computing unit server clusters.
In this embodiment, the control server is connected to every streaming computing center server cluster and every streaming computing unit server cluster, and a heartbeat feedback mechanism is established between the control server and each of these clusters. On this basis, the control server periodically sends heartbeat messages to each streaming computing center server cluster and each streaming computing unit server cluster; the heartbeat messages are used to detect whether the control server can communicate normally with each of them. Whether a cluster returns heartbeat responses normally shows whether it can communicate normally. If it cannot, the streaming computing center server cluster or streaming computing unit server cluster is usually in an abnormal condition and may be unable to execute tasks properly.
Specifically, if the control server normally receives the heartbeat response returned by a streaming computing center server cluster or streaming computing unit server cluster, that cluster is considered able to communicate normally with the control server, that is, no abnormal condition has occurred. Otherwise, the cluster is considered unable to communicate normally with the control server, that is, an abnormal condition has occurred. The period for sending heartbeat messages may be a heartbeat duration, for example, 1 second; of course, those skilled in the art may set the heartbeat duration as needed.
Step 202: in response to receiving the streaming computing task, the control server allocates the streaming computing task to a target streaming computing center server cluster or a target streaming computing unit server cluster.
In practical applications, the control server may be operated by a system administrator. The control server may provide a human-computer interaction interface through which the administrator enters a task instruction, and according to that instruction the control server sends the streaming computing task to the streaming computing center server cluster or streaming computing unit server cluster designated by the administrator (i.e., the target streaming computing center server cluster or the target streaming computing unit server cluster). Of course, other ways of determining the target cluster may also be adopted; for example, the control server may select a streaming computing center server cluster as the target streaming computing center server cluster, or a streaming computing unit server cluster as the target streaming computing unit server cluster, in a round-robin or random manner.
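The target-selection options above (administrator designation first, otherwise round-robin or random choice) can be sketched as below. This is a hypothetical illustration: `make_round_robin`, `choose_target` and the cluster names are assumptions, not names from the text.

```python
import itertools
import random


def make_round_robin(clusters):
    """Return a zero-argument function that yields the clusters in turn."""
    cycle = itertools.cycle(clusters)
    return lambda: next(cycle)


def choose_target(designated, pick):
    """Prefer the administrator's designated cluster; otherwise call `pick`."""
    return designated if designated is not None else pick()


next_center = make_round_robin(["center-1", "center-2"])
picks = [choose_target(None, next_center) for _ in range(3)]
# picks == ["center-1", "center-2", "center-1"]

admin_pick = choose_target("unit-1", next_center)  # administrator override
random_unit = choose_target(None, lambda: random.choice(["unit-1", "unit-2"]))
```

Because `pick` is only called when no cluster is designated, an administrator override does not advance the round-robin position.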
Between step 202 and step 204, optionally, step 203 may also be performed:
Step 203: The control server stores the execution state and configuration information of each streaming computing task in a control database.
In this embodiment, optionally, after allocating the streaming computing tasks, the control server may store the configuration information of each streaming computing task in the control database connected to it, for example, the correspondence between each streaming computing task and the streaming computing center server cluster, or streaming computing unit server cluster, that executes it. The control server may further store in the control database the execution state of each streaming computing task on its streaming computing center server cluster or streaming computing unit server cluster, where the execution state represents the portion of each streaming computing task that has already been executed on the corresponding cluster.
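As a rough sketch of what step 203 records, the control database can be modeled as two mappings: configuration information (task to cluster) and execution state (task to progress). The in-memory `ControlDatabase` class and the choice to measure progress as a count of processed source records are assumptions for illustration; the text does not define a schema.

```python
from dataclasses import dataclass, field


@dataclass
class ControlDatabase:
    """In-memory stand-in for the control database (hypothetical schema)."""

    # Configuration information: task id -> cluster executing the task.
    assignment: dict = field(default_factory=dict)
    # Execution state: task id -> number of source records already processed.
    executed: dict = field(default_factory=dict)

    def record_assignment(self, task_id, cluster):
        self.assignment[task_id] = cluster
        self.executed.setdefault(task_id, 0)

    def record_progress(self, task_id, processed):
        self.executed[task_id] = processed

    def tasks_on(self, cluster):
        """Tasks currently assigned to `cluster`, per the configuration."""
        return [t for t, c in self.assignment.items() if c == cluster]


db = ControlDatabase()
db.record_assignment("hz-transactions", "unit-1")
db.record_progress("hz-transactions", 4000)
```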
Step 204: While the target streaming computing center server cluster or target streaming computing unit server cluster executes the streaming computing task, the control server determines whether that cluster is abnormal; if so, step 205 is executed; if not, this step continues to be performed.
After allocating the streaming computing task, and while the target streaming computing center server cluster or target streaming computing unit server cluster is executing it, the control server detects in real time whether its connection to the target cluster is normal. If the connection is normal, the target cluster has no abnormal condition. If the connection is not normal, for example, because the control server does not receive a heartbeat response from the target cluster within the preset feedback time, an abnormal condition may have occurred in the target streaming computing center server cluster or target streaming computing unit server cluster.
It will be appreciated that if the target streaming computing unit server cluster includes only one streaming computing unit server, an abnormality in that server is enough to proceed to step 205. If the target streaming computing unit server cluster includes multiple streaming computing unit servers, the connection between the control server and the cluster is broken only when all of its servers are abnormal, and only then does this step determine that the whole streaming computing unit server cluster is abnormal; in practice this could happen, for example, when the machine room housing the target cluster suffers a power failure or a fire. It is also possible that only some of the servers in the target streaming computing unit server cluster are abnormal, for example, because they have gone down. In that case, the unexecuted part of the tasks running on the abnormal servers is switched to other normal servers in the same cluster, so that the cluster's tasks continue smoothly and the cluster as a whole remains in a normal operating state.
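The distinction above between a whole-cluster abnormality and a partial server failure can be sketched as follows. The function names, the server and task-part labels, and the round-robin spreading of orphaned work are all assumptions for illustration; the text only says the unexecuted part is switched to other normal servers.

```python
def cluster_abnormal(server_up):
    """A cluster is abnormal only when every one of its servers is down."""
    return not any(server_up.values())


def rebalance_within_cluster(server_up, work):
    """Move task parts from failed servers to healthy servers in the same cluster.

    server_up: server name -> True if the server is running
    work:      server name -> list of task parts assigned to that server
    Parts from failed servers are spread round-robin over healthy servers.
    """
    healthy = [s for s, up in server_up.items() if up]
    if not healthy:
        raise RuntimeError("whole cluster is down; the control server must reassign")
    result = {s: list(work.get(s, [])) for s in healthy}
    orphaned = [p for s, up in server_up.items() if not up for p in work.get(s, [])]
    for i, part in enumerate(orphaned):
        result[healthy[i % len(healthy)]].append(part)
    return result


servers = {"s1": True, "s2": False, "s3": True}
work = {"s1": ["a"], "s2": ["b", "c"], "s3": ["d"]}
print(cluster_abnormal(servers))                # prints False
print(rebalance_within_cluster(servers, work))  # prints {'s1': ['a', 'b'], 's3': ['d', 'c']}
```

A single-server cluster is the degenerate case: `cluster_abnormal({"s1": False})` is `True`, which corresponds to proceeding directly to step 205.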
Of course, the control server may determine whether the target streaming computing center server cluster or target streaming computing unit server cluster is abnormal by checking whether a heartbeat response is received within the preset feedback time after the heartbeat message is sent in step 201. For example, if no heartbeat response from the target cluster is received for one minute, the target cluster is determined to be abnormal and the process may proceed to step 205; if a heartbeat response from the target cluster is received within one minute, the target cluster is determined not to be abnormal, and step 204 continues to be executed for real-time judgment.
It can be understood that, when a streaming computing center server cluster or streaming computing unit server cluster becomes abnormal, the control server may raise an alarm to the system administrator, and the administrator may perform repair operations after confirming that an abnormal condition, such as a network outage or power failure, has actually occurred in that cluster. Once the abnormal cluster has been successfully repaired, it can again serve as a normal streaming computing center server cluster or streaming computing unit server cluster, and the control server can allocate streaming computing tasks to it.
Step 205: The control server allocates the unexecuted tasks of the streaming computing task to a candidate streaming computing center server cluster.
In this step, the unexecuted tasks may be the remainder of the streaming computing task, excluding the part that the target streaming computing center server cluster or target streaming computing unit server cluster has already executed.
Specifically, to ensure that the unexecuted tasks of the streaming computing task can be executed quickly, they can be allocated to the streaming computing center server cluster with the smallest current load to continue execution. Accordingly, step 205 may include:
Step A1: The control server acquires the load conditions of the plurality of streaming computing center server clusters in real time.
In step A1, the control server may acquire the load conditions of each streaming computing center server cluster and each streaming computing unit server cluster in real time. A load condition may consist of hardware parameter values such as CPU utilization, memory read speed and disk input/output (I/O) performance, from which the load of each streaming computing center server cluster and streaming computing unit server cluster can be determined, so that when tasks later need to be redistributed, they can be allocated to a cluster with a lighter load.
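The text lists the hardware parameters but does not say how they are combined into a single load value. The sketch below assumes each metric is already normalized to the range 0.0 to 1.0 and uses a weighted sum; `LoadSample`, `load_score` and the weights are hypothetical choices, not the patent's formula.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    # Each metric is assumed to be normalised to the range 0.0-1.0.
    cpu_util: float
    mem_util: float
    disk_io_util: float


# Hypothetical weights: the text does not say how the metrics are combined.
WEIGHTS = {"cpu_util": 0.5, "mem_util": 0.3, "disk_io_util": 0.2}


def load_score(sample):
    """Weighted sum of the normalised hardware metrics; lower means lighter."""
    return sum(getattr(sample, name) * w for name, w in WEIGHTS.items())


busy = load_score(LoadSample(cpu_util=0.8, mem_util=0.5, disk_io_util=0.2))  # about 0.59
idle = load_score(LoadSample(cpu_util=0.2, mem_util=0.3, disk_io_util=0.1))  # about 0.21
```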
It will be appreciated that, in practical applications, streaming computing unit server clusters do not need to reserve computing resources, whereas streaming computing center server clusters do. Assuming there are N streaming computing center server clusters, where N is an integer greater than 1, the reserved computing resources may be N × 10%. This helps ensure that, when another streaming computing center server cluster or a streaming computing unit server cluster becomes abnormal, a normal streaming computing center server cluster has enough computing resources to execute the tasks reallocated to it by the control server. The computing resources may be hardware resources such as CPU, memory and disk. For example, with N = 2, a streaming computing center server cluster may always keep 20% of its computing resources free while executing the tasks assigned by the control server, and this free 20% can be used to execute unexecuted tasks taken over from other streaming computing center server clusters or streaming computing unit server clusters.
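The N × 10% reservation rule works out as below. This is a minimal sketch: the function names and the 64-core cluster size are hypothetical, and the guard against reserving 100% or more is this sketch's own addition, since the rule as stated would reserve everything once N reaches 10.

```python
def reserved_fraction(num_center_clusters, per_cluster=0.10):
    """Share of a center cluster's resources kept free: N x 10%."""
    if num_center_clusters < 2:
        raise ValueError("the text assumes N is an integer greater than 1")
    fraction = num_center_clusters * per_cluster
    if fraction >= 1.0:
        raise ValueError("rule would reserve the entire cluster")
    return fraction


def assignable_capacity(total_capacity, num_center_clusters):
    """Capacity a center cluster may use for its own assigned tasks."""
    return total_capacity * (1.0 - reserved_fraction(num_center_clusters))


print(reserved_fraction(2))        # about 0.2, matching the 20% example
print(assignable_capacity(64, 2))  # about 51.2 of 64 hypothetical CPU cores
```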
Step A2: The control server allocates the unexecuted tasks of the streaming computing task to the streaming computing center server cluster with the smallest current load.
The control server allocates the unexecuted tasks to the streaming computing center server cluster with the smallest current load, as determined from the load condition of each streaming computing center server cluster acquired in step A1.
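Selecting the least-loaded cluster reduces to taking a minimum over load scores. This sketch assumes the scores have already been computed and that the input contains only normal streaming computing center server clusters; `least_loaded` and the cluster names are hypothetical.

```python
def least_loaded(loads):
    """Name of the cluster with the smallest current load score.

    `loads` maps cluster name -> load score and should contain only normal
    streaming computing center server clusters (unit clusters never qualify).
    """
    if not loads:
        raise LookupError("no normal streaming computing center server cluster")
    return min(loads, key=loads.get)


target = least_loaded({"center-1": 0.72, "center-2": 0.41})
print(target)  # prints center-2
```

On a tie, `min` returns the first cluster in iteration order; any other tie-breaking policy would be an additional design choice.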
Specifically, based on the execution state and configuration information stored in step 203, step A2 may include:
Step A21: The control server computes the unexecuted tasks of the streaming computing task according to the execution state and configuration information stored in the control database.
When a target streaming computing center server cluster or target streaming computing unit server cluster becomes abnormal, the control server can determine from the configuration information which streaming computing tasks that cluster was executing, and from the execution state which portion of each task has already been executed, and thereby compute the portion that has not yet been executed.
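Step A21 can be sketched as a join of the configuration information with the execution state. The sketch assumes progress is tracked as a count of processed source records and that the unexecuted part is the remaining record range; `unexecuted_work`, the task ids and the counts (taken from the 10,000 / 4,000 example later in the text) are illustrative.

```python
def unexecuted_work(failed_cluster, assignment, executed, total):
    """Remaining record ranges for every task that was on `failed_cluster`.

    assignment: task id -> cluster (configuration information)
    executed:   task id -> records already processed (execution state)
    total:      task id -> records acquired for the task
    Returns task id -> (first unprocessed offset, end offset).
    """
    remaining = {}
    for task_id, cluster in assignment.items():
        if cluster != failed_cluster:
            continue
        done = executed.get(task_id, 0)
        if done < total[task_id]:
            remaining[task_id] = (done, total[task_id])
    return remaining


left = unexecuted_work(
    "unit-1",
    assignment={"hz-transactions": "unit-1", "bj-transactions": "center-1"},
    executed={"hz-transactions": 4000},
    total={"hz-transactions": 10000, "bj-transactions": 5000},
)
print(left)  # prints {'hz-transactions': (4000, 10000)}, i.e. 6,000 records left
```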
Step A22: The control server allocates the unexecuted tasks to the streaming computing center server cluster with the smallest current load.
The control server then redistributes the tasks which are not executed to the streaming computing center server cluster with the smallest current load for execution.
It will be appreciated that, after redistributing the unexecuted tasks in step 205, the control server can return to step 202 to allocate the next streaming computing task it receives.
In this embodiment, a single control server uniformly allocates the streaming computing tasks executed by streaming computing center server clusters and streaming computing unit server clusters deployed in multiple locations, achieving unified scheduling and distribution of streaming computing tasks. Real-time data synchronization among the central storage clusters allows the streaming computing center server clusters or streaming computing unit server clusters deployed in multiple locations to compute different parts of the same streaming computing task, or different streaming computing tasks, simultaneously, thereby achieving high availability of the computing service.
Referring to fig. 3, a flowchart of an embodiment of a method for executing a streaming computing task according to the present application is shown. The method is applied to any current streaming computing center server cluster shown in fig. 1. The streaming computing system may include a plurality of streaming computing center server clusters, a plurality of streaming computing unit server clusters and a control server. Each streaming computing center server cluster is provided with a central storage cluster; the central storage clusters of the different streaming computing center server clusters synchronize intermediate state data and intermediate result data with one another, and each streaming computing unit server cluster synchronizes its intermediate state data and intermediate result data to the central storage cluster of each streaming computing center server cluster. Specifically, this embodiment may include:
Step 301: In response to the control server redistributing the unexecuted tasks of a streaming computing task after another streaming computing center server cluster or a streaming computing unit server cluster in the streaming computing system becomes abnormal, the current streaming computing center server cluster acquires, from the central storage cluster connected to it, the intermediate state data and intermediate result data required to execute the unexecuted tasks.
In this embodiment, assume that the control server detects that another streaming computing center server cluster or a streaming computing unit server cluster has become abnormal and, following the embodiment shown in fig. 2, reallocates the tasks being executed by the abnormal cluster to a streaming computing center server cluster, here the current one. In this case, the current streaming computing center server cluster acquires, from the central storage cluster connected to it, the intermediate state data and intermediate result data required to execute the unexecuted tasks. The intermediate state data may be the task state generated by the abnormal cluster while executing the streaming computing task before the abnormality occurred, for example, which parts of the task have already been executed; the intermediate result data may be the result data produced by the parts already executed, and so on. On this basis, the current streaming computing center server cluster can execute the unexecuted part of the task according to the intermediate state data and intermediate result data, without re-executing the part of the streaming computing task that has already been completed.
Step 302: The current streaming computing center server cluster executes the unexecuted tasks using the intermediate state data and intermediate result data.
The current streaming computing center server cluster then executes the reallocated unexecuted tasks with reference to the intermediate state data and intermediate result data.
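For a simple running-sum task such as a transaction total, resuming from the synchronized checkpoint can be sketched as follows. The checkpoint layout (`next_offset` as the intermediate state, `partial_sum` as the intermediate result), the `central_store` dictionary and the record values are hypothetical; the point is that already-processed records are not summed again.

```python
def resume_running_sum(checkpoint, records):
    """Finish a running-sum task from a checkpoint instead of starting over.

    checkpoint["next_offset"]: intermediate state (first unprocessed record)
    checkpoint["partial_sum"]: intermediate result (sum so far)
    """
    total = checkpoint["partial_sum"]
    for amount in records[checkpoint["next_offset"]:]:
        total += amount
    return {"next_offset": len(records), "partial_sum": total}


# Hypothetical central storage contents written by the failed cluster.
central_store = {"hz-transactions": {"next_offset": 2, "partial_sum": 30}}
records = [10, 20, 30, 40, 50]
final = resume_running_sum(central_store["hz-transactions"], records)
print(final)  # prints {'next_offset': 5, 'partial_sum': 150}
```

The result equals `sum(records)`, even though only the last three records were processed after the takeover.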
After step 302, the method may further include:
Step 303: In response to the heartbeat messages that the control server sends periodically, the current streaming computing center server cluster periodically returns heartbeat responses to the control server.
When a heartbeat mechanism has been established between the control server and a streaming computing center server cluster, and the control server periodically sends heartbeat messages to the current streaming computing center server cluster to detect whether the two can communicate, the current streaming computing center server cluster can periodically return heartbeat responses to the control server.
After step 303, the method may further include:
Step 304: The current streaming computing center server cluster detects whether the number of consecutive failures to return a heartbeat response to the control server exceeds a preset threshold; if so, it stops executing the streaming computing task.
The current streaming computing center server cluster may also detect in real time whether the heartbeat mechanism between itself and the control server is working normally, for example, by checking whether the number of consecutive failures to return a heartbeat response to the control server exceeds a preset threshold, such as 10. If so, the current streaming computing center server cluster is abnormal and may stop executing the streaming computing task. If not, the current streaming computing center server cluster is normal, and step 303 may continue to be performed, periodically returning heartbeat responses to the control server.
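The cluster-side self-check in step 304 is a consecutive-failure counter that any successful response resets. In this hypothetical sketch, `HeartbeatFailureGuard` is an illustrative name, and "exceeds" is read literally as strictly greater than the threshold; if the intended rule is "reaches 10", the comparison would be `>=`.

```python
class HeartbeatFailureGuard:
    """Cluster side: stop work after too many consecutive heartbeat failures."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, success):
        """Record one heartbeat response attempt; True means stop executing."""
        if success:
            self.consecutive_failures = 0   # any success resets the count
        else:
            self.consecutive_failures += 1
        # "Exceeds" read literally as strictly greater than the threshold.
        return self.consecutive_failures > self.threshold


guard = HeartbeatFailureGuard(threshold=10)
stops = [guard.record(False) for _ in range(11)]
# the first 10 failures return False; the 11th returns True
```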
It can be seen that, in the embodiments of the present application, a single control server uniformly allocates the tasks executed by streaming computing center server clusters and streaming computing unit server clusters deployed in multiple locations, achieving unified scheduling and distribution of streaming computing tasks, and real-time data synchronization among the central storage clusters allows the clusters deployed in multiple locations to compute different parts of the same streaming computing task, or different streaming computing tasks, simultaneously.
To help those skilled in the art understand how the present application is implemented, a specific example is described in detail below; the example may include the following steps:
Step 401: The control server sends heartbeat messages to streaming computing center server clusters 1 and 2 and streaming computing unit server clusters 1 and 2.
In this example, assume there are two streaming computing center server clusters, streaming computing center server cluster 1 and streaming computing center server cluster 2, and two streaming computing unit server clusters, streaming computing unit server cluster 1 and streaming computing unit server cluster 2. The control server sends heartbeat messages to each of these clusters with a heartbeat duration of 1 second. Streaming computing center server clusters 1 and 2 may be deployed at different sites in Hangzhou or in different cities; streaming computing unit server cluster 1 is deployed in Hangzhou, and streaming computing unit server cluster 2 is deployed in Nanjing.
Step 402: Streaming computing center server clusters 1 and 2 and streaming computing unit server clusters 1 and 2 each return a heartbeat response to the control server.
Step 403: the control server distributes the streaming computing task to the streaming computing unit server cluster 1 for execution.
The system administrator submits a streaming computing task to the control server, for example, computing the transaction amount for Hangzhou on August 15, 2016, and designates streaming computing unit server cluster 1, deployed in Hangzhou, to execute it. Following the administrator's instruction, the control server allocates the transaction-amount task to streaming computing unit server cluster 1 and triggers that cluster to start counting the transaction amount. In this example, streaming computing center server cluster 1 has its own central storage cluster 1, streaming computing center server cluster 2 has its own central storage cluster 2, streaming computing unit server cluster 1 has its own unit storage cluster 1, and streaming computing unit server cluster 2 has its own unit storage cluster 2. In practical applications, unit storage clusters 1 and 2 do not synchronize intermediate state data and intermediate result data with each other; each only needs to synchronize its own intermediate state data and intermediate result data to central storage clusters 1 and 2, while central storage clusters 1 and 2 synchronize intermediate state data and intermediate result data with each other.
Specifically, while counting the transaction amount, streaming computing unit server cluster 1 may acquire the source data required for the statistics from a data source, for example, order information whose IP address is located in Hangzhou, and compute the transaction amount from that source data. The local data sources in each location can be synchronized to the central data source corresponding to the streaming computing center server clusters, and the streaming computing center server clusters and the streaming computing unit server clusters in every location can pull source data from the central data source.
Step 404: While streaming computing unit server cluster 1 executes the streaming computing task, unit storage cluster 1, connected to streaming computing unit server cluster 1, synchronizes the intermediate state data and intermediate result data generated during execution to central storage cluster 1 and central storage cluster 2; meanwhile, the control server stores the execution state and configuration information of the streaming computing task in the control database.
While streaming computing unit server cluster 1 executes the task, it stores the intermediate state data and intermediate result data it generates in real time to unit storage cluster 1, and unit storage cluster 1 synchronizes them to central storage cluster 1 and central storage cluster 2 in real time. Meanwhile, the control server may obtain the execution state of the task in real time and store in the control database that execution state together with the configuration information recording that the streaming computing task was allocated to streaming computing unit server cluster 1. For example, the execution state may indicate that, at the current moment, streaming computing unit server cluster 1 has acquired 10,000 source data records, has counted 4,000 of them, and has not yet counted the remaining 6,000. Of course, the execution state may also be represented in other ways.
Step 405: the streaming computing unit server cluster 1 detects whether the number of consecutive failures in feeding back a heartbeat response to the control server exceeds a preset count threshold. If so, it stops executing the streaming computing task; if not, it continues to perform step 405.
While the streaming computing unit server cluster 1 executes the task, it checks in real time whether each heartbeat response it feeds back to the control server fails, and counts the number of consecutive failures. If that number exceeds the preset count threshold, for example 10, the streaming computing unit server cluster 1 cannot communicate normally with the control server. Possible causes include an abnormal condition such as a network failure or a power failure at the streaming computing unit server cluster 1. In this case, the streaming computing unit server cluster 1 exits the transaction-amount statistics flow.
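The cluster-side check in step 405 can be sketched as a small counter that resets on success and latches once the threshold is exceeded. This is a sketch, not the patent's implementation. The class name and the latching behavior are assumptions. A plausible rationale, not stated in the patent, is that a cluster cut off from the control server should stop on its own, because the control server may already be handing its task to another cluster.

```python
class HeartbeatFailureGuard:
    """Counts consecutive heartbeat-response failures on a computing cluster (hypothetical)."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.consecutive_failures = 0
        self.stopped = False

    def record(self, success):
        """Record one heartbeat attempt; return True while the task may keep running."""
        if self.stopped:
            return False  # latched: a stopped cluster does not resume on its own
        if success:
            self.consecutive_failures = 0  # any success breaks the run of failures
        else:
            self.consecutive_failures += 1
            # Stop only when the count *exceeds* the threshold, i.e. on failure 11 of 10.
            if self.consecutive_failures > self.threshold:
                self.stopped = True
        return not self.stopped


guard = HeartbeatFailureGuard(threshold=10)
results = [guard.record(False) for _ in range(11)]
print(results[-2], results[-1])  # True False
```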
Step 406: the control server determines whether the streaming computing unit server cluster 1 feeds back a heartbeat response within the preset feedback time. If not, the flow proceeds to step 407; if so, step 406 continues to be performed.
The control server may also determine in real time whether the streaming computing unit server cluster 1 feeds back a heartbeat response within a preset feedback time, for example 1 minute. If no heartbeat response from the streaming computing unit server cluster 1 is received within that time, the cluster can no longer execute the task normally. Otherwise, the control server repeats this step and continues to monitor heartbeat responses.
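The control-server check in step 406 amounts to a timeout on the last heartbeat response from each cluster. The following is a minimal sketch under assumptions. `HeartbeatMonitor` and its methods are hypothetical names. The clock is injectable so the logic can be tested without sleeping. The 60-second default mirrors the 1-minute example above.

```python
import time


class HeartbeatMonitor:
    """Control-server side: flags clusters with no heartbeat response within the feedback time."""

    def __init__(self, feedback_time_s=60.0, clock=time.monotonic):
        self.feedback_time_s = feedback_time_s
        self.clock = clock  # injectable time source (seconds)
        self.last_response = {}

    def on_heartbeat_response(self, cluster_id):
        self.last_response[cluster_id] = self.clock()

    def abnormal_clusters(self):
        """Clusters whose last response is older than the preset feedback time."""
        now = self.clock()
        return [cid for cid, seen in self.last_response.items()
                if now - seen > self.feedback_time_s]


fake_now = [0.0]
monitor = HeartbeatMonitor(60.0, clock=lambda: fake_now[0])
monitor.on_heartbeat_response("unit-1")
monitor.on_heartbeat_response("center-1")
fake_now[0] = 30.0
monitor.on_heartbeat_response("center-1")
fake_now[0] = 61.0
print(monitor.abnormal_clusters())  # ['unit-1']
```

A cluster only appears in `last_response` after its first response. A fuller version would therefore also record a timestamp when the task is allocated, so that a cluster that never answers is still detected.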
Step 407: the control server acquires the load of each streaming computing center server cluster in real time. It also determines the unexecuted part of the streaming computing task according to the execution state and the configuration information.
The control server may also acquire the loads of the streaming computing center server clusters 1 and 2 in real time. For example, it may determine that the CPU utilization of the streaming computing center server cluster 1 is 40% and that of the streaming computing center server cluster 2 is 60%. In this case, the streaming computing center server cluster 1 has the lower load. Meanwhile, according to the execution state and configuration information stored in the control database, the control server also determines that 40% of the transaction-amount statistics task has been executed and that 6000 pieces of source data have not yet been counted.
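The decision in step 407 reduces to two small computations: picking the center cluster with the lowest load, and working out how much of the task remains. The sketch below assumes that CPU utilization is the only load metric, as in the example above. The function names are hypothetical.

```python
def least_loaded(cpu_utilization):
    """Return the center cluster with the lowest CPU utilization (first one wins a tie)."""
    return min(cpu_utilization, key=cpu_utilization.get)


def remaining_work(acquired, counted):
    """Return (records not yet counted, fraction of the task still to execute)."""
    if acquired == 0:
        return 0, 0.0
    remaining = acquired - counted
    return remaining, remaining / acquired


target = least_loaded({"center-1": 0.40, "center-2": 0.60})
left, fraction = remaining_work(acquired=10000, counted=4000)
print(target, left, fraction)  # center-1 6000 0.6
```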
Step 408: the control server allocates the unexecuted tasks to the streaming computing center server cluster with the lowest current load for execution.
Step 409: the streaming computing center server cluster 1 continues executing the unexecuted tasks based on the intermediate state data and intermediate result data synchronized to the central storage cluster 1.
The control server allocates the remaining 60% of the task, which has not yet been executed, to the streaming computing center server cluster 1. The intermediate state data and intermediate result data stored in the central storage cluster 1 are synchronized in real time from the unit storage cluster 1. The streaming computing center server cluster 1 can therefore obtain this data for the transaction-amount statistics task directly from the central storage cluster 1. It then continues executing the remaining 60% of the task from that point, without re-executing the 40% that has already been completed.
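The no-recomputation property of step 409 can be sketched as resuming a counting loop from a checkpoint. The checkpoint holds an offset, which plays the role of the intermediate state, and a partial sum, which plays the role of the intermediate result. Both key names are hypothetical, and the toy input gives every record a value of 1.

```python
def resume_count(records, checkpoint):
    """Continue a counting task from its last synchronized checkpoint (hypothetical format)."""
    amount = checkpoint["amount"]      # intermediate result: partial sum so far
    processed = 0
    for value in records[checkpoint["offset"]:]:  # intermediate state: records already counted
        amount += value
        processed += 1
    return {"offset": len(records), "amount": amount, "processed": processed}


records = [1] * 10000                           # toy input: each record contributes 1
checkpoint = {"offset": 4000, "amount": 4000}   # as read back from central storage cluster 1
print(resume_count(records, checkpoint))
# {'offset': 10000, 'amount': 10000, 'processed': 6000}
```

The resumed total is only correct if the offset and the partial sum in a checkpoint were written together. If one could be synchronized without the other, some records could be counted twice or skipped.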
For simplicity of explanation, the foregoing method embodiments are described as a series of combined acts. However, those skilled in the art will appreciate that the present application is not limited by the described order of acts, because according to the present application some steps may be performed in other orders or concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
Corresponding to the method provided by the foregoing embodiment of the method for allocating a streaming computing task, and referring to fig. 5, the present application further provides an embodiment of a control server. The control server is connected to a plurality of streaming computing center server clusters and a plurality of streaming computing unit server clusters, and a preset proportion of computing resources is reserved in the streaming computing center server clusters. In this embodiment, the control server may include:
a first allocating unit 501, configured to, in response to receiving a streaming computation task, allocate the streaming computation task to a target streaming computation center server cluster or a target streaming computation unit server cluster.
A determining unit 502, configured to determine, while the target streaming computing center server cluster or the target streaming computing unit server cluster executes the streaming computing task, whether that cluster has become abnormal.
A second allocating unit 503, configured to allocate the unexecuted tasks of the streaming computing task to a candidate streaming computing center server cluster. The unexecuted tasks are the portion of the streaming computing task that remains after excluding the part already executed by the target streaming computing center server cluster or the target streaming computing unit server cluster.
The second allocating unit 503 may specifically include:
a load acquisition subunit, configured to acquire the loads of the plurality of streaming computing center server clusters and the plurality of streaming computing unit server clusters in real time;
and a first allocating subunit, configured to allocate the unexecuted tasks of the streaming computing task to the streaming computing center server cluster with the lowest current load, according to the load of each streaming computing center server cluster.
The control server may further include:
a sending unit, configured to periodically send heartbeat messages to the streaming computing center server clusters and the streaming computing unit server clusters respectively. The heartbeat messages are used to detect whether the control server can communicate with the streaming computing center server clusters, and whether it can communicate with the streaming computing unit server clusters;
correspondingly, the determining unit 502 is specifically configured to determine whether the target streaming computing center server cluster or the target streaming computing unit server cluster has failed to feed back a heartbeat response within the preset feedback time.
Each streaming computing center server cluster is provided with a central storage cluster. The central storage clusters of the streaming computing center server clusters synchronize intermediate state data and intermediate result data with one another, and the streaming computing unit server clusters synchronize their intermediate state data and intermediate result data to the central storage clusters. The control server may further include:
a storage unit, configured to store the execution state and the configuration information of each streaming computing task in the control database. The execution state represents how far each streaming computing task has been executed on the corresponding streaming computing center server cluster or streaming computing unit server cluster. The configuration information represents the correspondence between each streaming computing task and the streaming computing center server cluster or streaming computing unit server cluster executing that task;
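The role of the control database can be sketched as one record per task. Each record combines the configuration information (which cluster runs the task) with the execution state (how much has been processed). When a cluster becomes abnormal, the control server looks up every task mapped to that cluster and repoints it at the target cluster. This sketch is hypothetical: the record fields, class names, and task ids are illustrative, and a dictionary stands in for a real database.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    task_id: str
    cluster_id: str  # configuration information: the cluster currently running the task
    acquired: int    # execution state: source records acquired so far
    counted: int     # execution state: source records already processed


class ControlDatabase:
    """In-memory stand-in for the control database (hypothetical)."""

    def __init__(self):
        self.tasks = {}

    def upsert(self, record):
        self.tasks[record.task_id] = record

    def tasks_on(self, cluster_id):
        return [r for r in self.tasks.values() if r.cluster_id == cluster_id]

    def reassign(self, failed_cluster, target_cluster):
        """Move every task of an abnormal cluster to the target; return work left per task."""
        moved = {}
        for record in self.tasks_on(failed_cluster):
            record.cluster_id = target_cluster
            moved[record.task_id] = record.acquired - record.counted
        return moved


db = ControlDatabase()
db.upsert(TaskRecord("gmv-hangzhou", "unit-1", acquired=10000, counted=4000))
db.upsert(TaskRecord("gmv-shanghai", "center-2", acquired=500, counted=500))
print(db.reassign("unit-1", "center-1"))  # {'gmv-hangzhou': 6000}
```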
correspondingly, the first allocating subunit may specifically include:
a computing subunit, configured to compute the unexecuted tasks of the streaming computing task according to the execution state and the configuration information stored in the control database;
and a second allocating subunit, configured to allocate the unexecuted tasks to the streaming computing center server cluster with the lowest current load.
The control server of this embodiment can uniformly allocate the tasks executed by streaming computing center server clusters and streaming computing unit server clusters deployed in multiple locations, achieving unified scheduling and allocation of streaming computing tasks. Because data is synchronized among the central storage clusters in real time, clusters deployed in different locations can simultaneously compute different parts of the same streaming computing task, or different streaming computing tasks. When one streaming computing center server cluster or streaming computing unit server cluster becomes abnormal, the streaming computing task it was executing can be quickly recovered on a streaming computing center server cluster in another location. As a result, system resources do not sit idle during normal operation, and streaming computing tasks can be recovered quickly under abnormal conditions. Together, these properties give the streaming computing service high availability.
Corresponding to the method provided by the embodiment of the method for executing a streaming computing task, and referring to fig. 6, the present application further provides an embodiment of a streaming computing center server cluster. In this embodiment, a plurality of streaming computing center server clusters in the streaming computing system each reserve preset computing resources and are each connected to the control server. The control server is further connected to a plurality of streaming computing unit server clusters. Each streaming computing center server cluster is provided with a central storage cluster, and the central storage clusters synchronize intermediate state data and intermediate result data with one another. The unit storage cluster of each streaming computing unit server cluster synchronizes its intermediate state data and intermediate result data to the central storage cluster of each streaming computing center server cluster. The streaming computing center server cluster may include:
a data obtaining unit 601, configured to respond when the control server redistributes the unexecuted tasks of a streaming computing task after another streaming computing center server cluster or a streaming computing unit server cluster in the streaming computing system has become abnormal. In response, it obtains from the central storage cluster the intermediate state data and intermediate result data required to execute the unexecuted tasks.
a task execution unit 602, configured to execute the unexecuted tasks using the preset computing resources, the intermediate state data, and the intermediate result data.
The streaming computing center server cluster may further include:
a feedback unit, configured to periodically feed back heartbeat responses to the control server in response to the heartbeat messages that the control server sends periodically. The heartbeat messages are used to detect whether the control server can communicate with the current streaming computing center server cluster.
The streaming computing center server cluster may further include:
a detection unit, configured to detect whether the number of consecutive failures in sending heartbeat responses to the control server exceeds a preset count threshold; and a stopping unit, configured to stop executing the unexecuted tasks if the detection result of the detection unit is yes.
The streaming computing center server cluster of this embodiment can receive and execute streaming computing tasks uniformly allocated by the control server. Because data is synchronized among the central storage clusters in real time, streaming computing center server clusters and streaming computing unit server clusters deployed in multiple locations can simultaneously compute different parts of the same streaming computing task, or different streaming computing tasks. When one streaming computing center server cluster or streaming computing unit server cluster becomes abnormal, the streaming computing task it was executing can be quickly recovered on a streaming computing center server cluster in another location. System resources therefore do not sit idle during normal operation, and streaming computing tasks can be recovered quickly under abnormal conditions. Together, these properties give the streaming computing service high availability.
An embodiment of the present application further provides a system for allocating and executing streaming computing tasks. The system may include the control server shown in fig. 5, a plurality of the streaming computing center server clusters shown in fig. 6, and a plurality of streaming computing unit server clusters. Each streaming computing center server cluster has its own central storage cluster, each streaming computing unit server cluster has its own unit storage cluster, and the control server has its own control database. For a structural block diagram of the system, refer to fig. 1. For further details, refer to the foregoing embodiments, which are not repeated here.
An embodiment of the present application further provides a remote multi-active system. It includes a first streaming computing center server cluster, a second streaming computing center server cluster, a plurality of streaming computing unit server clusters, and a control server. The first and second streaming computing center server clusters are the streaming computing center server cluster shown in fig. 6, and for the control server, refer to fig. 5. The plurality of streaming computing unit server clusters are deployed at a plurality of second geographic locations, one cluster per location. The first and second streaming computing center server clusters are deployed at the same first geographic location or at different first geographic locations.
In this embodiment, the streaming computing center server clusters are deployed at the first geographic location(s) and the streaming computing unit server clusters at the second geographic locations. Therefore, when a streaming computing unit server cluster becomes abnormal, the streaming computing task it was executing can be recovered on the first or second streaming computing center server cluster in a different location. The unexecuted part of the task then continues to run there, realizing remote multi-active operation. In addition, the first and second streaming computing center server clusters may be deployed at different first geographic locations. In that case, when one of them becomes abnormal, the other can recover the streaming computing task that the abnormal streaming computing center server cluster was executing and continue executing the unexecuted part, which also realizes remote multi-active operation.
The present application further provides a remote multi-active system, which specifically includes two servers. A first streaming computing center server is configured at least to provide computing resources externally and includes a first central storage unit. A second streaming computing center server is configured at least to provide computing resources externally and includes a second central storage unit. The first and second streaming computing center servers perform load balancing based on a unified load balancing policy, and the first and second central storage units serve as hot standbys for each other. Consider a first streaming computing task running on the first streaming computing center server. When that server can no longer provide computing resources externally, the task's execution on it is terminated. The first streaming computing task then continues to run on the second streaming computing center server, based on the intermediate state data and intermediate result data in the second central storage unit.
It should be noted that the embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts, the embodiments may refer to one another. The apparatus embodiments are basically similar to the method embodiments and are therefore described briefly; for related details, refer to the corresponding description of the method embodiments.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The method for allocating a streaming computing task, the control server, the method for executing a streaming computing task, the streaming computing center server cluster, the streaming computing system, and the remote multi-active system provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, a person skilled in the art may, according to the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (13)

1. A computing task allocation method, characterized in that the method is applied to a control server connected to a streaming computing center server cluster and a streaming computing unit server cluster, wherein computing resources of a preset proportion are reserved in the streaming computing center server cluster, and the streaming computing unit server cluster does not reserve computing resources; the method comprises:
in response to receiving a streaming computing task, assigning the streaming computing task to a target streaming computing center server cluster or a target streaming computing unit server cluster;
and, while the target streaming computing center server cluster or the target streaming computing unit server cluster executes the streaming computing task, determining whether that cluster is abnormal, and if so, allocating the unexecuted tasks of the streaming computing task to a candidate streaming computing center server cluster.
2. The method of claim 1, further comprising:
the control server periodically sends heartbeat messages to the streaming computing center server cluster and the streaming computing unit server cluster respectively, wherein the heartbeat messages are used to detect whether the control server can communicate with the streaming computing center server cluster and whether the control server can communicate with the streaming computing unit server cluster;
correspondingly, the determining whether the target streaming computing center server cluster or the target streaming computing unit server cluster is abnormal specifically includes:
determining whether the target streaming computing center server cluster or the target streaming computing unit server cluster fails to feed back a heartbeat response within a preset feedback time.
3. The method of claim 1, wherein the allocating the unexecuted tasks of the streaming computing task to a candidate streaming computing center server cluster comprises:
the control server acquires the load condition of the streaming computation center server cluster in real time;
and the control server distributes the tasks which are not executed in the streaming computing tasks to the streaming computing center server cluster with the minimum current load according to the load condition.
4. The method of claim 3, wherein each streaming computing center server cluster has a central storage cluster, the central storage clusters of the streaming computing center server clusters synchronize intermediate state data and intermediate result data with one another, and each streaming computing unit server cluster synchronizes intermediate state data and intermediate result data to the central storage cluster of each streaming computing center server cluster; the method further comprises:
the control server stores the execution state and the configuration information of each streaming computing task in a control database, wherein the execution state represents how far each streaming computing task has been executed on the corresponding streaming computing center server cluster or streaming computing unit server cluster, and the configuration information represents the correspondence between each streaming computing task and the streaming computing center server cluster or streaming computing unit server cluster executing that task;
correspondingly, the allocating the tasks that are not executed in the streaming computing tasks to the streaming computing center server cluster with the smallest current load includes:
the control server computes the unexecuted tasks of the streaming computing task according to the execution state and the configuration information stored in the control database;
and the control server distributes the unexecuted tasks to the streaming computing center server cluster with the minimum current load.
5. A method for executing a streaming computing task, applied to any current streaming computing center server cluster that reserves preset computing resources in a streaming computing system, the streaming computing system comprising a streaming computing center server cluster, a streaming computing unit server cluster, and a control server, wherein the streaming computing unit server cluster does not reserve computing resources; the streaming computing center server cluster is provided with a central storage cluster, intermediate state data and intermediate result data are synchronized among the central storage clusters, and the unit storage cluster of the streaming computing unit server cluster synchronizes intermediate state data and intermediate result data to the central storage cluster; the method comprises:
in response to the control server redistributing the unexecuted tasks of a streaming computing task when another streaming computing center server cluster or a streaming computing unit server cluster in the streaming computing system is abnormal, acquiring, by the current streaming computing center server cluster from a central storage cluster, the intermediate state data and intermediate result data required to execute the unexecuted tasks;
and the current streaming computing center server cluster executes the tasks which are not executed completely by utilizing the preset computing resources, the intermediate state data and the intermediate result data.
6. The method of claim 5, further comprising:
responding to the periodic heartbeat message sent by the control server, and periodically feeding back a heartbeat response to the control server by the current streaming computing center server cluster; the heartbeat message is used for detecting whether communication can be carried out between the control server and the current streaming computation center server cluster.
7. The method of claim 6, further comprising:
the current streaming computing center server cluster detects whether the number of consecutive failures in feeding back heartbeat responses to the control server exceeds a preset count threshold, and if so, the current streaming computing center server cluster stops executing the unexecuted tasks.
8. A control server, characterized in that the control server is connected to a streaming computing center server cluster and a streaming computing unit server cluster, wherein computing resources of a preset proportion are reserved in the streaming computing center server cluster, and the streaming computing unit server cluster does not reserve computing resources; the control server comprises:
a first allocation unit, configured to, in response to receiving a streaming computation task, allocate the streaming computation task to a target streaming computation center server cluster or a target streaming computation unit server cluster;
a judging unit, configured to judge, while the target streaming computing center server cluster or the target streaming computing unit server cluster executes the streaming computing task, whether that cluster is abnormal;
and the second distribution unit is used for distributing the tasks which are not executed in the streaming computing tasks to the candidate streaming computing center server cluster under the condition that the result of the judgment unit is yes.
9. A streaming computing center server cluster, characterized in that preset computing resources are reserved in the streaming computing center server cluster, the streaming computing center server cluster is connected to a control server, and the control server is further connected to a streaming computing unit server cluster; the streaming computing unit server cluster does not reserve computing resources; the streaming computing center server cluster is provided with a central storage cluster, and intermediate state data and intermediate result data are synchronized among the central storage clusters; the streaming computing unit server cluster is provided with a unit storage cluster, which synchronizes intermediate state data and intermediate result data to a central storage cluster; the streaming computing center server cluster comprises:
a data acquisition unit, configured to, in response to the control server redistributing the unexecuted tasks of a streaming computing task when another streaming computing center server cluster or a streaming computing unit server cluster in the streaming computing system is abnormal, acquire from a central storage cluster the intermediate state data and intermediate result data required to execute the unexecuted tasks;
and the task execution unit is used for executing the tasks which are not completely executed by utilizing the preset computing resources, the intermediate state data and the intermediate result data.
10. A streaming computing system, comprising: the streaming computing center server cluster and the streaming computing unit server cluster of claim 9, and the control server of claim 8; and
a central storage cluster corresponding to the streaming computing central server cluster, a control database corresponding to the control server, and a unit storage cluster corresponding to the streaming computing unit server cluster.
11. A remote multi-active system, comprising: a first streaming computing center server cluster, a plurality of streaming computing unit server clusters, and a control server; wherein the first streaming computing center server cluster is the streaming computing center server cluster of claim 9, and the control server is the control server of claim 8; the streaming computing unit server clusters do not reserve computing resources; and
the plurality of streaming computing unit server clusters are deployed at a plurality of second geographic locations respectively; the first streaming computing center server cluster is deployed at a first geographic location.
12. The system of claim 11, wherein the remote multi-active system further comprises a second streaming computing center server cluster deployed at a first geographic location different from that of the first streaming computing center server cluster.
13. A remote multi-active system, comprising:
a first streaming computing center server, configured at least to provide computing resources externally, wherein the first streaming computing center server comprises a first central storage unit;
a second streaming computing center server, configured at least to provide computing resources externally, wherein the second streaming computing center server comprises a second central storage unit;
the first streaming computing center server and the second streaming computing center server perform load balancing based on a unified load balancing policy, and the first central storage unit and the second central storage unit serve as hot standbys for each other;
wherein, for a first streaming computing task running on the first streaming computing center server, when the first streaming computing center server can no longer provide computing resources externally, the task's execution on the first streaming computing center server is terminated, and the first streaming computing task continues to run on the second streaming computing center server based on intermediate state data and intermediate result data in the second central storage unit of the second streaming computing center server.
CN201610908946.7A 2016-10-18 2016-10-18 Distribution method of streaming computing task and control server Active CN107959705B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201610908946.7A CN107959705B (en) 2016-10-18 2016-10-18 Distribution method of streaming computing task and control server
TW106127334A TWI755417B (en) 2016-10-18 2017-08-11 Computing task allocation method, execution method of stream computing task, control server, stream computing center server cluster, stream computing system and remote multi-active system
PCT/CN2017/105360 WO2018072618A1 (en) 2016-10-18 2017-10-09 Method for allocating stream computing task and control server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610908946.7A CN107959705B (en) 2016-10-18 2016-10-18 Distribution method of streaming computing task and control server

Publications (2)

Publication Number Publication Date
CN107959705A CN107959705A (en) 2018-04-24
CN107959705B true CN107959705B (en) 2021-08-20

Family

ID=61954266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610908946.7A Active CN107959705B (en) 2016-10-18 2016-10-18 Distribution method of streaming computing task and control server

Country Status (3)

Country Link
CN (1) CN107959705B (en)
TW (1) TWI755417B (en)
WO (1) WO2018072618A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108737270B (en) * 2018-05-07 2021-01-26 北京京东尚科信息技术有限公司 Resource management method and device for server cluster
CN109358983A (en) * 2018-09-04 2019-02-19 深圳市宝德计算机系统有限公司 Server data processing method, device and storage medium
CN111090502B (en) * 2018-10-24 2024-05-17 阿里巴巴集团控股有限公司 Stream data task scheduling method and device
CN109656782A (en) * 2018-12-24 2019-04-19 成都四方伟业软件股份有限公司 Visual scheduling monitoring method, device and server
CN112148439B (en) * 2019-06-28 2024-03-08 浙江宇视科技有限公司 Task processing method, device, equipment and storage medium
CN111092931B (en) * 2019-11-15 2021-08-06 中国科学院计算技术研究所 Method and system for rapidly distributing streaming data of online super real-time simulation of power system
CN111124812A (en) * 2019-12-02 2020-05-08 深圳市智微智能软件开发有限公司 Server monitoring method and system
CN112732491B (en) * 2021-01-22 2024-03-12 中国人民财产保险股份有限公司 Data processing system and business data processing method based on data processing system
CN113190364A (en) * 2021-04-30 2021-07-30 平安壹钱包电子商务有限公司 Remote call management method and device, computer equipment and readable storage medium
CN113283803B (en) * 2021-06-17 2024-04-23 金蝶软件(中国)有限公司 Method for making material demand plan, related device and storage medium
CN113391902B (en) * 2021-06-22 2023-03-31 未鲲(上海)科技服务有限公司 Task scheduling method and device and storage medium
CN113472662B (en) * 2021-07-09 2022-10-04 武汉绿色网络信息服务有限责任公司 Path redistribution method and network service system
WO2023077451A1 (en) * 2021-11-05 2023-05-11 中国科学院计算技术研究所 Stream data processing method and system based on column-oriented database
CN114884946B (en) * 2022-04-28 2024-01-16 抖动科技(深圳)有限公司 Remote multi-activity implementation method based on artificial intelligence and related equipment
CN115242648B (en) * 2022-07-19 2024-05-28 北京百度网讯科技有限公司 Expansion and contraction capacity discrimination model training method and operator expansion and contraction capacity method

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102158387A (en) * 2010-02-12 2011-08-17 华东电网有限公司 Protection fault information processing system based on dynamic load balance and mutual hot backup
CN103763378A (en) * 2014-01-24 2014-04-30 中国联合网络通信集团有限公司 Task processing method and system and nodes based on distributive type calculation system

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US6779016B1 (en) * 1999-08-23 2004-08-17 Terraspring, Inc. Extensible computing system
CN102929659B (en) * 2005-10-07 2016-05-04 Citrix Systems, Inc. The method of selecting between manner of execution for the predetermined quantity in application program
WO2009134772A2 (en) * 2008-04-29 2009-11-05 Maxiscale, Inc Peer-to-peer redundant file server system and methods
CN101483673B (en) * 2009-02-20 2013-02-13 杭州华三通信技术有限公司 Implementation method and system for heat backup at different sites
CN103973725B (en) * 2013-01-28 2018-08-24 阿里巴巴集团控股有限公司 A kind of distributed cooperative algorithm and synergist
CN103703830B (en) * 2013-05-31 2017-11-17 华为技术有限公司 A kind of physical resource adjustment, device and controller
US9785480B2 (en) * 2015-02-12 2017-10-10 Netapp, Inc. Load balancing and fault tolerant service in a distributed data system
CN104683488B (en) * 2015-03-31 2018-03-30 百度在线网络技术(北京)有限公司 Streaming computing system and its dispatching method and device

Also Published As

Publication number Publication date
WO2018072618A1 (en) 2018-04-26
CN107959705A (en) 2018-04-24
TWI755417B (en) 2022-02-21
TW201816616A (en) 2018-05-01

Similar Documents

Publication Publication Date Title
CN107959705B (en) Distribution method of streaming computing task and control server
US11307943B2 (en) Disaster recovery deployment method, apparatus, and system
CN113014634B (en) Cluster election processing method, device, equipment and storage medium
WO2017067484A1 (en) Virtualization data center scheduling system and method
US9641449B2 (en) Variable configurations for workload distribution across multiple sites
CN107453929B (en) Cluster system self-construction method and device and cluster system
CN105939389A (en) Load balancing method and device
US9058304B2 (en) Continuous workload availability between sites at unlimited distances
CN111459642B (en) Fault processing and task processing method and device in distributed system
CN109802986B (en) Equipment management method, system, device and server
WO2020119060A1 (en) Method and system for scheduling container resources, server, and computer readable storage medium
CN110300188B (en) Data transmission system, method and device
CN114070739B (en) Cluster deployment method, device, equipment and computer readable storage medium
US9047126B2 (en) Continuous availability between sites at unlimited distances
CN112631764A (en) Task scheduling method and device, computer equipment and computer readable medium
CN112333249A (en) Business service system and method
CN104484228B (en) Distributed parallel task processing system based on Intelli DSC
CN114338670B (en) Edge cloud platform and network-connected traffic three-level cloud control platform with same
CN111092754B (en) Real-time access service system and implementation method thereof
CN112631756A (en) Distributed regulation and control method and device applied to space flight measurement and control software
CN104486447A (en) Large platform cluster system based on Big-Cluster
CN103973811A (en) High-availability cluster management method capable of conducting dynamic migration
CN110519393B (en) Self-service equipment supervision method, device, equipment, server and medium
CN113923222A (en) Data processing method and device
CN104462581A (en) Micro-channel memory mapping and Smart-Slice based ultrafast file fingerprint extraction system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211110

Address after: Room 507, floor 5, building 3, No. 969, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee after: Zhejiang Tmall Technology Co., Ltd.

Address before: P.O. Box 847, 4th Floor, Capital Building, Grand Cayman, British Cayman Islands

Patentee before: Alibaba Group Holding Limited