WO2012024937A1 - Method and system for implementing parallel processing - Google Patents

Method and system for implementing parallel processing

Info

Publication number
WO2012024937A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
information
task
worker node
master node
Prior art date
Application number
PCT/CN2011/072818
Other languages
English (en)
Chinese (zh)
Inventor
周扬
胡媛
张艺夕
李桂萍
黄翔
Original Assignee
中兴通讯股份有限公司
Priority date
2010-08-27
Filing date
2011-04-14
Publication date
2012-03-01
Application filed by 中兴通讯股份有限公司
Publication of WO2012024937A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1479 Generic software techniques for error detection or fault masking
    • G06F11/1482 Generic software techniques for error detection or fault masking by means of middleware or OS functionality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements

Definitions

  • The present invention relates to the field of cloud computing, and more particularly to a method and system for implementing parallel computing.
  • Background: MapReduce was first proposed by Google engineers. It is a system architecture that can process massive amounts of data in parallel.
  • A MapReduce system works by automatically breaking a task into multiple subtasks and executing these subtasks in parallel; once all subtasks have completed, the processing results are aggregated.
  • FIG. 1 shows the architecture of the existing MapReduce system.
  • MapReduce divides the data processing into two phases: the Map phase and the Reduce phase.
  • The MapReduce system mainly includes a client (Client), a master (Master) node, and worker (Worker) nodes. The Client submits a MapReduce task. The Master node automatically decomposes the MapReduce task into Map tasks and Reduce tasks and schedules them for execution on the Worker nodes. After a Worker node receives a Map or Reduce task request from the Master node, it performs the task in the request.
  • The MapReduce system automatically provides parallel processing, data distribution, fault tolerance, and load balancing.
  • The present invention provides a method for implementing parallel computing, the method comprising:
  • After the overall task is started, the log information of the worker nodes and the master node performing the task is recorded;
  • When a worker node performing the task fails, a new worker node obtains the recorded log information of the faulty worker node and, according to that log information, continues to process the business process of the faulty worker node from the breakpoint at which the fault occurred; and/or, when the master node performing the task fails, a new master node, once started, obtains the recorded log information of the faulty master node and, according to that log information, continues to process the business process of the faulty master node from the breakpoint at which the fault occurred.
  • The new worker node obtains the log information of the faulty worker node as follows:
  • After receiving the information about performing the task sent by the master node, the new worker node sends query request information to the global information monitoring function entity;
  • After receiving the query request information, the global information monitoring function entity looks up the log information of the faulty worker node that it has saved according to the query request information, and returns the log information of the faulty worker node to the new worker node.
  • The new master node obtains the log information of the faulty master node as follows:
  • The new master node sends query request information to the global information monitoring function entity. After receiving the query request information, the global information monitoring function entity looks up the log information of the faulty master node that it has saved according to the query request information, and returns the log information of the faulty master node to the new master node.
  • Before recording the log information of the Master node and the Worker nodes, the method further includes:
  • A node is selected as the master node for executing the task, and the input data source to be processed is then sent to the selected master node;
  • After receiving the input data source to be processed, the master node performing the task splits the input data source;
  • The master node performing the task selects the worker nodes that will execute the task, and assigns a task to be executed to each of these worker nodes;
  • Each worker node performing the task reads its split data block and performs the assigned task.
  • Recording the log information of the Worker nodes and the Master node performing the task is specifically:
  • The worker nodes and the master node performing the task upload their own log information to the global information monitoring function entity in real time;
  • The global information monitoring function entity saves the log information of the Worker nodes and the Master node performing the task.
  • The method further includes:
  • After receiving the log information uploaded by a worker node, the global information monitoring function entity determines whether the node identity information carried in the log information of the worker node is consistent with the saved identity information of the worker node. When they are consistent, the log information of the worker node is saved; when they are inconsistent, the log information of the worker node is discarded.
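  • The present invention also provides a method for obtaining log information, the method comprising:
  • After the overall task is started, the log information of the master node and the worker nodes performing the task is saved in real time;
  • When a worker node performing the task fails and query request information sent by the new worker node is received, the saved log information of the faulty worker node is looked up according to the query request information, and the log information of the faulty worker node is returned to the new worker node; and/or
  • When the master node performing the task fails and query request information sent by the new master node is received, the saved log information of the faulty master node is looked up according to the query request information, and the log information of the faulty master node is returned to the new master node.
  • Before saving the log information of the master node and the worker nodes performing the task in real time, the method further includes:
  • When a worker node uploads log information, it is determined whether the node identity information carried in the log information is consistent with the saved identity information of the worker node. When they are consistent, the log information of the worker node is saved; when they are inconsistent, the log information of the worker node is discarded.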
  • The present invention also provides a global information monitoring entity for obtaining log information, the global information monitoring entity including a storage module and a query module;
  • The storage module is configured to save, in real time after the overall task is started, the log information uploaded by the master node and the worker nodes performing the task;
  • The query module is configured to: when a worker node performing the task fails and query request information sent by the new worker node is received, look up the log information of the faulty worker node saved by the storage module according to the query request information, and return the log information of the faulty worker node to the new worker node; and/or, when the master node performing the task fails and query request information sent by the new master node is received, look up the log information of the faulty master node saved by the storage module according to the query request information, and return the log information of the faulty master node to the new master node.
  • The global information monitoring entity further includes a determining module, configured to: when a worker node uploads log information, determine whether the node identity information carried in the log information of the worker node is consistent with the saved identity information of the worker node; when they are consistent, save the log information of the worker node; when they are inconsistent, discard the log information of the worker node.
  • The storage module is further configured to save the identity information of the worker node.
  • The present invention also provides a system for implementing parallel computing, the system comprising: a global information monitoring function entity, a first worker node, and/or a first master node;
  • The global information monitoring function entity is configured to record the log information of the worker nodes and the master node performing the task after the overall task is started;
  • The first worker node is configured to: when a worker node performing the task fails, obtain the log information of the faulty worker node from the global information monitoring function entity and, according to that log information, continue to process the business process of the faulty worker node from the breakpoint at which the fault occurred.
  • The first master node is configured to: when the master node performing the task fails, obtain, after starting up, the log information of the faulty master node from the global information monitoring function entity and, according to that log information, continue to process the business process of the faulty master node from the breakpoint at which the fault occurred.
  • The system further includes: a User Program unit, a second Master node, and a second Worker node; wherein
  • The User Program unit is configured to: after starting the overall task by calling the client library, select the second master node as the master node for executing the task, and send the input data source to be processed to the second master node;
  • The second master node is configured to: after receiving the input data source to be processed sent by the User Program unit, split the input data source, then select the worker nodes that will perform the task, and assign a task to be executed to each of these worker nodes;
  • The second worker node is configured to perform the assigned task after receiving the task assigned by the second master node.
  • The second master node is further configured to: when the second worker node fails, send information about performing the task to the first worker node;
  • The first worker node is configured to: after receiving the information sent by the second master node, send query request information to the global information monitoring function entity, and receive the log information of the second worker node returned by the global information monitoring function entity;
  • The global information monitoring function entity is further configured to: after receiving the query request information sent by the first worker node, look up the log information of the second worker node that it has saved according to the query request information, and return the log information of the second worker node to the first worker node.
  • The first master node is configured to: when the second master node fails, send query request information to the global information monitoring function entity, and receive the log information of the second master node returned by the global information monitoring function entity;
  • The global information monitoring function entity is further configured to: after receiving the query request information sent by the first master node, look up the log information of the second master node that it has saved according to the query request information, and return the log information of the second master node to the first master node.
  • The second worker node is further configured to upload its own log information to the global information monitoring function entity in real time after the overall task is started;
  • The second master node is further configured to upload its own log information to the global information monitoring function entity in real time after the overall task is started;
  • The global information monitoring function entity is further configured to save the log information of the second worker node and the second master node.
  • The global information monitoring function entity is further configured to: before saving the log information of the second worker node and the second master node, determine whether the node identity information carried in the log information of the second worker node is consistent with the saved identity information of the worker node; when they are consistent, save the log information of the second worker node; when they are inconsistent, discard the log information of the second worker node.
  • With the method and system provided by the present invention, the new worker node obtains the recorded log information of the faulty worker node and, according to that log information, continues to process the business process of the faulty worker node from the breakpoint at which the fault occurred; and/or
  • The new master node obtains the log information of the faulty master node and, according to that log information, continues to process the business process of the faulty master node from the breakpoint at which the fault occurred. In this way, when a node fails, the task continues from the breakpoint at which the fault occurred, which improves data processing efficiency, saves system resources, and improves the user experience.
  • FIG. 1 is a schematic structural diagram of an existing MapReduce system;
  • FIG. 2 is a schematic flowchart of a method for implementing parallel computing according to an embodiment of the present invention;
  • FIG. 3 is a schematic flowchart of the processing performed before recording the log information of the Master node and the Worker nodes according to an embodiment of the present invention;
  • FIG. 4 is a schematic structural diagram of a system for implementing parallel computing according to an embodiment of the present invention.
  • Detailed description: the method for implementing parallel computing according to the present invention includes the following steps:
  • Step 201: After the overall task is started, record the log information of the worker nodes and the master node that execute the task;
  • Before recording the log information of the Master node and the Worker nodes, the method further includes the following steps:
  • Step 301: After the User Program starts the overall task by calling the client library, it selects a node as the master node for executing the task, and then sends the input data source to be processed to the selected master node;
  • Step 302: After the master node performing the task receives the input data source to be processed, it splits the input data source, and then step 303 is performed;
  • The Master node can call the split function in the User Program to divide the input data source; the User Program can inform the Master node of the call parameters in advance, or can send the function to be called to the Master node in advance by means of a message.
  • Step 303: The Master node performing the task selects the Worker nodes that will execute the task, and assigns a task to be executed to each of these Worker nodes;
  • Step 304: Each Worker node performing the task reads its split data block and performs the assigned task.
  • These steps are the same as in the existing process and are not described further here.
  • The log information includes status information of node operation, and the status and key data of the business process flow. The status information of node operation may be: network status, CPU, memory, disk space, execution status of a Map task or a Reduce task, and so on. The status and key data of the business process flow depend on the specific business process being handled. For example, for a business process that uses MapReduce to send weather-forecast short messages to 100,000 mobile phone users in parallel, the status and key data of the business process flow include the phone number information of the mobile phone users.
  • In practical applications, a global information monitoring function entity may be added to the MapReduce system, and this entity records the log information of the master node and the worker nodes. The identity information of the global information monitoring function entity is configured in advance on all nodes in the MapReduce system. This identity information may be an Internet Protocol (IP) address, an identification number (ID), or any other information that identifies the global information monitoring function entity;
  • A node in the MapReduce system may upload its own log information to the global information monitoring function entity according to the identity information of that entity. After the overall task is started, the master node and the worker nodes upload their own log information to the global information monitoring function entity in real time;
  • After the master node has decided which worker nodes will execute the overall task, it sends the identity information of these worker nodes to the global information monitoring function entity, which receives and saves it. When a worker node uploads log information, the global information monitoring function entity decides whether to save it according to the saved identity information of the worker nodes. Specifically, it determines whether the node identity information carried in the log information of the worker node is consistent with the saved identity information; when they are consistent, the log information is saved, and when they are inconsistent, it is discarded.
  • The identity information of a worker node refers to information that identifies the worker node, such as an IP address, a machine name, or an ID;
  • The specific form of the global information monitoring function entity may be a log database, or a cluster composed of one or more nodes;
  • Here, "the Worker node" refers to the set of all Worker nodes that perform the task.
  • Step 202: When a worker node performing the task fails, a new worker node obtains the log information of the faulty worker node and, according to that log information, continues to process the business process of the faulty worker node from the breakpoint at which the fault occurred; and/or, when the master node performing the task fails, a new master node, once started, obtains the recorded log information of the faulty master node and, according to that log information, continues to process the business process of the faulty master node from the breakpoint at which the fault occurred;
  • The master node can learn that a worker node performing the task has failed through heartbeat detection between itself and the worker node. After the worker node fails, the master node can select a node as the new worker node based on the load of the other nodes in the MapReduce system, that is, by using the automatic load balancing of the existing MapReduce system. The new worker node may be a healthy worker node that is already performing the task, or a healthy worker node that is not performing the task;
  • The User Program of the MapReduce system starts a timer. If the task execution result returned by the master node has not been received when the timer expires, the master node is considered to be faulty and a new node must be selected as the master node.
  • The new master node can be selected based on the load of the other nodes in the MapReduce system, that is, by using the automatic load balancing of the existing MapReduce system. The new master node may be a master node that performs the task, or another master node that does not perform the task;
  • The new worker node obtains the log information of the faulty worker node as follows: the master node sends information about performing the task to the new worker node;
  • After receiving the information, the new worker node sends query request information to the global information monitoring function entity;
  • After receiving the query request information, the global information monitoring function entity looks up the log information of the faulty worker node that it has saved according to the query request information, and returns the log information of the faulty worker node to the new worker node;
  • The information about performing the task includes the task data source, the task ID, and the identity information of the faulty worker node;
  • The query request information includes the task ID, the identity information of the faulty worker node, and so on. The identity information of the faulty worker node may be an IP address, a machine name, an ID, or any other information that identifies the faulty worker node;
  • The new master node obtains the log information of the faulty master node as follows: the new master node sends query request information to the global information monitoring function entity; after receiving the query request information, the global information monitoring function entity looks up the log information of the faulty master node that it has saved according to the query request information, and returns the log information of the faulty master node to the new master node;
  • The query request information includes information that identifies the log record of the faulty master node, such as the identity information of the faulty master node or the task ID. The identity information of the faulty master node may be an IP address, a machine name, an ID, or any other information that identifies the faulty master node.
  • After a worker node has completed its task, it calls the external interface to upload its own log information to the global information monitoring function entity, and notifies the master node that the task it is responsible for has been processed. After receiving the notification, the master node marks the task of that worker node as completed. After receiving notifications from all worker nodes that processing has been completed, the master node ends the overall task.
  • The present invention further provides a global information monitoring entity for obtaining log information, the global information monitoring entity including a storage module and a query module;
  • The storage module is configured to save, in real time after the overall task is started, the log information uploaded by the master node and the worker nodes performing the task;
  • The query module is configured to: when a worker node performing the task fails and query request information sent by the new worker node is received, look up the log information of the faulty worker node saved by the storage module according to the query request information, and return the log information of the faulty worker node to the new worker node; and/or, when the master node performing the task fails and query request information sent by the new master node is received, look up the log information of the faulty master node saved by the storage module according to the query request information, and return the log information of the faulty master node to the new master node.
  • The global information monitoring entity may further include a determining module, configured to: when a worker node uploads log information, determine whether the node identity information carried in the log information of the worker node is consistent with the saved identity information of the worker node; when they are consistent, save the log information of the worker node; otherwise, discard the log information of the worker node.
  • The storage module is further configured to save the identity information of the worker node.
  • The present invention further provides a system for implementing parallel computing.
  • The system includes: a global information monitoring function entity 41, a first worker node 42, and/or a first master node 43;
  • The global information monitoring function entity 41 is configured to record the log information of the worker nodes and the master node that perform the task after the overall task is started;
  • The first worker node 42 is configured to: when a worker node performing the task fails, obtain the log information of the faulty worker node from the global information monitoring function entity 41 and, according to that log information, continue to process the business process of the faulty worker node from the breakpoint at which the fault occurred.
  • The first master node 43 is configured to: when the master node performing the task fails, obtain, after starting up, the log information of the faulty master node from the global information monitoring function entity 41 and, according to that log information, continue to process the business process of the faulty master node from the breakpoint at which the fault occurred.
  • The first worker node 42 may be a healthy worker node that is already performing the task, or a healthy worker node that is not performing the task;
  • The first master node 43 may be a master node that performs the task, or another master node that does not perform the task.
  • The system may further include a User Program unit, a second Master node, and a second Worker node;
  • The User Program unit is configured to start the overall task by calling the client library, select the second master node as the master node for performing the task, and send the input data source to be processed to the second master node;
  • The second master node is configured to receive the input data source to be processed sent by the User Program unit, split the input data source, then select the worker nodes that will perform the task, and assign a task to be executed to each of these worker nodes;
  • The second worker node is configured to perform the assigned task after receiving the task assigned by the second master node.
  • The second worker node may be a set of more than one worker node performing the task.
  • The second master node is further configured to send information about performing the task to the first worker node 42 when the second worker node fails;
  • The first worker node 42 is specifically configured to: after receiving the information sent by the second master node, send query request information to the global information monitoring function entity 41, and receive the log information of the second worker node returned by the global information monitoring function entity 41;
  • The global information monitoring function entity 41 is further configured to: after receiving the query request information sent by the first worker node 42, look up the log information of the second worker node that it has saved according to the query request information, and return the log information of the second worker node to the first worker node 42.
  • The first master node 43 is specifically configured to: when the second master node fails, send query request information to the global information monitoring function entity 41, and receive the log information of the second master node returned by the global information monitoring function entity 41;
  • The global information monitoring function entity 41 is further configured to: after receiving the query request information sent by the first master node 43, look up the log information of the second master node that it has saved according to the query request information, and return the log information of the second master node to the first master node 43.
  • The second worker node is further configured to upload its own log information to the global information monitoring function entity 41 in real time after the overall task is started;
  • The second master node is further configured to upload its own log information to the global information monitoring function entity 41 in real time after the overall task is started;
  • The global information monitoring function entity 41 is further configured to save the log information of the second worker node and the second master node.
  • The global information monitoring function entity 41 is further configured to: before saving the log information of the second worker node and the second master node, determine whether the node identity information carried in the log information of the second worker node is consistent with the saved identity information of the worker node; when they are consistent, save the log information of the second worker node; when they are inconsistent, discard the log information of the second worker node.
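
The recovery flow described above (real-time log upload, the identity check before saving, the query by a replacement node, and resumption from the breakpoint) can be illustrated with a minimal Python sketch. This is not the implementation from the application. The names `LogEntry`, `GlobalMonitor`, `send_messages`, and `resume`, the `processed` counter used as the breakpoint, and the in-memory dictionary standing in for the log store are all assumptions made for illustration, loosely following the short-message example in the description.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    node_id: str    # identity carried in the uploaded log (hypothetical field)
    task_id: str
    processed: int  # breakpoint: how many records were already handled


@dataclass
class GlobalMonitor:
    """Hypothetical stand-in for the global information monitoring function entity."""
    known_workers: set = field(default_factory=set)
    logs: dict = field(default_factory=dict)  # (task_id, node_id) -> LogEntry

    def register_workers(self, node_ids):
        # The master node reports which worker nodes were assigned the task.
        self.known_workers.update(node_ids)

    def upload(self, entry):
        # Determining step: discard logs whose identity was never registered.
        if entry.node_id not in self.known_workers:
            return False
        self.logs[(entry.task_id, entry.node_id)] = entry
        return True

    def query(self, task_id, faulty_node_id):
        # Query step: return the latest saved log of the faulty node, if any.
        return self.logs.get((task_id, faulty_node_id))


def send_messages(monitor, node_id, task_id, numbers, start=0, fail_at=None):
    """Handle numbers[start:], uploading a log entry after each record.

    Stops before index fail_at to simulate a node failure.
    """
    sent = []
    for i in range(start, len(numbers)):
        if fail_at is not None and i == fail_at:
            break  # simulated failure of this worker node
        sent.append(numbers[i])
        monitor.upload(LogEntry(node_id, task_id, processed=i + 1))
    return sent


def resume(monitor, new_node_id, task_id, faulty_node_id, numbers):
    """A replacement worker continues from the faulty worker's breakpoint."""
    entry = monitor.query(task_id, faulty_node_id)
    start = entry.processed if entry else 0
    # In the description the master node reports worker identities; done here for brevity.
    monitor.register_workers([new_node_id])
    return send_messages(monitor, new_node_id, task_id, numbers, start=start)


monitor = GlobalMonitor()
monitor.register_workers(["worker-1"])
numbers = ["n0", "n1", "n2", "n3", "n4"]
first = send_messages(monitor, "worker-1", "task-7", numbers, fail_at=3)
rest = resume(monitor, "worker-2", "task-7", "worker-1", numbers)
```

In this sketch the replacement worker handles only the records after the breakpoint, so no record is processed twice, and a log uploaded under an unregistered identity is rejected rather than saved.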

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a method for implementing parallel processing. The method comprises: recording log information of worker nodes and master nodes executing tasks after an overall task has been started; when a fault occurs on a worker node executing a task, obtaining, by a new worker node, the recorded log information of the faulty worker node and continuing to process the business process of the faulty worker node from the breakpoint at which the fault occurred, according to the log information; and/or, when a fault occurs on a master node executing a task, after a new master node has been started, obtaining the recorded log information of the faulty master node and continuing to process the business process of the faulty master node from the breakpoint at which the fault occurred, according to the log information. The invention further relates to a system for implementing parallel processing. When a fault occurs on a node, the method and the system make it possible to continue executing tasks from the breakpoint at which the fault occurred.
PCT/CN2011/072818 2010-08-27 2011-04-14 Method and system for implementing parallel processing WO2012024937A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010269332.1A CN102385536B (zh) 2010-08-27 2010-08-27 Method and system for implementing parallel computing
CN201010269332.1 2010-08-27

Publications (1)

Publication Number Publication Date
WO2012024937A1 (fr) 2012-03-01

Family

ID=45722853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/072818 WO2012024937A1 (fr) 2010-08-27 2011-04-14 Method and system for implementing parallel processing

Country Status (2)

Country Link
CN (1) CN102385536B (fr)
WO (1) WO2012024937A1 (fr)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
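CN103136363A (zh) * 2013-03-14 2013-06-05 曙光信息产业(北京)有限公司 Query processing method and cluster database system
CN104461752B (zh) * 2014-11-21 2018-09-18 浙江宇视科技有限公司 Multimedia distributed task processing method with two-level fault tolerance
CN106789141B (zh) * 2015-11-24 2020-12-11 阿里巴巴集团控股有限公司 Gateway device fault handling method and apparatus
CN107644382A (zh) * 2016-07-22 2018-01-30 平安科技(深圳)有限公司 Insurance policy information statistics method and apparatus
CN108959063A (zh) * 2017-05-25 2018-12-07 北京京东尚科信息技术有限公司 Program execution method and apparatus
CN108600008B (zh) * 2018-04-24 2021-12-17 致云科技有限公司 Server management method, server management apparatus, and distributed system
CN110673936B (zh) * 2019-09-18 2022-05-17 平安科技(深圳)有限公司 Breakpoint resumption method and apparatus for orchestrated services, storage medium, and electronic device
CN113596148A (zh) * 2021-07-27 2021-11-02 上海商汤科技开发有限公司 Data transmission method, system, apparatus, computing device, and storage medium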

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060085792A1 (en) * 2004-10-15 2006-04-20 Microsoft Corporation Systems and methods for a disaster recovery system utilizing virtual machines running on at least two host computers in physically different locations
CN101145946A (zh) * 2007-09-17 2008-03-19 中兴通讯股份有限公司 Fault-tolerant cluster system and method based on message logs
CN101770402A (zh) * 2008-12-29 2010-07-07 中国移动通信集团公司 Map task scheduling method, device, and system in a MapReduce system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007172334A (ja) * 2005-12-22 2007-07-05 Internatl Business Mach Corp <Ibm> Method, system, and program for ensuring redundancy of a parallel computing system
US8230070B2 (en) * 2007-11-09 2012-07-24 Manjrasoft Pty. Ltd. System and method for grid and cloud computing
CN101764835B (zh) * 2008-12-25 2012-09-05 华为技术有限公司 Task allocation method and apparatus based on the MapReduce programming framework

Also Published As

Publication number Publication date
CN102385536A (zh) 2012-03-21
CN102385536B (zh) 2014-06-11

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 11819312; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 11819312; Country of ref document: EP; Kind code of ref document: A1)