JP6934754B2 - Distributed processing system, method for management of distributed processing system, and distributed processing system management program - Google Patents

Distributed processing system, method for management of distributed processing system, and distributed processing system management program

Info

Publication number
JP6934754B2
JP6934754B2 JP2017117659A JP2017117659A JP6934754B2 JP 6934754 B2 JP6934754 B2 JP 6934754B2 JP 2017117659 A JP2017117659 A JP 2017117659A JP 2017117659 A JP2017117659 A JP 2017117659A JP 6934754 B2 JP6934754 B2 JP 6934754B2
Authority
JP
Japan
Prior art keywords
information processing
information
operating
configuration information
configuration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2017117659A
Other languages
English (en)
Japanese (ja)
Other versions
JP2019004327A5 (ja)
JP2019004327A (ja)
Inventor
宏明 郡浦
雅文 木下
伸之 茶木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Priority to JP2017117659A (JP6934754B2)
Priority to PCT/JP2018/020582 (WO2018230332A1)
Priority to US16/494,601 (US11010269B2)
Publication of JP2019004327A
Publication of JP2019004327A5
Application granted
Publication of JP6934754B2
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2035 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant without idle spare hardware
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/142 Reconfiguring to eliminate the error
    • G06F11/1425 Reconfiguring to eliminate the error by reconfiguration of node membership
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G06F11/2025 Failover techniques using centralised failover control functionality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G06F11/2028 Failover techniques eliminating a faulty processor or activating a spare
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094 Redundant storage or storage space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Hardware Redundancy (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
JP2017117659A 2017-06-15 2017-06-15 Distributed processing system, method for management of distributed processing system, and distributed processing system management program Active JP6934754B2 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
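JP2017117659A JP6934754B2 (ja) 2017-06-15 2017-06-15 Distributed processing system, method for management of distributed processing system, and distributed processing system management program
PCT/JP2018/020582 WO2018230332A1 (ja) 2017-06-15 2018-05-29 Distributed processing system and method for management of distributed processing system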
US16/494,601 US11010269B2 (en) 2017-06-15 2018-05-29 Distributed processing system and method for management of distributed processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2017117659A JP6934754B2 (ja) 2017-06-15 2017-06-15 Distributed processing system, method for management of distributed processing system, and distributed processing system management program

Publications (3)

Publication Number Publication Date
JP2019004327A JP2019004327A (ja) 2019-01-10
JP2019004327A5 JP2019004327A5 (ja) 2020-03-12
JP6934754B2 JP6934754B2 (ja) 2021-09-15

Family

ID=64660933

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2017117659A Active JP6934754B2 (ja) 2017-06-15 2017-06-15 Distributed processing system, method for management of distributed processing system, and distributed processing system management program

Country Status (3)

Country Link
US (1) US11010269B2 (en)
JP (1) JP6934754B2 (ja)
WO (1) WO2018230332A1 (ja)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11042443B2 (en) * 2018-10-17 2021-06-22 California Institute Of Technology Fault tolerant computer systems and methods establishing consensus for which processing system should be the prime string

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4914657A (en) * 1987-04-15 1990-04-03 Allied-Signal Inc. Operations controller for a fault tolerant multiple node processing system
US6363497B1 (en) * 1997-05-13 2002-03-26 Micron Technology, Inc. System for clustering software applications
US6108699A (en) * 1997-06-27 2000-08-22 Sun Microsystems, Inc. System and method for modifying membership in a clustered distributed computer system and updating system configuration
US6401120B1 (en) * 1999-03-26 2002-06-04 Microsoft Corporation Method and system for consistent cluster operational data in a server cluster using a quorum of replicas
US6615366B1 (en) * 1999-12-21 2003-09-02 Intel Corporation Microprocessor with dual execution core operable in high reliability mode
US6915391B2 (en) * 2000-12-15 2005-07-05 International Business Machines Corporation Support for single-node quorum in a two-node nodeset for a shared disk parallel file system
US7296268B2 (en) * 2000-12-18 2007-11-13 Microsoft Corporation Dynamic monitor and controller of availability of a load-balancing cluster
JP2005055995A (ja) 2003-08-07 2005-03-03 Hitachi Ltd Storage control method and server system having a redundancy function
JP4089569B2 (ja) 2003-09-19 2008-05-28 日立工機株式会社 Compressed-air screw tightening machine
JP4611922B2 (ja) * 2006-03-28 2011-01-12 富士通株式会社 Control program, control method, and control device
JP5211766B2 (ja) * 2008-03-10 2013-06-12 富士通株式会社 Resource allocation device and program
JP5368907B2 (ja) 2009-08-10 2013-12-18 株式会社エヌ・ティ・ティ・データ Server management system, server management method, and program
JP2011159222A (ja) 2010-02-03 2011-08-18 Nec Corp Server system and control method for server system
US9086962B2 (en) * 2012-06-15 2015-07-21 International Business Machines Corporation Aggregating job exit statuses of a plurality of compute nodes executing a parallel application
US9032251B2 (en) * 2013-03-12 2015-05-12 Cray Inc. Re-forming an application control tree without terminating the application
US9372766B2 (en) * 2014-02-11 2016-06-21 Saudi Arabian Oil Company Circumventing load imbalance in parallel simulations caused by faulty hardware nodes
JP6558037B2 (ja) * 2015-04-10 2019-08-14 富士通株式会社 Operation management program, operation management method, and operation management device

Also Published As

Publication number Publication date
US11010269B2 (en) 2021-05-18
WO2018230332A1 (ja) 2018-12-20
US20200089585A1 (en) 2020-03-19
JP2019004327A (ja) 2019-01-10

Similar Documents

Publication Publication Date Title
CN109857445B (zh) Storage system and control software configuration method
JP4648447B2 (ja) Failure recovery method, program, and management server
EP2273371B1 (en) Failover procedure for server system
US8423821B1 (en) Virtual recovery server
US9645900B2 (en) Warm standby appliance
JP6850771B2 (ja) Information processing system, management method for information processing system, and program
US8977887B2 (en) Disaster recovery appliance
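JP5352115B2 (ja) Storage system and method for changing its monitoring conditions
JP2007226400A (ja) Computer management method, computer management program, standby server that manages the configuration of an execution server, and computer system
WO2014076838A1 (ja) Virtual machine synchronization system
CN109446178A (zh) Hadoop object storage high-availability method, system, device, and readable storage medium
JP2012173996A (ja) Cluster system, cluster management method, and cluster management program
JP6934754B2 (ja) Distributed processing system, method for management of distributed processing system, and distributed processing system management program
CN116389233B (zh) Active/standby switching system, method, and apparatus for a container cloud management platform, and computer device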
US11762741B2 (en) Storage system, storage node virtual machine restore method, and recording medium
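JP6954693B2 (ja) Fault-tolerant system, server, operating methods thereof, and program
JP5798056B2 (ja) Redundancy control system for call processing information and spare maintenance server used therein
CN106293501A (zh) Data read/write method and device
JP2008276281A (ja) Data synchronization system, method, and program
JP6773345B1 (ja) Fault-tolerant system, server, and operating methods thereof
JP6394212B2 (ja) Information processing system, storage device, and program
KR20180018195A (ko) Process management apparatus, semiconductor process management system including a data server interworking therewith, and semiconductor process management method using the same
JP5947974B2 (ja) Information processing device, replacement support system for information processing device, and replacement support method
WO2016046951A1 (ja) Computer system and file management method therefor
JP2015005149A (ja) Recovery method upon print server failure in cloud printing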

Legal Events

Date Code Title Description
2020-01-31 A521 Written amendment JAPANESE INTERMEDIATE CODE: A523
2020-01-31 A621 Written request for application examination JAPANESE INTERMEDIATE CODE: A621
2021-01-26 A131 Notification of reasons for refusal JAPANESE INTERMEDIATE CODE: A131
2021-03-26 A521 Written amendment JAPANESE INTERMEDIATE CODE: A523
2021-05-11 A131 Notification of reasons for refusal JAPANESE INTERMEDIATE CODE: A131
2021-06-08 A521 Written amendment JAPANESE INTERMEDIATE CODE: A523
TRDD Decision of grant or rejection written
2021-08-03 A01 Written decision to grant a patent or to grant a registration (utility model) JAPANESE INTERMEDIATE CODE: A01
2021-08-24 A61 First payment of annual fees (during grant procedure) JAPANESE INTERMEDIATE CODE: A61
R150 Certificate of patent or registration of utility model Ref document number: 6934754; Country of ref document: JP; JAPANESE INTERMEDIATE CODE: R150