JP6934754B2 - Distributed processing system, management method for distributed processing system, and distributed processing system management program - Google Patents

Distributed processing system, management method for distributed processing system, and distributed processing system management program

Info

Publication number
JP6934754B2
Authority
JP
Japan
Prior art keywords
information processing
information
operating
configuration information
configuration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2017117659A
Other languages
English (en)
Japanese (ja)
Other versions
JP2019004327A5
JP2019004327A (ja)
Inventor
宏明 郡浦
木下 雅文
伸之 茶木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-06-15
Filing date
2017-06-15
Publication date
2021-09-15
Application filed by Hitachi Ltd
Priority to JP2017117659A
Priority to PCT/JP2018/020582
Priority to US16/494,601
Publication of JP2019004327A
Publication of JP2019004327A5
Application granted
Publication of JP6934754B2
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
      • G06 COMPUTING OR CALCULATING; COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
              • G06F11/14 Error detection or correction of the data by redundancy in operation
                • G06F11/1402 Saving, restoring, recovering or retrying
                  • G06F11/1415 Saving, restoring, recovering or retrying at system level
                    • G06F11/142 Reconfiguring to eliminate the error
                      • G06F11/1425 Reconfiguring to eliminate the error by reconfiguration of node membership
              • G06F11/16 Error detection or correction of the data by redundancy in hardware
                • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
                  • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
                    • G06F11/2023 Failover techniques
                      • G06F11/2025 Failover techniques using centralised failover control functionality
                      • G06F11/2028 Failover techniques eliminating a faulty processor or activating a spare
                    • G06F11/2035 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant without idle spare hardware
                  • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
                    • G06F11/2094 Redundant storage or storage space
            • G06F11/30 Monitoring
              • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
                • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
          • G06F9/00 Arrangements for program control, e.g. control units
            • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
              • G06F9/46 Multiprogramming arrangements
                • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Hardware Redundancy (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
JP2017117659A 2017-06-15 2017-06-15 Distributed processing system, management method for distributed processing system, and distributed processing system management program Active JP6934754B2 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
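JP2017117659A JP6934754B2 (ja) 2017-06-15 2017-06-15 Distributed processing system, management method for distributed processing system, and distributed processing system management program
PCT/JP2018/020582 WO2018230332A1 (ja) 2017-06-15 2018-05-29 Distributed processing system and management method for distributed processing system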
US16/494,601 US11010269B2 (en) 2017-06-15 2018-05-29 Distributed processing system and method for management of distributed processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2017117659A JP6934754B2 (ja) 2017-06-15 2017-06-15 Distributed processing system, management method for distributed processing system, and distributed processing system management program

Publications (3)

Publication Number Publication Date
JP2019004327A (ja) 2019-01-10
JP2019004327A5 2020-03-12
JP6934754B2 (ja) 2021-09-15

Family

ID=64660933

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2017117659A Active JP6934754B2 (ja) 2017-06-15 2017-06-15 Distributed processing system, management method for distributed processing system, and distributed processing system management program

Country Status (3)

Country Link
US (1) US11010269B2 (en)
JP (1) JP6934754B2 (ja)
WO (1) WO2018230332A1 (ja)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11042443B2 (en) * 2018-10-17 2021-06-22 California Institute Of Technology Fault tolerant computer systems and methods establishing consensus for which processing system should be the prime string

Family Cites Families (17)

Publication number Priority date Publication date Assignee Title
US4816989A (en) * 1987-04-15 1989-03-28 Allied-Signal Inc. Synchronizer for a fault tolerant multiple node processing system
US6363497B1 (en) * 1997-05-13 2002-03-26 Micron Technology, Inc. System for clustering software applications
US6108699A (en) * 1997-06-27 2000-08-22 Sun Microsystems, Inc. System and method for modifying membership in a clustered distributed computer system and updating system configuration
US6401120B1 (en) * 1999-03-26 2002-06-04 Microsoft Corporation Method and system for consistent cluster operational data in a server cluster using a quorum of replicas
US6615366B1 (en) * 1999-12-21 2003-09-02 Intel Corporation Microprocessor with dual execution core operable in high reliability mode
US6915391B2 (en) * 2000-12-15 2005-07-05 International Business Machines Corporation Support for single-node quorum in a two-node nodeset for a shared disk parallel file system
US7296268B2 (en) * 2000-12-18 2007-11-13 Microsoft Corporation Dynamic monitor and controller of availability of a load-balancing cluster
JP2005055995A (ja) 2003-08-07 2005-03-03 Hitachi Ltd Storage control method and server system having a redundancy function
JP4089569B2 (ja) 2003-09-19 2008-05-28 Hitachi Koki Co Ltd Compressed-air screw tightening machine
JP4611922B2 (ja) * 2006-03-28 2011-01-12 Fujitsu Ltd Control program, control method, and control device
JP5211766B2 (ja) * 2008-03-10 2013-06-12 Fujitsu Ltd Resource allocation device and program
JP5368907B2 (ja) 2009-08-10 2013-12-18 NTT Data Corp Server management system, server management method, and program
JP2011159222A (ja) 2010-02-03 2011-08-18 Nec Corp Server system and control method for server system
US9086962B2 (en) * 2012-06-15 2015-07-21 International Business Machines Corporation Aggregating job exit statuses of a plurality of compute nodes executing a parallel application
US9032251B2 (en) * 2013-03-12 2015-05-12 Cray Inc. Re-forming an application control tree without terminating the application
US9372766B2 (en) * 2014-02-11 2016-06-21 Saudi Arabian Oil Company Circumventing load imbalance in parallel simulations caused by faulty hardware nodes
JP6558037B2 (ja) * 2015-04-10 2019-08-14 Fujitsu Ltd Operation management program, operation management method, and operation management device

Also Published As

Publication number Publication date
US20200089585A1 (en) 2020-03-19
US11010269B2 (en) 2021-05-18
WO2018230332A1 (ja) 2018-12-20
JP2019004327A (ja) 2019-01-10

Similar Documents

Publication Publication Date Title
JP4648447B2 (ja) Failure recovery method, program, and management server
EP2273371B1 (en) Failover procedure for server system
US8423821B1 (en) Virtual recovery server
JP6850771B2 (ja) Information processing system, management method for information processing system, and program
US9501374B2 (en) Disaster recovery appliance
US8745171B1 (en) Warm standby appliance
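JP5352115B2 (ja) Storage system and method for changing its monitoring conditions
JPWO2014076838A1 (ja) Virtual machine synchronization system
JP2012173996A (ja) Cluster system, cluster management method, and cluster management program
JP6934754B2 (ja) Distributed processing system, management method for distributed processing system, and distributed processing system management program
CN106293501A (zh) Data read/write method and device
US11762741B2 (en) Storage system, storage node virtual machine restore method, and recording medium
JP2008276281A (ja) Data synchronization system, method, and program
CN113874842B (zh) Fault-tolerant system, server, operating method for fault-tolerant system, and operating method for server
CN116389233B (zh) Active/standby switchover system, method, apparatus, and computer device for a container cloud management platform
JP5798056B2 (ja) Redundancy control system for call processing information, and standby maintenance server used therein
JP6954693B2 (ja) Fault-tolerant system, server, operating method thereof, and program
JP2015005149A (ja) Recovery method upon print server failure in cloud printing
JP5947974B2 (ja) Information processing device, replacement support system for information processing device, and replacement support method
KR20180018195A (ko) Process management device, semiconductor process management system including a data server interworking therewith, and semiconductor process management method using the same
KR20160101705A (ko) Process management device, semiconductor process management system including a data server interworking therewith, and semiconductor process management method using the same
CN119966803B (zh) Faulty node switching method, apparatus, device, and storage medium
JP2011159222A (ja) Server system and control method for server system
JP2015232812A (ja) Service provision system, service provision method, and computer program
JP5876780B2 (ja) Subscriber database switching control device, subscriber database system, control program for subscriber database switching control device, and control method for subscriber database switching control device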

Legal Events

Date Code Title Description
A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20200131

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20200131

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20210126

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20210326

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20210511

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20210608

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20210803

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20210824

R150 Certificate of patent or registration of utility model

Ref document number: 6934754

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150