CN113220519A - High-availability host system - Google Patents

High-availability host system

Info

Publication number
CN113220519A
CN113220519A (application CN202110591338.9A; granted as CN113220519B)
Authority
CN
China
Prior art keywords
host
coupler
physical
host module
physical host
Prior art date
Legal status
Granted
Application number
CN202110591338.9A
Other languages
Chinese (zh)
Other versions
CN113220519B (en)
Inventor
曹杰瑞
滕腾
Current Assignee
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date
Filing date
Publication date
Application filed by China Construction Bank Corp filed Critical China Construction Bank Corp
Priority to CN202110591338.9A priority Critical patent/CN113220519B/en
Publication of CN113220519A publication Critical patent/CN113220519A/en
Application granted granted Critical
Publication of CN113220519B publication Critical patent/CN113220519B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F 11/26 — Error detection/monitoring: functional testing of computer hardware during standby or idle time
    • G06F 1/06 — Clock generators producing several clock signals
    • G06F 13/4031 — Bus structure: coupling between buses using bus bridges with arbitration
    • G06F 16/25 — Information retrieval: integrating or interfacing systems involving database management systems
    • G06Q 10/20 — Administration of product repair or maintenance


Abstract

The invention discloses a high-availability host system comprising a first host module and a second host module located in different buildings. Using a preset deployment mode, the first host module is interconnected with the second host module through physical optical fibers laid in a pre-built physical corridor and in a vertical shaft, and the two host modules form one set of parallel coupling body. If either of the two host modules fails, the host module that has not failed takes over the service operation of the failed one. With this scheme, there is no need to build multiple sets of parallel coupling body environments in different buildings: when either host module fails, the surviving module simply takes over the failed module's service operation, which reduces operation and maintenance cost while meeting high-availability operation requirements in failure scenarios.

Description

High-availability host system
Technical Field
The present invention relates to the technical field of International Business Machines (IBM) Z mainframe deployment, and more particularly, to a highly available host system.
Background
In recent years, with the rapid development of internet technology, high-availability technology has been applied ever more widely in many fields; high-availability clusters have grown in scale, and the number of hosts in such clusters has risen from a few to dozens or even hundreds.
To ensure service continuity, conventional hosts are typically deployed for high availability using virtualization, parallel computing, load balancing, and similar technologies. In a typical high-availability deployment, the hardware of all host platforms is placed in a single host module, or in adjacent host modules within the same building's machine room. Such a deployment, however, cannot cope with failure scenarios that cannot be repaired in a short time, such as a building-wide power outage or a building collapse, and therefore cannot meet high-availability operation requirements. Although service continuity can be supported by building several sets of parallel coupling body environments in several buildings, the purchase of host-platform hardware and software makes the operation and maintenance cost high.
Therefore, existing high-availability host deployment schemes cannot meet high-availability operation requirements in such failure scenarios, and their operation and maintenance cost is high.
Disclosure of Invention
The embodiment of the invention discloses a high-availability host system, which achieves the purposes of reducing operation and maintenance cost and meeting high-availability operation requirements in a fault scene.
To achieve this purpose, the technical scheme is as follows:
the invention discloses a high-availability host system in a first aspect, which comprises a first host module and a second host module which are arranged in different buildings;
the first host module is in a preset deployment mode, and is in data interconnection with the second host module through a pre-established physical corridor and physical optical fibers arranged in a vertical shaft, the first host module and the second host module form a parallel coupling body, the first host module and the second host module are respectively deployed in two buildings which are separated by a distance of X, and the value of X is less than or equal to 150 meters;
if any one of the first host module and the second host module fails, the host module which does not fail takes over the service operation of the failed host module.
Preferably, the first host module, in the preset deployment mode, being data-interconnected with the second host module through physical optical fibers laid in the pre-built physical corridor and vertical shaft includes:
the first host module, in a corridor-top deployment mode and a corridor-bottom deployment mode, being data-interconnected with the second host module through physical optical fibers laid in the pre-built physical corridor; and
the first host module, in a shaft deployment mode, being data-interconnected with the second host module through a physical optical fiber laid in a pre-built vertical shaft.
Preferably, the first host module comprises a first coupler, a first physical host, a third physical host, a K2 disk, a master clock server, and a disk storage production master;
the master clock server is arranged on the first coupler;
the first coupler is connected with the first physical host and the third physical host, respectively;
the K2 disk is connected with the first physical host;
and the disk storage production master is connected with the first physical host and the third physical host, respectively.
Preferably, the second host module comprises a second coupler, a second physical host, a fourth physical host, a K1 disk, a standby clock server, an arbitration clock server, and a disk storage production slave;
the standby clock server is arranged on the second coupler;
the arbitration clock server is arranged on the fourth physical host;
the second coupler is connected with the second physical host and the fourth physical host, respectively;
the K1 disk is connected with the second physical host;
and the disk storage production slave is connected with the second physical host and the fourth physical host, respectively.
Preferably, the first coupler is connected to the second coupler, and if the first coupler fails, the second coupler takes over service operation of the first coupler;
the first coupler is respectively connected with the first physical host, the second physical host, the third physical host and the fourth physical host;
the second coupler is respectively connected with the first physical host, the second physical host, the third physical host and the fourth physical host;
the disk storage production master is respectively connected with the first physical host, the second physical host, the third physical host and the fourth physical host;
the disk storage production slave disk is respectively connected with the first physical host, the second physical host, the third physical host and the fourth physical host;
the main clock server and the standby clock server are in data interconnection, and if the main clock server fails, the arbitration clock server sends a failure reminding instruction to the standby clock server to enable the standby clock server to take over the service operation work of the main clock server;
the disk storage production master is connected with the disk storage production slave through a disk mirror image, and if the disk storage production master fails, the disk storage production slave takes over the service operation work of the disk storage production master.
Preferably, the type structures of the first coupler and the second coupler are LOCK type structures, the first coupler is provided with a database global LOCK DB2 LOCK1 and a database component DB2 SCA, and the second coupler is provided with a global LOCK GRS.
Preferably, the type structures of the first coupler and the second coupler are both cache type structures, and the first coupler and the second coupler set up a plurality of database global buffer pools in an asynchronous duplex manner.
Preferably, the type structures of the first coupler and the second coupler are both list type structures, the first coupler sets a list type structure IXCSTR1, a list type structure IXCSTR3 and a list type structure IXCSTR5 in a symmetrical deployment manner, and the second coupler sets a list type structure IXCSTR2, a list type structure IXCSTR4 and a list type structure IXCSTR6 in a symmetrical deployment manner.
Preferably, a disk image hot-swap control system GDPS K2 is disposed in the first physical host.
Preferably, the second physical host is provided with a disk image hot-swap control system GDPS K1.
According to the technical scheme, the system comprises a first host module and a second host module which are arranged in different buildings, the first host module is in a preset deployment mode and is in communication interconnection with the second host module through a physical optical fiber arranged in a pre-established physical corridor and a vertical shaft, the first host module and the second host module form a set of parallel coupling body, and if any one of the first host module and the second host module fails, the host module which does not fail takes over the service operation work of the failed host module. Through the scheme, when any one of the first host module and the second host module breaks down, a plurality of sets of parallel coupling body environments do not need to be built in different buildings to execute business operation work, and the host module which does not break down only needs to take over the business operation work of the host module which breaks down, so that the purposes of reducing operation and maintenance cost and meeting high-availability operation requirements in a fault scene are achieved.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a high availability host system according to an embodiment of the present invention;
FIG. 2 is a schematic view of a first host module and a second host module deployed between buildings according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of another high availability host system according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a first coupler and a second coupler arranged according to the type structure of the couplers according to the embodiment of the invention;
fig. 5 is a schematic structural diagram of another high availability host system according to an embodiment of the disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings in those embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
As shown in the background art, existing high-availability host deployment schemes cannot meet high-availability operation requirements in failure scenarios that cannot be repaired in a short time, such as a building-wide power outage or a building collapse, and have high operation and maintenance costs.
In order to achieve the above object, an embodiment of the present invention discloses a high-availability host system, wherein when any one of a first host module and a second host module fails, a plurality of sets of parallel coupling environments do not need to be built in different buildings to execute service operation work, and only the host module which does not fail needs to take over the service operation work of the host module which fails, so as to achieve the purpose of reducing operation and maintenance cost and meeting high-availability operation requirements in a failure scene. The specific implementation is specifically illustrated by the following examples.
Fig. 1 is a schematic structural diagram of a high-availability host system disclosed in an embodiment of the present invention. The high-availability host system includes a first host module 1 and a second host module 2 located in different buildings: the first host module 1 is located in a first building 3 and the second host module 2 in a second building 4, the two buildings being separated by a distance X of at most 150 meters.
The connection relationship and mutual data interaction process of the first host module 1 and the second host module 2 are described in detail as follows:
the first host module 1 is in a preset deployment mode, and is in data interconnection with the second host module 2 through a physical corridor established in advance and a physical optical fiber arranged in a vertical shaft, and the first host module 1 and the second host module 2 form a set of Parallel coupling bodies (Parallel SYPLEX).
The preset deployment mode is a corridor top deployment mode, a corridor bottom deployment mode and a vertical shaft deployment mode.
If any one of the first host module 1 and the second host module 2 fails, the host module which does not fail takes over the service operation of the failed host module.
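The takeover behavior described above can be sketched in a few lines of code. This is a minimal illustrative model, not part of the patent: the `HostModule` class and `fail_over` function are invented names, and real takeover on a Parallel Sysplex involves far more machinery.

```python
# Illustrative sketch (invented names): takeover between two mutually
# redundant host modules in different buildings.
class HostModule:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.workload = {name}  # services this module currently runs

def fail_over(a, b):
    """If exactly one module has failed, the surviving module takes over
    the failed module's service operation; otherwise nothing changes."""
    for failed, survivor in ((a, b), (b, a)):
        if not failed.healthy and survivor.healthy:
            survivor.workload |= failed.workload
            failed.workload = set()
            return survivor
    return None

m1, m2 = HostModule("module-1"), HostModule("module-2")
m1.healthy = False              # e.g. building-level outage in building 1
survivor = fail_over(m1, m2)    # module 2 now runs both workloads
```

The point of the sketch is that no third environment is needed: the surviving module absorbs the failed module's workload directly.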
The first host module 1 includes a first coupler 11, a first physical host 12, a third physical host 13, a K2 disk, a master clock server 14, and a disk storage production master 15, where PTS (Primary Time Server) denotes the master clock server 14.
The first coupler 11, the first physical host 12, the third physical host 13, the K2 disk, the master clock server 14 and the disk storage production master 15 are connected as follows:
in the first host module 1, the first coupler 11 is in data connection with a first physical host 12 and a third physical host 13, respectively.
The first physical host 12 is connected to the K2 disk; a disk-mirror hot-switch control system GDPS K2 is disposed in the first physical host 12, and GDPS K2 is in data connection with the K2 disk.
The z/OS operating system, DB2 database, and middleware CICS need to be installed on the logical partition of the first physical host 12.
The z/OS operating system, DB2 database, and middleware CICS need to be installed on the logical partition of the third physical host 13.
The master clock server 14 is arranged at the first coupler 11.
The disk storage production master 15 is connected to a first physical host 12, a second physical host 22, a third physical host 13, and a fourth physical host 23, respectively.
The second host module 2 includes a second coupler 21, a second physical host 22, a fourth physical host 23, a K1 disk, a standby clock server 24, an arbitration clock server 25, and a disk storage production slave 26, where BTS (Backup Time Server) denotes the standby clock server 24.
To improve the availability of the clock servers required by the parallel coupling body, the master clock server 14 is placed at the first coupler 11 in the first building 3, while the standby clock server 24 is placed at the second coupler 21 and the arbitration clock server 25 at the fourth physical host 23 in the second building 4. Communication between the cross-building clock servers, and their physical connections to the host modules, use the same mechanism.
The second physical host 22 is connected with the K1 disk, a disk image hot-swap control system GDPS K1 is arranged in the second physical host 22, and GDPS K1 is in data connection with the K1 disk.
The z/OS operating system, DB2 database, and middleware CICS need to be installed on the logical partition of the second physical host 22.
The standby clock server 24 is provided to the second coupler 21.
The arbitration clock server 25 is located at the fourth physical host 23.
The second coupler 21, the second physical host 22, the fourth physical host 23, the standby clock server 24, the arbitration clock server 25, and the disk storage production slave 26 are connected as follows:
in the second host module 2, the second coupler 21 is in data connection with the second physical host 22 and the fourth physical host 23, respectively.
The z/OS operating system, DB2 database, and middleware CICS need to be installed on the logical partition of the fourth physical host 23.
The disk storage production slave 26 is connected to the first physical host 12, the second physical host 22, the third physical host 13, and the fourth physical host 23, respectively.
The mutually redundant first coupler 11 and second coupler 21 are installed in the host modules of the two different buildings, and the STRUCTUREs used for cross-logical-partition data interaction within the parallel coupling body are then distributed across the two couplers according to load and function. This prevents all physical hardware of the parallel coupling body from becoming unavailable in a building-level failure scenario.
The disk storage production master 15 and the disk storage production slave 26 are respectively deployed in two different buildings, so that the unavailability of all disk storage devices when a building-level fault occurs is prevented. Normally, I/O and data access services are provided externally by the disk storage production master 15, and the disk storage production slave 26 is in a real-time mirroring state. When the disk storage production master 15 fails or the building where the disk storage production master 15 is located fails, the disk mirroring hot-switching system completes automatic switching between the disk storage production master 15 and the disk storage production slave 26, and the disk storage production slave 26 continues to provide data access service to the outside.
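The master/slave disk switch described above can be sketched as follows. This is a hypothetical model under invented names (`DiskStorage`, `serve_io`); the actual hot switch is performed by the GDPS-style control system, not by application code.

```python
# Illustrative sketch (invented names) of the disk-mirror hot switch:
# the production master serves I/O while the slave mirrors in real time;
# on master failure, the slave is promoted and continues serving.
class DiskStorage:
    def __init__(self, name):
        self.name, self.online, self.role = name, True, None

def serve_io(master, mirror):
    """Return the device that should serve I/O and data access."""
    if master.online:
        master.role, mirror.role = "serving", "mirroring"
        return master
    mirror.role = "serving"   # automatic switch by the hot-switch control system
    return mirror

master = DiskStorage("production-master")   # deployed in building 1
mirror = DiskStorage("production-slave")    # deployed in building 2
assert serve_io(master, mirror) is master   # normal operation
master.online = False                       # master or its building fails
promoted = serve_io(master, mirror)         # slave takes over service
```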
The connection relationship and data interaction process between each device in the first host module 1 and each device in the second host module 2 are as follows:
the first coupler 11 is connected to the second coupler 21, and if the first coupler 11 fails, the second coupler 21 takes over the service operation of the first coupler 11.
The first coupler 11 is connected to a first physical host 12, a second physical host 22, a third physical host 13, and a fourth physical host 23, respectively.
The second coupler is connected to the first physical host 12, the second physical host 22, the third physical host 13, and the fourth physical host 23, respectively.
When the type structures of the first coupler 11 and the second coupler 21 are both LOCK type structures, the database global LOCK DB2 LOCK1 and the database component DB2 SCA are set in the first coupler 11, and the operating system global LOCK (GRS) is set in the second coupler 21.
When the type structures of the first coupler 11 and the second coupler 21 are both cache type structures, the first coupler 11 and the second coupler 21 set up a plurality of database global buffer pools (DB2 global buffer pools, DB2 GBP), such as database global buffer pool DB2 GBP1, database global buffer pool DB2 GBP2, and database global buffer pool DB2 GBP3, in an asynchronous duplex (Asynchronous Duplex) manner.
When the type structures of the first coupler 11 and the second coupler 21 are both list type structures, the first coupler 11 sets a list type structure IXCSTR1, a list type structure IXCSTR3, and a list type structure IXCSTR5 in a symmetrical deployment manner, and the second coupler 21 sets a list type structure IXCSTR2, a list type structure IXCSTR4, and a list type structure IXCSTR6 in a symmetrical deployment manner.
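The structure placement described above can be summarized as a small lookup table. The mapping below is taken from the text; the helper function and coupler labels are invented for illustration.

```python
# Illustrative sketch: which coupler hosts which coupling-facility structure.
# Lock and list structures are split across the two couplers; cache
# structures (DB2 group buffer pools) are duplexed so a copy lives in both.
PLACEMENT = {
    "coupler-1": ["DB2 LOCK1", "DB2 SCA", "IXCSTR1", "IXCSTR3", "IXCSTR5"],
    "coupler-2": ["GRS", "IXCSTR2", "IXCSTR4", "IXCSTR6"],
}
DUPLEXED = ["DB2 GBP1", "DB2 GBP2", "DB2 GBP3"]  # asynchronous duplex

def hosting_couplers(structure):
    """Couplers holding a usable instance of the given structure."""
    if structure in DUPLEXED:
        return sorted(PLACEMENT)   # a copy lives in each coupler
    return [c for c, s in PLACEMENT.items() if structure in s]
```

Because the duplexed buffer pools exist in both couplers and the remaining structures are balanced between them, losing either coupler (with a building) leaves the other coupler with a workable share of the structures.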
The disk storage production master 15 is in data communication with the first physical host 12, the second physical host 22, the third physical host 13, and the fourth physical host 23, respectively.
The disk storage production slave disks 26 are in data communication with the first physical host 12, the second physical host 22, the third physical host 13, and the fourth physical host 23, respectively.
The standby clock server 24 is connected to the arbitration clock server 25.
The main clock server 14 and the standby clock server 24 are in data interconnection, and if the main clock server 14 fails, the arbitration clock server 25 sends a failure reminding instruction to the standby clock server 24, so that the standby clock server 24 takes over the service operation of the main clock server 14.
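The arbitration step above reduces to a simple decision rule, sketched below under invented names (`arbitrate`); the real Server Time Protocol negotiation is considerably richer.

```python
# Illustrative sketch (invented names): which clock server should provide
# time service. While the primary (PTS) is alive it serves; if it fails,
# the arbiter notifies the backup (BTS), which takes over.
def arbitrate(primary_alive, backup_alive):
    """Return the role that should currently provide clock service."""
    if primary_alive:
        return "PTS"
    if backup_alive:
        return "BTS"   # arbiter sends the failure notice; BTS takes over
    return None        # no clock service available

assert arbitrate(True, True) == "PTS"    # normal operation
assert arbitrate(False, True) == "BTS"   # building-1 failure scenario
```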
The disk storage production master 15 is connected with the disk storage production slave 26 through a disk mirror image, and if the disk storage production master 15 fails, the disk storage production slave 26 takes over the service operation work of the disk storage production master 15.
According to this scheme, the availability problem of host modules in failure scenarios can be solved without increasing the hardware and software purchases of the existing host-module infrastructure, further improving system availability and raising the operation and maintenance service level of the host platform's IT infrastructure at minimal cost.
In the embodiment of the invention, when any one of the first host module and the second host module has a fault, a plurality of sets of parallel coupling body environments do not need to be built in different buildings to execute business operation work, and the host module which does not have the fault only needs to take over the business operation work of the host module which has the fault, so that the purposes of reducing operation and maintenance cost and meeting high-availability operation requirements in a fault scene are achieved.
For ease of understanding, the process by which the first host module 1 and the second host module 2 are data-interconnected, in the preset deployment mode, through physical optical fibers laid in the pre-built physical corridor and vertical shaft is described with reference to fig. 2:
the first host module 1 arranged in the first building 3 adopts a corridor top deployment mode and a corridor bottom deployment mode, and performs data interconnection with the second host module 2 arranged in the second building 4 through a physical optical fiber arranged in a pre-established physical corridor.
The first host module 1 is in a shaft deployment mode and is in data interconnection with the second host module 2 through a physical optical fiber arranged in a pre-established shaft.
The shaft deployment mode may also be an underground deployment mode in which the physical optical fibers are routed underground.
In the embodiment of the invention, to interconnect the hardware and software in the different host modules across buildings, a physical connecting corridor and a vertical shaft are built between the buildings, and physical optical fibers are laid in both to interconnect the cross-building equipment. To ensure efficient and reliable cross-building communication, the optical fibers in the connecting corridor are deployed redundantly, with one run along the ceiling and one along the floor of the corridor, while a third optical-fiber path is laid in the vertical shafts between the buildings. Thus, even if the physical corridor fails, communication between the cross-building physical devices is preserved, giving reliable equipment communication between the host modules in the different buildings.
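The value of the third fiber path can be seen in a short sketch. The link names and helper below are invented for illustration; the source only specifies the three physical routes.

```python
# Illustrative sketch (invented names): three redundant cross-building
# fiber paths — corridor ceiling, corridor floor, and vertical shaft.
LINKS = ["corridor-ceiling", "corridor-floor", "shaft"]

def usable_links(failed):
    """Links still connecting the buildings after the given failures."""
    return [l for l in LINKS if l not in failed]

# A corridor-level fault takes out both corridor fibers at once, which is
# exactly why the third path runs through the shaft instead.
assert usable_links({"corridor-ceiling", "corridor-floor"}) == ["shaft"]
```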
As shown in fig. 3, the hardware devices required to operate the parallel coupling body are installed in the first host module 1 in the first building 3 and in the second host module 2 in the second building 4. The hardware devices and logical partitions required by the parallel coupling body operate as follows:
in fig. 3, each of the physical hosts (the first physical host 12, the second physical host 22, the third physical host 13, and the fourth physical host 23) is divided into Logical Partitions (LPARs), each of which becomes a minimum unit of work that can independently work, and each of the logical partitions is installed with different software according to a function to be provided. For example, in order to implement business operation functions such as account inquiry, update and the like which are common in banks, a z/OS operating system, a DB2 database, a middleware CICS and the like need to be installed on each logical partition of each physical host.
The first physical host 12 is provided with a disk image hot-switch control system GDPS K2.
The second physical host 22 is provided with a disk image hot-swap control system GDPS K1.
The z/OS operating system, DB2 database, and middleware CICS are installed on the logical partition of the first physical host 12.
The z/OS operating system, DB2 database, and middleware CICS are installed on a logical partition of the second physical host 22.
The z/OS operating system, DB2 database, and middleware CICS are installed on the logical partition of the third physical host 13.
The z/OS operating system, DB2 database, and middleware CICS are installed on the logical partition of the fourth physical host 23.
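The symmetric software installation above can be expressed as a simple consistency check. The LPAR names and the check itself are invented for illustration; only the required stack (z/OS, DB2, CICS) and the GDPS placement come from the text.

```python
# Illustrative sketch (invented LPAR names): verify every logical partition
# carries the stack needed for the banking workload described above.
REQUIRED = {"z/OS", "DB2", "CICS"}

lpars = {
    "host-1-lpar": {"z/OS", "DB2", "CICS", "GDPS K2"},  # first physical host
    "host-2-lpar": {"z/OS", "DB2", "CICS", "GDPS K1"},  # second physical host
    "host-3-lpar": {"z/OS", "DB2", "CICS"},             # third physical host
    "host-4-lpar": {"z/OS", "DB2", "CICS"},             # fourth physical host
}

# Any LPAR missing part of the stack could not take over the workload.
missing = {name: REQUIRED - sw for name, sw in lpars.items() if REQUIRED - sw}
assert not missing
```

Because every LPAR carries the full stack, the account inquiry/update workload can run on whichever host modules survive a failure.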
Key roles of the parallel coupling body's operation and the operating-system configuration parameters are adjusted, including the clock-server setup, the placement of operating-system logical partitions, the distribution of structures within the couplers, the installation location of the disk-mirror hot-switch control system, and the master/slave role assignment of the disk storage devices.
The role of the master clock server 14 (PTS) is set: the master clock server 14 is placed in the first building 3, and the standby clock server 24 (BTS) and the arbitration clock server 25 are placed in the second building 4. With this arrangement, when the first building 3 fails, the standby clock server 24 and the arbitration clock server 25 in the second building 4 negotiate jointly, after which the standby clock server 24 automatically takes over the function of the master clock server 14, ensuring clock service for the parallel coupling body.
The master clock server 14 is provided on the first coupler 11 in the first building 3, the standby clock server 24 is provided on the second coupler 21 in the second building 4, and the arbitration clock server 25 is provided on the fourth physical host 23 in the second building 4.
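The clock takeover negotiation described above can be sketched as follows. This is an illustrative model only, not the actual clock protocol; all class, variable and method names are hypothetical:

```python
# Sketch: when the master clock server (PTS) fails, the standby (BTS)
# and the arbitration server jointly confirm the failure before the
# standby is promoted, so a transient glitch cannot cause split-brain.

class ClockServer:
    def __init__(self, name, role):
        self.name = name
        self.role = role          # "PTS", "BTS", or "ARBITER"
        self.alive = True

def negotiate_takeover(pts, bts, arbiter):
    """Return the server currently providing the clock service."""
    if pts.alive:
        return pts                # master still serves the clock
    # Both surviving servers must agree that the PTS has failed.
    if bts.alive and arbiter.alive:
        bts.role = "PTS"          # standby promoted to acting master
        return bts
    raise RuntimeError("clock service lost: no quorum for takeover")

pts = ClockServer("master-14", "PTS")
bts = ClockServer("standby-24", "BTS")
arb = ClockServer("arbiter-25", "ARBITER")

pts.alive = False                 # first building 3 fails
active = negotiate_takeover(pts, bts, arb)
print(active.name, active.role)   # standby-24 PTS
```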
The hardware devices in the two buildings are deployed and installed as equivalent mutual backups: the number, types and supported functions of the physical devices deployed in the host modules of the two buildings must be consistent, and the device models should be identical as far as possible, so as to avoid the situation in which slight functional differences between device models lead to poor mutual-backup compatibility and prevent high-availability switching.
If any one of the first host module 1 and the second host module 2 fails, the host module which does not fail takes over the service operation of the failed host module.
In the embodiment of the invention, the host modules required by the high-availability host system are changed from a single host module in one building into multiple host modules spanning buildings at different physical locations. When one building or host module fails, the host module in the other building can sustain normal operation of the system, which solves the problem that, with a single building, the business-continuity requirement cannot be met in a building-level fault scenario.
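The building-level takeover decision can be illustrated with the following sketch. The class and workload names are hypothetical, not taken from the patent:

```python
# Sketch: two mutually backing-up host modules; when one fails, the
# surviving module absorbs the failed module's business workloads.

class HostModule:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.workloads = []

def failover(failed, survivor):
    """Move all workloads of a failed module onto the surviving one."""
    if not failed.healthy and survivor.healthy:
        survivor.workloads.extend(failed.workloads)
        failed.workloads = []
    return survivor.workloads

m1 = HostModule("first-host-module")
m2 = HostModule("second-host-module")
m1.workloads = ["account-query", "account-update"]

m1.healthy = False                 # building-level fault in building 3
print(failover(m1, m2))            # ['account-query', 'account-update']
```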
To facilitate understanding of the above types of structures in the first coupler 11 and the second coupler 21, the arrangement and distribution within the first coupler 11 and the second coupler 21 are set as shown in fig. 4.
In fig. 4, when the type structures of the first coupler 11 and the second coupler 21 are both LOCK type structures, the database global LOCK DB2 LOCK1 and the database component DB2 SCA are set in the first coupler 11, and the global LOCK GRS is set in the second coupler 21.
For the LOCK type structure used by the database DB2, the global lock LOCK1 and the SCA component should be deployed in the same coupler to meet DB2's strong consistency requirements on data.
To address the performance problems of LOCK1 and the SCA in cross-building access scenarios, an asynchronous duplex deployment mode is enabled to meet the high-availability requirement; the specific process is as follows:
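The co-location rule for LOCK-type structures can be expressed as a simple placement table. This sketch only restates the fig. 4 layout; the dictionary and helper function are hypothetical:

```python
# Sketch: which coupler hosts which LOCK-type structure (per fig. 4).
# DB2 LOCK1 and DB2 SCA share one coupler for strong consistency.

PLACEMENT = {
    "first_coupler":  ["DB2 LOCK1", "DB2 SCA"],  # co-located pair
    "second_coupler": ["GRS"],                   # global lock GRS
}

def coupler_for(structure):
    """Return the coupler that hosts a given structure."""
    for coupler, structures in PLACEMENT.items():
        if structure in structures:
            return coupler
    raise KeyError(structure)

# LOCK1 and SCA resolve to the same coupler, satisfying the rule.
print(coupler_for("DB2 LOCK1"))   # first_coupler
print(coupler_for("DB2 SCA"))     # first_coupler
```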
when the type structures of the first coupler 11 and the second coupler 21 are both cache type structures, the first coupler 11 and the second coupler 21 set a plurality of database global buffer pools, such as the database global buffer pool DB2 GBP1, the database global buffer pool DB2 GBP2, the database global buffer pool DB2 GBP3, and the like, in an asynchronous duplex manner.
The type structures of the first coupler 11 and the second coupler 21 are both cache type structures, deployed in duplex across the parallel coupling bodies in the different buildings to improve access efficiency and availability.
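The asynchronous-duplex idea for the cache (global buffer pool) structures can be sketched as follows: a write completes against the primary copy immediately, and replication to the secondary copy in the other building is drained in the background. The class and method names are illustrative:

```python
# Sketch: asynchronous duplexing of a global buffer pool. The caller
# only waits for the primary write; the secondary catches up later.

from collections import deque

class DuplexedBufferPool:
    def __init__(self, name):
        self.name = name
        self.primary = {}
        self.secondary = {}
        self.pending = deque()          # replication backlog

    def write(self, page, data):
        self.primary[page] = data       # synchronous: caller sees this
        self.pending.append((page, data))  # asynchronous: drained later

    def drain(self):
        """Replicate backlogged writes to the secondary copy."""
        while self.pending:
            page, data = self.pending.popleft()
            self.secondary[page] = data

gbp1 = DuplexedBufferPool("DB2 GBP1")
gbp1.write("page-7", b"row data")
assert gbp1.secondary == {}             # secondary lags until drained
gbp1.drain()
print(gbp1.secondary)                   # {'page-7': b'row data'}
```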
When the type structures of the first coupler 11 and the second coupler 21 are both list type structures, the first coupler 11 sets the list type structures IXCSTR1, IXCSTR3 and IXCSTR5 in a symmetric deployment manner, and the second coupler 21 sets the list type structures IXCSTR2, IXCSTR4 and IXCSTR6 in a symmetric deployment manner.
The type structures of the first coupler 11 and the second coupler 21 are both list type structures; the list type structures in the two couplers, which are located in different buildings, are arranged symmetrically and in equal numbers.
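The symmetric, equal-count layout of list structures can be sketched with a simple odd/even allocator (the function name is hypothetical; the IXCSTR names follow the text above):

```python
# Sketch: odd-numbered list structures go to the first coupler,
# even-numbered to the second, so each building holds the same count.

def place_list_structures(count):
    first, second = [], []
    for i in range(1, count + 1):
        (first if i % 2 else second).append(f"IXCSTR{i}")
    return first, second

first, second = place_list_structures(6)
print(first)    # ['IXCSTR1', 'IXCSTR3', 'IXCSTR5']
print(second)   # ['IXCSTR2', 'IXCSTR4', 'IXCSTR6']
assert len(first) == len(second)   # equal counts across buildings
```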
In the embodiment of the present invention, the arrangement and distribution within the first coupler 11 and the second coupler 21 are set according to the different type structures of the first coupler 11 and the second coupler 21, so as to improve access efficiency and availability.
To facilitate understanding of the above-described deployment of the disk storage production master 15 and the disk storage production slave 26 in two different buildings, fig. 5 is described in conjunction with the embodiment of fig. 1 above.
In fig. 5, the disk storage production master 15 is in data communication with the first physical host 12, the second physical host 22, the third physical host 13, and the fourth physical host 23, respectively.
The disk storage production slave disks 26 are in data communication with the first physical host 12, the second physical host 22, the third physical host 13, and the fourth physical host 23, respectively.
The first physical host 12 is provided with the GDPS K2 system, which serves as the GDPS standby control system (alternate master control system).
The K2 disk is in data connection with the GDPS K2 system.
The second physical host 22 is provided with the GDPS K1 system, which serves as the GDPS master control system.
The K1 disk is in data connection with the GDPS K1 system.
The operating system disk volumes of the GDPS K1 and GDPS K2 systems use separate disk storage: they are kept on disk units separate from those of the disk storage production master 15 and the disk storage production slave 26. Decoupling the GDPS K1 and GDPS K2 systems from the production disks in this way improves their availability when the production master disk or production slave disk fails.
The optical connections between the logical partitions of the GDPS K1 and GDPS K2 systems and the disk storage are, at minimum, separated from the optical connections between the logical partitions of the production system and the disk storage: different physical channels are used to connect to the disk cabinets, so that the separation is realized at the physical fiber-connection layer.
The disk storage production master 15 is connected with the disk storage production slave 26 through disk mirroring; if the disk storage production master 15 fails, the disk storage production slave 26 takes over its service operation.
In the embodiment of the invention, the disk storage production master 15 and the disk storage production slave 26 are deployed in two different buildings respectively, so that a building-level fault (such as a building collapse) does not make all disk storage devices unavailable. When the disk storage production master 15 fails, or the building in which it is located fails, the disk mirror hot-swap control system completes the automatic switch from the disk storage production master 15 to the disk storage production slave 26, and the disk storage production slave 26 continues to provide data access services externally.
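The mirror-and-swap behavior described above can be sketched as follows. This is a toy model of the idea, not the actual control system; all names are illustrative:

```python
# Sketch: every write is mirrored from the production master to the
# slave; on a master (or building) failure, the hot-swap control
# system redirects all I/O to the slave copy.

class MirroredDisk:
    def __init__(self):
        self.master = {}
        self.slave = {}
        self.active = "master"

    def write(self, volume, data):
        target = self.master if self.active == "master" else self.slave
        target[volume] = data
        if self.active == "master":
            self.slave[volume] = data   # synchronous mirror copy

    def read(self, volume):
        source = self.master if self.active == "master" else self.slave
        return source[volume]

    def hot_swap(self):
        """Invoked by the hot-swap control system on master failure."""
        self.active = "slave"

disks = MirroredDisk()
disks.write("VOL001", "ledger")
disks.hot_swap()                        # master building fails
print(disks.read("VOL001"))             # ledger  (served by the slave)
```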
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also fall within the protection scope of the present invention.

Claims (10)

1. A highly available host system, the system comprising a first host module and a second host module disposed in different buildings;
the first host module is in a preset deployment mode, and is in data interconnection with the second host module through a pre-established physical corridor and physical optical fibers arranged in a vertical shaft, the first host module and the second host module form a parallel coupling body, the first host module and the second host module are respectively deployed in two buildings which are separated by a distance of X, and the value of X is less than or equal to 150 meters;
if any one of the first host module and the second host module fails, the host module which does not fail takes over the service operation of the failed host module.
2. The system of claim 1, wherein the first host module is configured to perform data interconnection with the second host module through a pre-established physical corridor and a physical optical fiber disposed in a vertical shaft in a pre-determined deployment manner, and the data interconnection includes:
the first host module is in a corridor top end deployment mode and a corridor bottom end deployment mode and is in data interconnection with the second host module through a physical optical fiber arranged in a pre-established physical corridor;
and the first host module is in a shaft deployment mode and is in data interconnection with the second host module through a physical optical fiber arranged in a pre-established shaft.
3. The system of claim 1, wherein the first host module comprises a first coupler, a first physical host, a third physical host, a K2 disk, a master clock server, and a disk storage production master;
the master clock server is arranged on the first coupler;
the first coupler is respectively connected with the first physical host and the third physical host;
the K2 magnetic disk is connected with the first physical host;
and the magnetic disk storage production master is respectively connected with the first physical host and the third physical host.
4. The system of claim 3, wherein the second host module comprises a second coupler, a second physical host, a fourth physical host, a K1 disk, a standby clock server, an arbitrated clock server, and a disk storage production slave;
the standby clock server is arranged on the second coupler;
the arbitration clock server is arranged on the fourth physical host;
the second coupler is respectively connected with the second physical host and the fourth physical host;
the K1 magnetic disk is connected with the second physical host;
and the magnetic disk storage production slave disk is respectively connected with the second physical host and the fourth physical host.
5. The system of claim 4, wherein the first coupler is connected to the second coupler, and the second coupler takes over service operation of the first coupler if the first coupler fails;
the first coupler is respectively connected with the first physical host, the second physical host, the third physical host and the fourth physical host;
the second coupler is respectively connected with the first physical host, the second physical host, the third physical host and the fourth physical host;
the disk storage production master is respectively connected with the first physical host, the second physical host, the third physical host and the fourth physical host;
the disk storage production slave disk is respectively connected with the first physical host, the second physical host, the third physical host and the fourth physical host;
the main clock server and the standby clock server are in data interconnection, and if the main clock server fails, the arbitration clock server sends a failure reminding instruction to the standby clock server to enable the standby clock server to take over the service operation work of the main clock server;
the disk storage production master is connected with the disk storage production slave through a disk mirror image, and if the disk storage production master fails, the disk storage production slave takes over the service operation work of the disk storage production master.
6. The system of claim 5, wherein the type structures of the first coupler and the second coupler are LOCK type structures, the first coupler has a database global LOCK DB2 LOCK1 and a database component DB2 SCA, and the second coupler has a global LOCK GRS.
7. The system according to claim 5, wherein the type structures of the first coupler and the second coupler are both cache type structures, and the first coupler and the second coupler set up a plurality of database global buffer pools in an asynchronous duplex manner.
8. The system of claim 5, wherein the type structures of the first coupler and the second coupler are both list type structures, the first coupler is configured to set the list type structure IXCSTR1, the list type structure IXCSTR3 and the list type structure IXCSTR5 by symmetrical deployment, and the second coupler is configured to set the list type structure IXCSTR2, the list type structure IXCSTR4 and the list type structure IXCSTR6 by symmetrical deployment.
9. The system according to claim 5, wherein a disk mirror hot-swap control system GDPS K2 is provided in the first physical host.
10. The system according to claim 5, wherein a disk mirror hot-swap control system GDPS K1 is provided in the second physical host.
CN202110591338.9A 2021-05-28 2021-05-28 High-availability host system Active CN113220519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110591338.9A CN113220519B (en) 2021-05-28 2021-05-28 High-availability host system


Publications (2)

Publication Number Publication Date
CN113220519A true CN113220519A (en) 2021-08-06
CN113220519B CN113220519B (en) 2024-04-16

Family

ID=77099068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110591338.9A Active CN113220519B (en) 2021-05-28 2021-05-28 High-availability host system

Country Status (1)

Country Link
CN (1) CN113220519B (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0869586A (en) * 1994-08-30 1996-03-12 Nippon Steel Corp Building automation system
JPH10190716A (en) * 1996-12-24 1998-07-21 Matsushita Electric Works Ltd Distributed integrated wiring system
KR20000056861A (en) * 1999-02-26 2000-09-15 문성주 Building automation system
EP2077500A1 (en) * 2007-12-28 2009-07-08 Bull S.A.S. High-availability computer system
CN102325192A (en) * 2011-09-30 2012-01-18 上海宝信软件股份有限公司 Cloud computing implementation method and system
CN107592159A (en) * 2017-09-29 2018-01-16 深圳达实智能股份有限公司 A kind of intelligent building optical network system and optical network apparatus
CN108351823A (en) * 2015-10-22 2018-07-31 Netapp股份有限公司 It realizes and automatically switches
CN108416957A (en) * 2018-03-20 2018-08-17 安徽理工大学 A kind of building safety monitoring system
US20190158309A1 (en) * 2017-02-10 2019-05-23 Johnson Controls Technology Company Building management system with space graphs
CN111478450A (en) * 2020-06-24 2020-07-31 南京长江都市建筑设计股份有限公司 Building electrical comprehensive monitoring protection device

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0869586A (en) * 1994-08-30 1996-03-12 Nippon Steel Corp Building automation system
JPH10190716A (en) * 1996-12-24 1998-07-21 Matsushita Electric Works Ltd Distributed integrated wiring system
KR20000056861A (en) * 1999-02-26 2000-09-15 문성주 Building automation system
EP2077500A1 (en) * 2007-12-28 2009-07-08 Bull S.A.S. High-availability computer system
CN102325192A (en) * 2011-09-30 2012-01-18 上海宝信软件股份有限公司 Cloud computing implementation method and system
CN108351823A (en) * 2015-10-22 2018-07-31 Netapp股份有限公司 It realizes and automatically switches
US20190158309A1 (en) * 2017-02-10 2019-05-23 Johnson Controls Technology Company Building management system with space graphs
CN107592159A (en) * 2017-09-29 2018-01-16 深圳达实智能股份有限公司 A kind of intelligent building optical network system and optical network apparatus
CN108416957A (en) * 2018-03-20 2018-08-17 安徽理工大学 A kind of building safety monitoring system
CN111478450A (en) * 2020-06-24 2020-07-31 南京长江都市建筑设计股份有限公司 Building electrical comprehensive monitoring protection device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115378464A (en) * 2022-08-12 2022-11-22 江苏德是和通信科技有限公司 Synthetic switched systems of transmitter owner spare machine
CN115378464B (en) * 2022-08-12 2023-08-15 江苏德是和通信科技有限公司 Main and standby machine synthesis switching system of transmitter

Also Published As

Publication number Publication date
CN113220519B (en) 2024-04-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant