CN104199740A - Non-tight-coupling multi-node multi-processor system and method based on system address space sharing - Google Patents

Non-tight-coupling multi-node multi-processor system and method based on system address space sharing

Info

Publication number
CN104199740A
Authority
CN
China
Prior art keywords
node
address space
resource
shared
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410433320.6A
Other languages
Chinese (zh)
Other versions
CN104199740B (en)
Inventor
王恩东
胡雷钧
唐士斌
陈继承
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201410433320.6A
Publication of CN104199740A
Application granted
Publication of CN104199740B
Legal status: Active

Abstract

The invention provides a non-tight-coupling multi-node multi-processor system based on system address space sharing. The system comprises server nodes and inter-node control units. Resource sharing and communication interconnection between the server nodes are achieved through the inter-node control units. The invention further provides a data reading and writing method based on the system. Through the mode of system address space sharing, memory sharing and I/O sharing are achieved in the non-tight-coupling multi-node multi-processor system, and the local nodes are allowed to directly access memory resources and I/O resources of the remote nodes.

Description

Non-tight-coupling multi-node multi-processor system and method based on system address space sharing
Technical field
The present invention relates to the field of distributed computer processing technology, and in particular to a non-tight-coupling multi-node multi-processor system and method based on system address space sharing.
Background
With the rapid development of fields such as online shopping, search, the Internet of Things, and data mining, the volume of data that data centers must process has grown sharply. The total data produced in China in 2013 exceeded 0.8 ZB, twice the 2012 figure and equivalent to the world's total data volume in 2009. The data volumes that mainstream Internet companies must process are now all at the PB level: at a big data conference in 2013, Alibaba's total data volume was given as 20 PB and Tencent's as 100 PB.
In the face of this explosive data growth, the scalability of data centers faces a great challenge. Scale-up (vertical expansion) and Scale-out (horizontal expansion) are the two main ways in which data centers currently expand capacity. Scale-up upgrades to more powerful CPUs, memory, network, and other devices; Scale-out uses distributed algorithms to combine many independent low-cost server nodes into one large, powerful system. Compared with Scale-up, Scale-out is simpler and cheaper, and it is gradually becoming the mainstream architecture for future data centers.
However, in a data center built on the Scale-out architecture, the server nodes are organized in a loosely coupled way: they are independent of one another and cannot share computing, storage, or I/O resources. Moreover, current distributed algorithms struggle to distribute computing tasks evenly across a large number of server nodes. An unbalanced distribution leaves some server nodes heavily loaded, and these nodes become the performance bottleneck of the whole system.
To address these problems, researchers have proposed solutions for sharing memory and sharing I/O between server nodes, of which the representative ones are the "memory server" and "virtual I/O". The "memory server" approach divides the nodes of a data center into "computing nodes" and "memory service nodes"; the latter provide memory to the former when the former take a page fault. "Virtual I/O" emulates the registers and memory of an I/O device, intercepts the operating system's accesses to I/O ports and registers, and then emulates the real I/O device in software or hardware.
However, each of these solutions is partial and does not form a system-level solution. To this end, the present invention proposes a non-tight-coupling multi-node multi-processor system and method based on system address space sharing, which provides a unified solution for memory sharing and I/O sharing among non-tightly-coupled server nodes.
Summary of the invention
The proposed non-tight-coupling multi-node multi-processor system based on system address space sharing is constructed from:
Server nodes, for managing local computing resources, memory resources, and I/O resources;
An inter-node control unit, for sharing a system address space with a global view among a plurality of non-tightly-coupled server nodes;
Wherein resource sharing and communication interconnection between the server nodes are realized through the inter-node control unit.
In particular, the server node comprises at least one processor, at least one memory control module, and at least one I/O control module.
In particular, the inter-node control unit comprises:
A network interface connected to each server node;
A system address space mapping module, for converting between the system address of a server node's shared resource and the physical address of that server node's actual local resource;
A lock operation management module supporting atomic instructions, for implementing a shared lock mechanism among the non-tightly-coupled server nodes;
A network message conversion module, for converting messages between server nodes.
In particular, a global system address space mapping table and an intra-node resource mapping table are established in the system address space mapping module;
The global system address space mapping table comprises an MMIO address space, which records the MMIO addresses of the shared resources of all server nodes;
The intra-node resource mapping table comprises the physical address space of the actual local resources to which the server node's shared resources in the MMIO address space are mapped.
In particular, all shared resources of the same server node are mapped into one contiguous segment of MMIO address space.
In particular, the lock operation management module supporting atomic instructions is used to implement mutually exclusive operations among a plurality of server nodes.
In particular, the actual local resources are the memory resources and I/O resources of the server node.
A data access method for the non-tight-coupling multi-node multi-processor system with a shared address space, comprising:
S1: The local node sends a data access request according to the configured MMIO address space information;
S2: The data access request is forwarded to the inter-node controller ENC;
S3: The ENC, according to the configured MMIO address space information, forwards the data access request to the corresponding remote node over the external interconnect network;
S4: After receiving the data access request, the corresponding remote node determines, according to the configured intra-node resource address space mapping table, the address of the actual resource inside that remote node that the data access request is to access.
The beneficial effect of the invention is that, by sharing the system address space, memory sharing and I/O sharing are realized in a non-tight-coupling multi-node multi-processor system, allowing a local node to directly access the memory resources and I/O resources of remote nodes.
Brief description of the drawings
Fig. 1 shows the non-tight-coupling multi-node multi-processor system based on system address space sharing proposed by the invention.
Fig. 2 shows the address space mapping of the shared system address space proposed by the invention.
Fig. 3 is a schematic diagram of the inter-node control unit proposed by the invention.
Fig. 4 shows the intra-node resource address space mapping table proposed by the invention.
Fig. 5 is a flowchart of acquiring a lock for an atomic instruction within a node.
Fig. 6 is a flowchart of the address mapping proposed by the invention.
Fig. 7 is a flowchart of acquiring the global lock proposed by the invention.
Detailed description of the embodiments
To make the purpose and technical solution of the invention clearer, specific embodiments are given below, and the invention is further described with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention.
Embodiment 1:
Embodiment 1 of the invention proposes a non-tight-coupling multi-node multi-processor system based on system address space sharing. Fig. 1 shows the structure of the system. In the figure, the components inside a node are connected by an intra-node controller INC (Intra Node Controller), and each INC can connect at least one CPU. Each node is an independent whole with its own computing, memory, and I/O resources, and can run its own operating system. The nodes are organized in a non-tightly-coupled form and are independent of one another, so memory and I/O resources cannot be shared directly between nodes.
The node may be a server node, which runs an operating system and manages local computing, memory, and I/O resources. A server node comprises at least one processor, at least one memory control module, and/or at least one I/O control module.
To share memory and I/O resources between nodes, the system further comprises an inter-node controller ENC (External Node Controller). Each node is connected to a dedicated interconnect network through the ENC, forming a scalable distributed computing system. More importantly, nodes can share memory and I/O resources through the ENC; the specific implementation is described below.
Refer first to Fig. 2, which shows the mapping of the shared system address space. The invention shares memory and I/O resources between nodes by sharing the system address space. The MMIO region maps the shared memory addresses and I/O address spaces of all remote nodes. In both the local node and the remote nodes, the memory address space mapped by the MMIO region must be allocated in the non-coherent memory region (Non-coherent Memory).
For example, the non-coherent memory in Node_0 is real memory, whereas the MMIO address space records only mapping relations. Every access that a node issues through the MMIO address space lands on real memory or I/O: an access to Node_0 through an address in the global mapping table (a segment of MMIO space) is ultimately converted into an access to Node_0's non-coherent memory. In this way, a local node accesses the memory and I/O resources of a remote node by accessing addresses in the MMIO address space.
Fig. 3 shows the structure of the inter-node control unit ENC. The inter-node control unit is a key component of the invention and comprises network interfaces connected to each independent node, a system address space mapping module, a network message conversion module, and a lock operation management module supporting atomic instructions.
1. System address space mapping module
The system address space mapping module translates addresses in a remote node's system address space into addresses in the local system address space, so that two nodes can communicate across system domains. Based on this module, a global view of the shared resources in the whole non-tightly-coupled multi-node system can be established, as shown in Fig. 2. With this global view, non-tightly-coupled nodes can share resources in a way that would otherwise be possible only under a tightly coupled organization.
The system address space mapping module establishes the global view of the shared system address space (shown in Fig. 2): the shared resources of all remote nodes are mapped into the local MMIO address space, the address spaces of different MMIO regions do not overlap, and all shared resources of the same node are mapped into one contiguous segment of MMIO space. The module also converts requests in which remote nodes access the local node's shared resources through MMIO addresses into address requests that can be recognized inside the node.
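The layout rules for the global view (one contiguous window per node, no overlap between windows) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Region` type, the `build_global_view` and `owner_of` helpers, the base address `0x8000_0000`, and the window sizes do not come from the patent, which does not specify how windows are assigned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One node's window in the global MMIO view (hypothetical layout)."""
    node_id: int
    base: int   # first MMIO address of the node's shared window
    size: int   # window length in bytes

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def build_global_view(mmio_base: int, shared_sizes: dict[int, int]) -> list[Region]:
    """Place one contiguous window per node back to back, so windows never overlap."""
    regions, cursor = [], mmio_base
    for node_id in sorted(shared_sizes):
        regions.append(Region(node_id, cursor, shared_sizes[node_id]))
        cursor += shared_sizes[node_id]
    return regions


def owner_of(view: list[Region], addr: int) -> Region:
    """Return the window (and therefore the node) that an MMIO address belongs to."""
    for region in view:
        if region.contains(addr):
            return region
    raise LookupError(f"address {addr:#x} is not in any shared MMIO window")


# Assumed example: three nodes sharing 4 KiB, 8 KiB, and 4 KiB respectively.
view = build_global_view(0x8000_0000, {0: 0x1000, 1: 0x2000, 2: 0x1000})
```

Laying the windows out back to back is only one simple way to satisfy both constraints; any assignment that keeps each node's window contiguous and disjoint from the others would serve equally well.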
The system address space mapping module also establishes an intra-node resource mapping table. In this table, the MMIO address space that corresponds to the local node in the global view is mapped to the real address space of the local resources, that is, memory addresses and the MMIO addresses of I/O devices.
Fig. 4 shows the intra-node resource address space mapping table. The left half of the table is the contiguous MMIO address space that corresponds to the local node in the global address space view; the right half holds the local addresses of the local memory or I/O resources that correspond to that contiguous MMIO address space.
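The two-column table of Fig. 4 can be modelled as a list of ranges, each pairing a slice of the node's global MMIO window with the local address of a real resource. The sketch below is illustrative only: the `Entry` fields, the `to_local` helper, and every address in `NODE_TABLE` are hypothetical values chosen for the example, not figures from the patent.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    mmio_base: int    # left column: start of the range in the global MMIO view
    size: int         # length of the range in bytes
    kind: str         # "memory" or "io"
    local_base: int   # right column: start of the real local resource


# Hypothetical table for one node: a shared memory range and a device range.
NODE_TABLE = [
    Entry(0x8000_1000, 0x1000, "memory", 0x0010_0000),
    Entry(0x8000_2000, 0x0100, "io", 0xFED0_0000),
]


def to_local(table: list[Entry], mmio_addr: int) -> tuple[str, int]:
    """Translate a global MMIO address into (resource kind, local address)."""
    for entry in table:
        offset = mmio_addr - entry.mmio_base
        if 0 <= offset < entry.size:
            return entry.kind, entry.local_base + offset
    raise LookupError(f"{mmio_addr:#x} is not shared by this node")
```

The translation is a plain base-plus-offset calculation, which is what a contiguous per-node window makes possible.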
2. Lock operation management module supporting atomic instructions
The lock operation management module supporting atomic instructions implements a shared lock mechanism between non-tightly-coupled nodes, so that conflicting shared operations between nodes are executed mutually exclusively.
To explain how this patent executes atomic instructions on shared addresses, first review how an atomic instruction is executed within a node. The system requires that the two memory access operations in an atomic instruction complete in order without interference from conflicting external accesses, so the execution flow of an atomic instruction has three steps: acquire the global lock, perform the access operations, and release the global lock.
First, the process of acquiring the global lock is shown in Fig. 5, where Core is a processor core, LLC is the cache controller of the last-level cache, Config Agent is the configuration agent, and Quiescent Master is the quiescence controller. To acquire the global lock, the Core first sends a request, which passes through the LLC and the Config Agent to reach the Quiescent Master. The Quiescent Master sends a stop request (StopReq1) to all processors in the node. After receiving acknowledgements from all processors, the Quiescent Master sends a stop request (StopReq2) to the I/O agents. After receiving all acknowledgements, the Quiescent Master replies with an acknowledgement message to the Core that requested the lock. After receiving the acknowledgement, the Core performs the access operations and then releases the global lock; the release flow is similar to the acquire flow.
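The ordering of the intra-node acquire flow (StopReq1 to every processor, then StopReq2 to the I/O agents, then an acknowledgement to the requesting core) can be traced with a small Python model. This is a sketch under stated assumptions: the `QuiescentMaster` class, its method names, and the target names are hypothetical, every target is assumed to acknowledge immediately, and the hop through the LLC and Config Agent is omitted.

```python
class QuiescentMaster:
    """Toy model of the intra-node lock acquisition order shown in Fig. 5."""

    def __init__(self, processors, io_agents):
        self.processors = list(processors)
        self.io_agents = list(io_agents)
        self.log = []  # ordered trace of (message, target)

    def _stop_all(self, message, targets):
        """Send one stop request per target and wait for every acknowledgement."""
        acks = 0
        for target in targets:
            self.log.append((message, target))
            acks += 1  # assumption: each target acknowledges at once
        return acks == len(targets)

    def acquire(self, core):
        """Quiesce processors first, then I/O agents, then acknowledge the core."""
        if not self._stop_all("StopReq1", self.processors):
            return False
        if not self._stop_all("StopReq2", self.io_agents):
            return False
        self.log.append(("Ack", core))
        return True


# Assumed example node with two processors and one I/O agent.
qm = QuiescentMaster(["cpu0", "cpu1"], ["io0"])
granted = qm.acquire("cpu0")
```

After `acquire` returns, `qm.log` holds the message order, which makes it easy to see that no I/O agent is stopped before every processor has been stopped.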
In a non-tight-coupling multi-node multi-processor system that shares a global address space, the execution flow of an atomic instruction on a shared address differs greatly from that of an atomic instruction within a node in the I/O stop request (StopReq2) phase. To ensure that the atomic instruction takes effect across the whole system rather than within a single node, when the lock operation management module receives an I/O stop request, the lock operation management module in the ENC broadcasts the request to the lock operation managers of all nodes.
When multiple nodes acquire the global lock, avoiding deadlock is the first problem to consider. To avoid deadlock, this patent elects one node in the system as the global lock operation management node (Lock Manager). Every node that acquires the global lock sends its request to the lock operation management module of the Lock Manager node, which then sends the global lock request to the lock operation managers of the other nodes in the system.
When the lock operation management module in a node receives an I/O stop request (StopReq2), it sends an inter-node global lock request to the Lock Manager. On receiving the request, the Lock Manager forwards it to the lock operation management modules of all other nodes. The lock operation manager of each node acquires the intra-node global lock within its own node and, on success, replies to the Lock Manager with an acknowledgement. After collecting all acknowledgements, the Lock Manager replies with an acknowledgement to the node that requested the inter-node global lock. When that node's lock operation manager receives the acknowledgement, it replies to its node's Quiescent Master that the I/O stop request is complete. The Quiescent Master then replies with an acknowledgement to the processor core, which completes the acquisition of the global lock for the shared address.
3. Network message conversion module
The network message conversion module converts messages between the intra-node interconnect network and the inter-node interconnect network, making communication between the two networks transparent. All information exchanged when a node accesses a remote node's shared resources or acquires the global lock is forwarded between nodes through this module.
Embodiment 2:
Fig. 6 shows the address mapping flow, which describes how the accessed address is converted when a local node accesses a shared resource of a remote node. The flow is as follows:
Step 1: The local node sends a request to read (or modify) data according to the global shared address space view.
Step 2: According to the local node's address space configuration, the request is forwarded to the ENC (inter-node controller).
Step 3: The ENC, according to the global shared address space view, forwards the request to the remote node over the external interconnect network.
Step 4: The remote node receives the request and, according to the intra-node resource address space mapping table, converts the MMIO address of the global view into an address in the local address space.
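The four steps can be walked through end to end in a short Python sketch. It is a simplification under several assumptions: the `Node`, `ENC`, and `access` names are hypothetical, the interconnect is replaced by a direct method call, each node shares a single byte-addressable memory range, and the addresses are invented for the example.

```python
class Node:
    """A node that shares one memory range through the global MMIO view."""

    def __init__(self, node_id, mmio_base, size):
        self.node_id = node_id
        self.mmio_base = mmio_base
        self.size = size
        self.memory = bytearray(size)  # stands in for the real shared resource

    def handle(self, op, mmio_addr, value=None):
        # Step 4: convert the global MMIO address into a local offset.
        offset = mmio_addr - self.mmio_base
        if not 0 <= offset < self.size:
            raise LookupError(f"{mmio_addr:#x} is not shared by node {self.node_id}")
        if op == "write":
            self.memory[offset] = value
            return None
        return self.memory[offset]


class ENC:
    """Inter-node controller holding the global shared address space view."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def forward(self, op, mmio_addr, value=None):
        # Step 3: pick the node whose MMIO window contains the address.
        for node in self.nodes:
            if node.mmio_base <= mmio_addr < node.mmio_base + node.size:
                return node.handle(op, mmio_addr, value)
        raise LookupError(f"no node shares {mmio_addr:#x}")


def access(enc, op, mmio_addr, value=None):
    # Steps 1 and 2: the local node issues the request and hands it to the ENC.
    return enc.forward(op, mmio_addr, value)


# Assumed example: two nodes, each sharing 16 bytes in adjacent windows.
node0 = Node(0, 0x8000_0000, 16)
node1 = Node(1, 0x8000_0010, 16)
enc = ENC([node0, node1])
access(enc, "write", 0x8000_0012, 0xAB)
readback = access(enc, "read", 0x8000_0012)
```

In this example the write lands at offset 2 of `node1.memory`, because `0x8000_0012` falls in node 1's window and the window starts at `0x8000_0010`.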
Embodiment 3:
Fig. 7 shows the flow of acquiring the global lock. The steps are as follows:
Step 1: When the lock operation management module of the local node receives a lock request, it does not reply immediately but sends an inter-node global lock request to the lock operation management module of the global lock operation management node (Lock Manager).
Step 2: After receiving the request, the Lock Manager's lock operation management module sends an intra-node lock request to the lock operation management modules of all nodes in the view.
Step 3: The lock operation management module of each node acquires the intra-node lock within its own node and then replies to the Lock Manager's lock operation management module.
Step 4: After collecting all acknowledgements, the Lock Manager's lock operation management module replies to the lock operation management module that originally requested the lock, allowing it to perform the atomic access operation.
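A single Lock Manager that serializes all global lock requests can be sketched as follows. The sketch is not the patent's implementation: the `LockModule` and `LockManager` classes and their methods are hypothetical, message passing is reduced to direct calls, and refusing a request while the lock is held is an assumption, since the patent does not say whether competing requests are queued or retried.

```python
class LockModule:
    """Lock operation management module of one node (simplified)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.locked_for = None  # requester currently holding this node's lock

    def acquire_local(self, requester):
        # Step 3: take the intra-node lock and acknowledge to the Lock Manager.
        if self.locked_for is not None:
            return False
        self.locked_for = requester
        return True

    def release_local(self):
        self.locked_for = None


class LockManager:
    """The one node elected to serialize all global lock requests."""

    def __init__(self, modules):
        self.modules = list(modules)
        self.holder = None

    def request(self, requester):
        # Step 1 arrives here. Assumption: refuse while another node holds the lock.
        if self.holder is not None:
            return False
        self.holder = requester
        # Step 2: send the intra-node lock request to every node in the view.
        acks = [module.acquire_local(requester) for module in self.modules]
        # Step 4: grant only once every acknowledgement has been collected.
        return all(acks)

    def release(self, requester):
        if self.holder != requester:
            raise RuntimeError("only the current holder may release the global lock")
        for module in self.modules:
            module.release_local()
        self.holder = None


# Assumed example with three nodes; node 0 and node 1 compete for the lock.
modules = [LockModule(i) for i in range(3)]
manager = LockManager(modules)
first = manager.request(0)
second = manager.request(1)   # refused while node 0 holds the lock
manager.release(0)
third = manager.request(1)
```

Because every request passes through one manager, two nodes can never each hold part of the set of intra-node locks while waiting for the rest, which is the deadlock the election of a single Lock Manager is meant to rule out.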
Of course, the present invention may also have various other embodiments. Without departing from the spirit and essence of the invention, those of ordinary skill in the art may make various corresponding changes and modifications according to the invention, but all such changes and modifications shall fall within the protection scope of the claims of the invention.

Claims (8)

1. A non-tight-coupling multi-node multi-processor system with a shared address space, characterized in that it comprises:
Server nodes, for managing local computing resources, memory resources, and I/O resources;
An inter-node control unit, for sharing a system address space with a global view among a plurality of non-tightly-coupled server nodes;
Wherein resource sharing and communication interconnection between the server nodes are realized through the inter-node control unit.
2. The system according to claim 1, characterized in that:
The server node comprises at least one processor, at least one memory control module, and at least one I/O control module.
3. The system according to claim 1, characterized in that:
The inter-node control unit comprises:
A network interface connected to each server node;
A system address space mapping module, for converting between the system address of a server node's shared resource and the physical address of that server node's actual local resource;
A lock operation management module supporting atomic instructions, for implementing a shared lock mechanism among the non-tightly-coupled server nodes;
A network message conversion module, for converting messages between server nodes.
4. The system according to claim 3, characterized in that:
A global system address space mapping table and an intra-node resource mapping table are established in the system address space mapping module;
The global system address space mapping table comprises an MMIO address space, which records the MMIO addresses of the shared resources of all server nodes;
The intra-node resource mapping table comprises the physical address space of the actual local resources to which the server node's shared resources in the MMIO address space are mapped.
5. The system according to claim 4, characterized in that: all shared resources of the same server node are mapped into one contiguous segment of MMIO address space.
6. The system according to claim 3, characterized in that:
The lock operation management module supporting atomic instructions is used to implement mutually exclusive operations among a plurality of server nodes.
7. The system according to any one of claims 3 to 6, characterized in that:
The actual local resources are the memory resources and I/O resources of the server node.
8. A data access method for a non-tight-coupling multi-node multi-processor system with a shared address space, characterized in that it comprises:
S1: The local node sends a data access request according to the configured MMIO address space information;
S2: The data access request is forwarded to the inter-node controller ENC;
S3: The ENC, according to the configured MMIO address space information, forwards the data access request to the corresponding remote node over the external interconnect network;
S4: After receiving the data access request, the corresponding remote node determines, according to the configured intra-node resource address space mapping table, the address of the actual resource inside that remote node that the data access request is to access.
CN201410433320.6A 2014-08-28 2014-08-28 Non-tight-coupling multi-node multi-processor system and method based on system address space sharing Active CN104199740B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410433320.6A CN104199740B (en) 2014-08-28 2014-08-28 Non-tight-coupling multi-node multi-processor system and method based on system address space sharing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410433320.6A CN104199740B (en) 2014-08-28 2014-08-28 Non-tight-coupling multi-node multi-processor system and method based on system address space sharing

Publications (2)

Publication Number Publication Date
CN104199740A true CN104199740A (en) 2014-12-10
CN104199740B CN104199740B (en) 2019-03-01

Family

ID=52085037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410433320.6A Active CN104199740B (en) 2014-08-28 2014-08-28 Non-tight-coupling multi-node multi-processor system and method based on system address space sharing

Country Status (1)

Country Link
CN (1) CN104199740B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049422A (en) * 2012-12-17 2013-04-17 Inspur Electronic Information Industry Co., Ltd. Method for building multi-processor node system with multiple cache consistency domains
CN103294611A (en) * 2013-03-22 2013-09-11 Inspur Electronic Information Industry Co., Ltd. Server node data cache method based on limited data consistency state
CN103327074A (en) * 2013-05-24 2013-09-25 Inspur Electronic Information Industry Co., Ltd. Designing method of global-cache-sharing tight coupling multi-control multi-active storage system
CN103870435A (en) * 2014-03-12 2014-06-18 Huawei Technologies Co., Ltd. Server and data access method

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484626A (en) * 2015-08-31 2017-03-08 Huawei Technologies Co., Ltd. Memory access method, system, and local node
CN106484626B (en) * 2015-08-31 2019-11-26 Huawei Technologies Co., Ltd. Memory access method, system, and local node
CN106557427A (en) * 2015-09-25 2017-04-05 ZTE Corporation Memory management method and device for shared memory database
CN106557427B (en) * 2015-09-25 2021-11-12 ZTE Corporation Memory management method and device for shared memory database
WO2017063447A1 (en) * 2015-10-15 2017-04-20 Huawei Technologies Co., Ltd. Computing apparatus, node device, and server
US10366006B2 (en) 2015-10-15 2019-07-30 Huawei Technologies Co., Ltd. Computing apparatus, node device, and server
CN105404597A (en) * 2015-10-21 2016-03-16 Huawei Technologies Co., Ltd. Data transmission method, device and system
WO2017067420A1 (en) * 2015-10-21 2017-04-27 Huawei Technologies Co., Ltd. Data transmission method, equipment and system
CN105404597B (en) * 2015-10-21 2018-10-12 Huawei Technologies Co., Ltd. Data transmission method, device and system
CN106598746A (en) * 2016-12-09 2017-04-26 Beijing Qihoo Technology Co., Ltd. Method and device for achieving global lock in distributed system
CN110892387A (en) * 2017-07-14 2020-03-17 Arm Limited Memory node controller
CN110892387B (en) * 2017-07-14 2024-03-12 Arm Limited Memory node controller

Also Published As

Publication number Publication date
CN104199740B (en) 2019-03-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant