CN104199740A

CN104199740A - Non-tight-coupling multi-node multi-processor system and method based on system address space sharing

Info

Publication number: CN104199740A
Application number: CN201410433320.6A
Authority: CN
Inventors: 王恩东; 胡雷钧; 唐士斌; 陈继承
Original assignee: Inspur Beijing Electronic Information Industry Co Ltd
Current assignee: Inspur Beijing Electronic Information Industry Co Ltd
Priority date: 2014-08-28
Filing date: 2014-08-28
Publication date: 2014-12-10
Anticipated expiration: 2034-08-28
Also published as: CN104199740B

Abstract

The invention provides a non-tight-coupling multi-node multi-processor system based on system address space sharing. The system comprises server nodes and inter-node control units. Resource sharing and communication interconnection between the server nodes are achieved through the inter-node control units. The invention further provides a data reading and writing method based on the system. Through the mode of system address space sharing, memory sharing and I/O sharing are achieved in the non-tight-coupling multi-node multi-processor system, and the local nodes are allowed to directly access memory resources and I/O resources of the remote nodes.

Description

Non-tight-coupling multi-node multi-processor system and method based on system address space sharing

Technical field

The present invention relates to the field of distributed computer processing technology, and in particular to a non-tight-coupling multi-node multi-processor system and method based on system address space sharing.

Background

With the rapid development of fields such as online shopping, search, the Internet of Things and data mining, the volume of data that data centers must process is growing sharply. The total data produced in China in 2013 exceeded 0.8 ZB, twice the 2012 figure and equal to the total data produced worldwide in 2009. Mainstream Internet companies now all process data at the PB scale: according to figures given at a big data conference in 2013, Alibaba's total data stood at 20 PB and Tencent's at 100 PB.

Faced with this explosive data growth, data center scalability is under great pressure. Scale-up (vertical expansion) and scale-out (horizontal expansion) are the two main forms of data center expansion today. Scale-up extends toward more powerful CPUs, memory, networks and other devices; scale-out uses distributed algorithms to combine individual, independent, low-cost server nodes into one large and powerful system. Compared with scale-up, scale-out expansion is simpler and cheaper, and it is gradually becoming the mainstream architecture for future data center development.

However, in a data center with a scale-out architecture, the server nodes are organized in a loosely coupled form, are independent of each other, and cannot share computing, storage or I/O resources. Moreover, current distributed algorithms struggle to distribute computing tasks evenly across a large number of server nodes. Unbalanced task distribution easily overloads some server nodes, and those nodes become the performance bottleneck of the whole system.

To solve these problems, researchers have proposed solutions for sharing memory and sharing I/O between server nodes, of which the "memory server" and "virtual I/O" are representative. The "memory server" approach divides the nodes of a data center into "compute nodes" and "memory service nodes"; the latter supply memory to the former when the former incur page faults. "Virtual I/O" emulates the registers and memory of an I/O device, captures the operating system's accesses to I/O ports and registers, and then emulates the real I/O device in software or hardware.

However, each of the above solutions is partial, and none forms a system-level solution. The present invention therefore proposes a non-tight-coupling multi-node multi-processor system and method based on system address space sharing, providing a unified solution for memory sharing and I/O sharing among non-tightly-coupled server nodes.

Summary of the invention

A non-tight-coupling multi-node multi-processor system based on system address space sharing is proposed, comprising:

Server nodes, for managing local computing resources, memory resources and I/O resources;

An inter-node control unit, for realizing sharing of the system address space of a global view among a plurality of non-tightly-coupled server nodes;

Wherein the server nodes achieve resource sharing and communication interconnection through the inter-node control unit.

In particular, the server node comprises at least one processor, at least one memory control module, and at least one I/O control module.

In particular, the inter-node control unit comprises:

A network interface connected to each server node;

A system address space mapping module, for converting between the system addresses of a server node's shared resources and the physical addresses of that server node's local actual resources;

A lock operation management module supporting atomic instructions, for implementing a shared lock mechanism among the non-tightly-coupled server nodes;

A network message conversion module, for message conversion between server nodes.

In particular, a global system address space mapping table and an intra-node resource mapping table are established in the system address space mapping module;

The global system address space mapping table comprises an MMIO address space, which records the MMIO addresses of the shared resources of all server nodes;

The intra-node resource mapping table comprises the physical address space of the local actual resources that back the server node's shared resources mapped in the MMIO address space.

In particular, all shared resources of the same server node are mapped into one contiguous section of MMIO address space.

In particular, the lock operation management module supporting atomic instructions is used to implement mutually exclusive operations among a plurality of server nodes.

In particular, the local actual resources are the memory resources and I/O resources of the server node.

A data access method for the non-tight-coupling multi-node multi-processor system with a shared address space, comprising:

S1: a local node sends a data access request according to the configured MMIO address space information;

S2: the data access request is forwarded to the inter-node controller ENC;

S3: the ENC forwards the data access request over the external interconnection network to the corresponding remote node, according to the configured MMIO address space information;

S4: on receiving the data access request, the corresponding remote node determines, according to the configured intra-node resource address space mapping table, the address of the actual resource inside the remote node that the request is to access.

The beneficial effect of the invention is that, by sharing the system address space, memory sharing and I/O sharing are achieved in a non-tight-coupling multi-node multi-processor system, allowing a local node to directly access the memory resources and I/O resources of remote nodes.

Brief description of the drawings

Fig. 1 shows the non-tight-coupling multi-node multi-processor system with a shared system address space proposed by the present invention.

Fig. 2 is the address space mapping diagram of the shared system address space proposed by the present invention.

Fig. 3 is a schematic diagram of the inter-node control unit proposed by the present invention.

Fig. 4 is the intra-node resource address space mapping table proposed by the present invention.

Fig. 5 is a flow chart of a lock request issued by an atomic instruction within a node.

Fig. 6 is a schematic flow chart of the address mapping proposed by the present invention.

Fig. 7 is a schematic flow chart of the global lock request operation proposed by the present invention.

Detailed description

To make the object and technical solution of the present invention clearer, specific embodiments are given below, and the invention is further described with reference to the accompanying drawings and these embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention.

Embodiment 1:

Embodiment 1 of the present invention proposes a non-tight-coupling multi-node multi-processor system with a shared system address space. Fig. 1 shows the structure of the system. In the figure, the components inside a node are connected through an intra-node controller INC (Intra Node Controller), and each INC can connect at least one CPU. Each node is an independent whole with its own computing resources, memory resources and I/O resources, and can run its own operating system. The nodes are organized in a non-tightly-coupled form and are independent of each other; on their own, they cannot directly share memory resources and I/O resources.

The node may be a server node, which runs an operating system and manages the local computing resources, memory resources and I/O resources. A server node comprises at least one processor, at least one memory control module and/or at least one I/O control module.

To enable sharing of memory resources and I/O resources between nodes, the system also comprises an inter-node controller ENC (External Node Controller). Each node connects to a dedicated interconnection network through the inter-node controller ENC, forming a scalable distributed computing system. More importantly, the ENC allows nodes to share memory resources and I/O resources; the specific implementation is described below.

Refer first to Fig. 2, which shows the mapping relationships of the shared system address space. The present invention realizes sharing of memory resources and I/O resources between nodes by sharing the system address space. The MMIO region maps the shared memory addresses and I/O address spaces of all remote nodes. In both the local node and the remote nodes, the memory address space mapped through the MMIO region must be allocated in the non-coherent memory region (Non-coherent Memory).

For example, the non-coherent memory in Node_0 is real memory, whereas the MMIO address space records only mapping relationships. Every access that a node issues through the MMIO address space ultimately lands on real memory or I/O: an access to Node_0 through an address in the global mapping table (a section of MMIO space) is finally converted into an access to Node_0's non-coherent memory. In this way, a local node accesses the memory resources and I/O resources of a remote node by accessing addresses in the MMIO address space.

Fig. 3 shows the structure of the inter-node control unit ENC. The inter-node control unit is a key component of the present invention. It comprises network interfaces connected to each of the mutually independent nodes, a system address space mapping module, a network message management module, and a lock operation management module supporting atomic instructions.

1. System address space mapping module

The system address space mapping module translates addresses in a remote node's system address space into addresses in the local system address space, so that two nodes can communicate across system domains. With this module, a global view of the shared resources in the whole non-tight-coupling multi-node system can be established, as shown in Fig. 2. With this global view, non-tightly-coupled nodes can achieve resource sharing that would otherwise be possible only under a tightly coupled organization.

The system address space mapping module establishes the global view of the shared system address space (illustrated in Fig. 2): the shared resources of all remote nodes are mapped into the local MMIO address space, the address spaces of different MMIO regions do not overlap, and all shared resources of the same node are mapped into one contiguous section of MMIO space. The module also converts requests in which a remote node accesses the local node's shared resources through MMIO addresses into address requests that can be recognized inside the node.
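The global view described above can be sketched as a sorted list of per-node MMIO windows. This is a minimal illustration, not the patented implementation: the names `MmioWindow`, `GlobalView` and `owner`, and all address values, are hypothetical. It only demonstrates the two stated properties, namely that windows do not overlap and that each node's shared resources occupy one contiguous window.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class MmioWindow:
    base: int      # first MMIO address of the window
    size: int      # window length in bytes
    node_id: int   # node whose shared resources the window exposes


class GlobalView:
    """Hypothetical global MMIO map: one contiguous window per node."""

    def __init__(self, windows):
        self._windows = sorted(windows, key=lambda w: w.base)
        # Reject overlapping windows, as the description requires.
        for prev, cur in zip(self._windows, self._windows[1:]):
            if prev.base + prev.size > cur.base:
                raise ValueError("MMIO windows overlap")
        self._bases = [w.base for w in self._windows]

    def owner(self, mmio_addr):
        """Return (node_id, offset inside that node's window)."""
        i = bisect_right(self._bases, mmio_addr) - 1
        if i >= 0:
            w = self._windows[i]
            if mmio_addr < w.base + w.size:
                return w.node_id, mmio_addr - w.base
        raise LookupError(f"{mmio_addr:#x} is not in any shared window")
```

Because the windows are contiguous and disjoint, a binary search over the window bases is enough to find the owning node; the offset within the window is what the remote node later resolves to a local address.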

The system address space mapping module also establishes an intra-node resource mapping table. This table maps the MMIO address space that corresponds to the local node in the global view to the real address space of the local resources, that is, memory addresses and the MMIO addresses of I/O devices.

Fig. 4 shows the intra-node resource address space mapping table. The left side of the table is the contiguous MMIO address space that corresponds to the local node in the global address space view; the right side is the local address of the local memory resource or I/O resource that corresponds to that contiguous MMIO address space.
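As a rough illustration of the two-column table in Fig. 4, the sketch below stores each row as a range of global MMIO addresses together with the local base address it maps to. The `MapEntry` layout, the `kind` field and the example addresses are assumptions made for illustration; the patent does not specify the table format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapEntry:
    mmio_base: int   # left column: start of the range in the global view
    size: int        # length of the range in bytes
    local_base: int  # right column: local address backing the range
    kind: str        # "memory" or "io" (assumed tag, not in the source)


def translate(table, mmio_addr):
    """Map a global MMIO address to (kind, local address) on this node."""
    for entry in table:
        if entry.mmio_base <= mmio_addr < entry.mmio_base + entry.size:
            return entry.kind, entry.local_base + (mmio_addr - entry.mmio_base)
    raise LookupError(f"{mmio_addr:#x} is not shared by this node")


# Hypothetical table: a block of non-coherent memory followed by one device.
TABLE = [
    MapEntry(0x10000000, 0x8000, 0x40000000, "memory"),
    MapEntry(0x10008000, 0x1000, 0xFE000000, "io"),
]
```

With this table, `translate(TABLE, 0x10000040)` resolves to local memory, while an address in the second range resolves to the I/O device's local MMIO address.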

2. Lock operation management module supporting atomic instructions

The lock operation management module supporting atomic instructions implements a shared lock mechanism between two non-tightly-coupled nodes, so that conflicting shared operations between the two nodes are performed under mutual exclusion.

To explain how this patent executes atomic instructions on shared addresses, first review how an atomic instruction executes within a node. An atomic instruction requires that the two memory accesses in the instruction complete in order, without interference from conflicting external accesses. The execution flow of an atomic instruction is therefore divided into three steps: requesting the global lock, performing the memory accesses, and releasing the global lock.

The process of requesting the global lock is shown in Fig. 5, where Core is the processor core, LLC is the cache controller of the last-level cache, Config Agent is the configuration agent, and Quiescent Master is the quiesce controller. To request the global lock, the Core first issues a request, which passes through the LLC and the Config Agent to the Quiescent Master. The Quiescent Master sends a stop request (StopReq1) to all processors in the node. After the Quiescent Master has received acknowledgements from all processors, it sends a stop request (StopReq2) to the I/O agent. After receiving all acknowledgements, the Quiescent Master replies with an acknowledgement message to the Core that requested the lock. After receiving the acknowledgement, the Core performs the memory accesses and then releases the global lock; the release flow is similar to the request flow.

In a non-tight-coupling multi-node multi-processor system that shares a global address space, the execution flow of an atomic instruction on a shared address differs greatly from that of an atomic instruction within a node in the I/O stop request (StopReq2) phase. To ensure that the scope of the atomic instruction is the whole system rather than a single node, when the lock operation management module receives the I/O stop request, the lock operation management module in the ENC broadcasts the request to the lock operation managers of all nodes.

Avoiding deadlock is the first issue to consider when multiple nodes request the global lock. To avoid deadlock, this patent designates one node in the system as the global lock operation management node (Lock Manager). Every node requesting the global lock sends its request to the lock operation management module of the global lock operation management node, which then sends the global lock request to the lock operation managers of the other nodes in the system.

When the lock operation management module in a node receives the I/O stop request (StopReq2), it sends an inter-node global lock request to the global lock operation management node, the Lock Manager. On receiving the request, the Lock Manager forwards it to the lock operation management modules of all other nodes. The lock operation manager of each node requests the intra-node global lock inside its own node and, once that succeeds, replies with an acknowledgement message to the Lock Manager. After the Lock Manager has collected all acknowledgements, it replies with an acknowledgement message to the node that requested the inter-node global lock. When the lock operation manager of the requesting node receives this acknowledgement, it answers the I/O stop request of its own node's Quiescent Master. The Quiescent Master then replies with an acknowledgement message to the processor core, which completes the global lock request for the shared address.

3. Network message conversion module

The network message conversion module converts messages between the intra-node interconnection network and the inter-node interconnection network, providing transparent communication between the two networks. Both the messages exchanged when a node accesses resources shared by a remote node and the messages exchanged when requesting the global lock are forwarded between nodes by this module.

Embodiment 2:

Fig. 6 shows the address mapping flow chart, which describes how the accessed address is converted when a local node accesses a shared resource of a remote node. The flow is as follows:

Step 1: the local node sends a data read (or modify) request according to the global shared address space view.

Step 2: according to the address space configuration of the local node, the request is forwarded to the ENC (inter-node controller).

Step 3: the ENC (inter-node controller) forwards the request over the external interconnection network to the remote node, according to the global shared address space view.

Step 4: the remote node receives the request and, according to the intra-node resource address space mapping table, translates the MMIO address of the global view into an address in the local address space.
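The four steps above can be traced end to end in a small simulation. This is a sketch under simplifying assumptions: each node exposes a single shared window, node memory is a dictionary, and the interconnection network is a direct method call. The class and function names (`Node`, `ENC`, `access`) and the address values are hypothetical.

```python
class Node:
    """A node that exposes one contiguous window of its local memory."""

    def __init__(self, node_id, window_base, window_size, local_base):
        self.node_id = node_id
        self.window = (window_base, window_size)  # range in the global view
        self.local_base = local_base              # backing local address
        self.memory = {}                          # local address -> value

    def serve(self, mmio_addr, value=None):
        # Step 4: translate the global MMIO address to a local address.
        local_addr = self.local_base + (mmio_addr - self.window[0])
        if value is not None:                     # modify request
            self.memory[local_addr] = value
        return self.memory.get(local_addr)        # read result


class ENC:
    """Inter-node controller holding the global shared address view."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def forward(self, mmio_addr, value=None):
        # Step 3: pick the remote node whose window contains the address.
        for node in self.nodes:
            base, size = node.window
            if base <= mmio_addr < base + size:
                return node.serve(mmio_addr, value)
        raise LookupError(f"{mmio_addr:#x} is not a shared address")


def access(enc, mmio_addr, value=None):
    # Steps 1 and 2: the local node issues the request to its ENC.
    return enc.forward(mmio_addr, value)
```

A write followed by a read through the same MMIO address returns the written value, and the data lands at the remote node's local address rather than at the MMIO address itself.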

Embodiment 3:

Fig. 7 shows a schematic flow chart of the global lock request operation. The steps are as follows:

Step 1: when the lock operation management module of the local node receives a lock request, it does not reply immediately; instead, it sends an inter-node global lock request to the lock operation management module of the global lock operation management node (Lock Manager).

Step 2: on receiving the request, the Lock Manager's lock operation management module sends an intra-node lock request to the lock operation management modules of all nodes in the view.

Step 3: the lock operation management module of each node performs the intra-node lock request inside its own node and then replies to the Lock Manager's lock operation management module.

Step 4: after collecting all acknowledgements, the Lock Manager's lock operation management module replies to the lock operation management module that originally requested the lock, allowing it to perform the atomic memory accesses.
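The steps above can be modelled as a single coordinator that serializes lock requests. The sketch below is an assumption-laden illustration: `LockModule`, `LockManager` and their methods are hypothetical names, message passing is replaced by direct calls, and refusing a request while the lock is held is an assumed policy that the source does not describe.

```python
class LockModule:
    """Lock operation management module of one node (simplified model)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.held_for = None  # node on whose behalf the local lock is held

    def acquire_local(self, requester):
        # Step 3: take the intra-node lock, then acknowledge.
        self.held_for = requester
        return True

    def release_local(self):
        self.held_for = None


class LockManager:
    """Single node through which every global lock request is routed."""

    def __init__(self, modules):
        self.modules = list(modules)
        self.holder = None

    def request(self, requester):
        # Step 1 has already routed the request here. Assumed policy:
        # refuse while another node holds the lock, so requests never
        # interleave and cannot deadlock against each other.
        if self.holder is not None:
            return False
        # Step 2: ask every node's module to take its intra-node lock.
        acks = [m.acquire_local(requester) for m in self.modules]
        # Step 4: grant only after all acknowledgements are collected.
        if all(acks):
            self.holder = requester
        return self.holder == requester

    def release(self, requester):
        if self.holder != requester:
            raise RuntimeError("only the holder may release the global lock")
        for m in self.modules:
            m.release_local()
        self.holder = None
```

Routing every request through one manager gives all nodes the same view of the order in which locks are granted, which is the property the description relies on to avoid deadlock.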

The present invention may of course have various other embodiments. Without departing from the spirit and essence of the invention, those of ordinary skill in the art can make corresponding changes and modifications according to the invention, and all such changes and modifications shall fall within the scope of protection of the claims of the present invention.

Claims

1. A non-tight-coupling multi-node multi-processor system with a shared address space, characterized in that it comprises:

An inter-node control unit, for realizing sharing of the system address space of a global view among a plurality of non-tightly-coupled server nodes;

2. The system according to claim 1, characterized in that:

The server node comprises at least one processor, at least one memory control module, and at least one I/O control module.

3. The system according to claim 1, characterized in that:

The inter-node control unit comprises:

A network interface connected to each server node;

4. The system according to claim 3, characterized in that:

A global system address space mapping table and an intra-node resource mapping table are established in the system address space mapping module;

5. The system according to claim 4, characterized in that: all shared resources of the same server node are mapped into one contiguous section of MMIO address space.

6. The system according to claim 3, characterized in that:

The lock operation management module supporting atomic instructions is used to implement mutually exclusive operations among a plurality of server nodes.

7. The system according to any one of claims 3-6, characterized in that:

The local actual resources are the memory resources and I/O resources of the server node.

8. A data access method for the non-tight-coupling multi-node multi-processor system with a shared address space,

Characterized in that it comprises:

S2: the data access request is forwarded to the inter-node controller ENC;

S4: on receiving the data read request, the corresponding remote node determines, according to the configured intra-node resource address space mapping table, the address of the actual resource inside the remote node that the data access request is to access.