CN106383791B - A memory block combination method and device based on non-uniform memory access architecture - Google Patents

A memory block combination method and device based on non-uniform memory access architecture

Info

Publication number
CN106383791B
CN106383791B (application number CN201610844237.7A)
Authority
CN
China
Prior art keywords
memory
node
block
enabled
window block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610844237.7A
Other languages
Chinese (zh)
Other versions
CN106383791A (en)
Inventor
Zhang Jian (张健)
Wang Mei (王梅)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Polytechnic
Original Assignee
Shenzhen Polytechnic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Polytechnic filed Critical Shenzhen Polytechnic
Priority to CN201610844237.7A priority Critical patent/CN106383791B/en
Publication of CN106383791A publication Critical patent/CN106383791A/en
Application granted granted Critical
Publication of CN106383791B publication Critical patent/CN106383791B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/1652Handling requests for interconnection or transfer for access to memory bus based on arbitration in a multiprocessor architecture
    • G06F13/1657Access to multiple memories
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/25Using a specific main memory architecture
    • G06F2212/254Distributed memory
    • G06F2212/2542Non-uniform memory access [NUMA] architecture

Abstract

The invention belongs to the field of cloud storage technology and relates to a memory block combination method and device based on a non-uniform memory access architecture. The method comprises three steps: 1) according to node frequency, the memory provided by enabled nodes of the same frequency is logically connected into one memory block; 2) treating each memory block as a window block, the order of the window blocks and the order of the enabled nodes within each window block are adjusted to determine the logical arrangement with the smallest link cost; this arrangement contains the host node, and the result is recorded in a routing table; 3) the routing table is stored in a control processor connected to the host node, and the control processor assigns each memory block a global address, thereby constructing a memory cloud. The invention overcomes the inefficiency of cluster interconnection networks caused by dissimilar and hybrid memory and constructs the highest-quality non-uniform-access memory cloud storage practical.

Description

A memory block combination method and device based on non-uniform memory access architecture
Technical field
The invention belongs to the field of cloud storage technology, and in particular relates to a memory block combination method and device based on a non-uniform memory access architecture.
Background technique
At present, cloud storage technology within cloud computing is developing rapidly, moving from disk arrays to SSD (Solid State Drive) arrays and now to RAM (Random Access Memory) cloud storage. RAM cloud storage keeps the data of an entire application in the RAM of up to hundreds or even thousands of servers; its throughput is hundreds to thousands of times higher than that of disk-based systems, while its latency is only a few hundredths to a few thousandths. A typical example is MapReduce, a technique popularized by Google in recent years with the aim of improving data access speed and eliminating latency. It solves large-scale problems, but for continuous data access the approach is limited to applications with random access patterns. The MapReduce distributed computing framework has two main limitations: first, writing a linear communication model with MapReduce is cumbersome; second, it is essentially a batch-mode framework. The RAMCloud project announced by Stanford University builds a memory array from memory of the same type and achieves more than 1 PB of storage, but its limitation is precisely that it uses only one type of memory.
The NUMA (Non-Uniform Memory Access) architecture makes it possible to combine different types of memory into a memory cloud storage. However, simply connecting the memory groups through boards, buses or networks does not yield an optimized memory cloud storage.
Summary of the invention
The purpose of the invention is to change the existing memory cloud architecture composed of same-type memory arrays and to address other related problems. It proposes a memory block combination method and device based on a non-uniform memory access architecture that can efficiently sort and merge non-identical, non-uniform-access memory and transfer the logical arrangement result to a control processor, constructing the highest-quality non-uniform-access memory cloud storage practical.
To achieve the above object, the present invention adopts the following technical scheme: a memory block combination method based on a non-uniform memory access architecture, comprising the following steps:
Step 1: according to node frequency, logically connect the memory provided by enabled nodes of the same frequency into one memory block;
Step 2: treating each memory block as a window block, adjust the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement; record the logical arrangement result in a routing table;
Step 3: store the routing table in the control processor connected to the host node, and have the control processor assign each memory block a global address, thereby constructing the memory cloud.
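Step 3 can be sketched in code. The following Python sketch shows one way a control processor might assign each memory block a contiguous global address range; the block names and capacities are illustrative (the capacities echo the {6, 9, 6, 2} blocks of the embodiment below), not a definitive implementation of the patent.

```python
# Hypothetical sketch: the control processor assigns each merged memory
# block a contiguous global address range. Block names and capacities
# (here in GB units) are illustrative, not taken from the patent.
def assign_global_addresses(block_capacities):
    """Map each memory block to a (start, end) global address range."""
    table = {}
    cursor = 0
    for block, capacity in block_capacities:
        table[block] = (cursor, cursor + capacity)
        cursor += capacity
    return table

routing = assign_global_addresses([("Mb1", 6), ("Mb2", 9), ("Mb3", 6), ("Mb4", 2)])
print(routing["Mb2"])  # (6, 15)
```

Each block's range starts where the previous one ends, so the routing table gives a flat global address space over the combined memory.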
The algorithm of the invention assumes a NUMA and SIMD hardware environment. A node in the invention is a network node; an enabled node is a node that can contribute part of its memory and is connected to the network through a NUMA card. Regarding node frequency: each server and connection in the model has different memory, CPU, mainboard and network interfaces, so connection speeds differ; the invention reduces all such speed-influencing factors to the frequency of the node's memory. The host node is the enabled node whose total cost to each of the other enabled nodes is smallest. Link cost: any factor affecting data transmission is regarded as link cost. The cost from the host node to a memory block is the cumulative cost from the host node to all nodes in that block.
Preferably, step 2 comprises:
first selecting, by simulated annealing, one enabled node from the enabled nodes as the host node, the host node being the connecting interface of the control processor;
arranging the window blocks in ascending order of the link cost from the host node to each window block, and arranging the enabled nodes within each window block in ascending order of the link cost from the host node to each enabled node.
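This ordering rule can be sketched as follows; per the description above, the cost of a window block is taken as the host node's cumulative cost to all nodes in the block. The node names and cost values are made up for illustration.

```python
# Sketch of the ordering rule: nodes inside each window block are sorted
# by the host node's link cost to the node, then whole window blocks are
# sorted by the host node's cumulative cost to the block. Costs below
# are illustrative values, not data from the patent.
def order_blocks(blocks, cost_from_host):
    """blocks: list of lists of node names; cost_from_host: node -> cost."""
    ordered = [sorted(b, key=cost_from_host.get) for b in blocks]
    # A block's cost is the host's cumulative cost to all of its nodes.
    ordered.sort(key=lambda b: sum(cost_from_host[n] for n in b))
    return ordered

cost = {"A": 1, "B": 0, "C": 4, "D": 2}
print(order_blocks([["C", "A"], ["D", "B"]], cost))  # [['B', 'D'], ['A', 'C']]
```

The block containing the host node (cost 0) naturally sorts first, matching the rule that ordering starts from the second memory block.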
Preferably, step 3 comprises connecting the host node to the control processor by a bus.
In another aspect, the invention also provides a memory block combination device based on a non-uniform memory access architecture, the device comprising:
a division module for logically connecting, according to node frequency, the memory provided by enabled nodes of the same frequency into one memory block;
a processing module for treating each memory block as a window block, adjusting the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement, and recording the logical arrangement result in a routing table;
a construction module for storing the routing table in the control processor connected to the host node, the control processor assigning each memory block a global address to construct the memory cloud.
Preferably, the processing module is further configured to first select, by simulated annealing, one enabled node from the enabled nodes as the host node, the host node being the connecting interface of the control processor;
the processing module is further configured to arrange the window blocks in ascending order of the link cost from the host node to each window block, and to arrange the enabled nodes within each window block in ascending order of the link cost from the host node to each enabled node.
With the memory block combination method and device based on the non-uniform memory access architecture of the invention, the algorithm, built on the non-uniform memory access architecture, can efficiently sort and merge non-identical, non-uniform-access memory and form an architecture in which processors and the operating system interconnect and share the memory bus. The invention can be applied to large-scale NUMA memory cloud storage platforms, overcoming the inefficiency of cluster interconnection networks caused by dissimilar and hybrid memory and constructing the highest-quality non-uniform-access memory cloud storage practical.
Detailed description of the invention
Fig. 1 shows the RAMCloud non-uniform memory access architecture in an embodiment of the invention;
Fig. 2 shows a potential data center node topology in an embodiment of the invention;
Fig. 3 shows the merged memory blocks in an embodiment of the invention;
Fig. 4 shows the window-block simulated annealing in an embodiment of the invention;
Fig. 5 shows the number of runs and the convergence state in an embodiment of the invention.
Specific embodiment
Embodiment 1:
A memory block combination method based on a non-uniform memory access architecture comprises the following steps:
Step 1: according to node frequency, logically connect the memory provided by enabled nodes of the same frequency into one memory block;
Step 2: treating each memory block as a window block, adjust the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement; record the logical arrangement result in a routing table;
Step 3: store the routing table in the control processor connected to the host node, and have the control processor assign each memory block a global address, thereby constructing the memory cloud.
A memory block combination device based on a non-uniform memory access architecture comprises:
a division module for logically connecting, according to node frequency, the memory provided by enabled nodes of the same frequency into one memory block;
a processing module for treating each memory block as a window block, adjusting the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement, and recording the logical arrangement result in a routing table;
a construction module for storing the routing table in the control processor connected to the host node, the control processor assigning each memory block a global address to construct the memory cloud.
The algorithm of the invention assumes a NUMA and SIMD hardware environment. A node in the invention is a network node; an enabled node is a node that can contribute part of its memory and is connected to the network through a NUMA card. Regarding node frequency: each server and connection in the model has different memory, CPU, mainboard and network interfaces, so connection speeds differ; the invention reduces all such speed-influencing factors to the frequency of the node's memory. The host node is the enabled node whose total cost to each of the other enabled nodes is smallest. Link cost: any factor affecting data transmission is regarded as link cost. The cost from the host node to a memory block is the cumulative cost from the host node to all nodes in that block.
This embodiment applies to large-scale NUMA memory cloud storage platforms, using an architecture in which processors and operating-system clusters interconnect and share the memory bus. This structure overcomes the inefficiency of cluster interconnection networks caused by dissimilar memory, improves availability markedly, and forms a more optimized memory cloud storage.
Embodiment 2:
A memory block combination method based on a non-uniform memory access architecture comprises the following steps:
Step 1: according to node frequency, logically connect the memory provided by enabled nodes of the same frequency into one memory block;
Step 2: treating each memory block as a window block, adjust the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement; record the logical arrangement result in a routing table;
Step 3: store the routing table in the control processor connected to the host node, and have the control processor assign each memory block a global address, thereby constructing the memory cloud.
Step 2 comprises:
first selecting, by simulated annealing, one enabled node from the enabled nodes as the host node, the host node being the connecting interface of the control processor;
arranging the window blocks in ascending order of the link cost from the host node to each window block, and arranging the enabled nodes within each window block in ascending order of the link cost from the host node to each enabled node.
Step 3 comprises connecting the host node to the control processor by a bus.
A memory block combination device based on a non-uniform memory access architecture comprises:
a division module for logically connecting, according to node frequency, the memory provided by enabled nodes of the same frequency into one memory block;
a processing module for treating each memory block as a window block, adjusting the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement, and recording the logical arrangement result in a routing table;
a construction module for storing the routing table in the control processor connected to the host node, the control processor assigning each memory block a global address to construct the memory cloud.
The processing module is further configured to first select, by simulated annealing, one enabled node from the enabled nodes as the host node, the host node being the connecting interface of the control processor;
the processing module is further configured to arrange the window blocks in ascending order of the link cost from the host node to each window block, and to arrange the enabled nodes within each window block in ascending order of the link cost from the host node to each enabled node.
As shown in Fig. 1, the memory cloud under the non-uniform memory access architecture comprises application library 1, application library 2, ..., application library n, a data center, and a control processor. The data center organizes the memory cloud according to the non-uniform memory access architecture, and the control processor manages the data center.
To realize the low-latency goal of the memory cloud, a high-performance network with the following characteristics is required: low latency, high bandwidth, and full-duplex bandwidth.
The algorithm of the invention is elaborated below through a model:
1. Model formulation
Assumption 1: each node has memory that may differ in type from that of other nodes, e.g. in frequency, bus, CPU model and running speed; in this model, all these aspects are reduced to different frequencies;
Assumption 2: according to the prior art, merging nodes sorted by frequency yields optimal performance;
Assumption 3: connecting nodes incurs different costs; any factor affecting data transmission is assumed to be a connection cost.
2. Model definition
As shown in Fig. 2, nodes A, B, C, ..., H form a connection topology, with different frequencies simulating non-identical, non-uniform memory. Each node contributes a certain amount of memory to the cloud, and the nodes are connected to each other at different costs.
3. Data model and initialization
Each of the aforementioned nodes has a memory capacity and a frequency. The relevant data are shown in Table 1.
Table 1: Node information
For any pair of connected nodes, the table lists node 1, node 2 and the corresponding cost. The relevant data are shown in Table 2.
Table 2: Node connection costs
Node 1 Node 2 Connection cost
A B 2
B A 1
A D 3
D A 1
D B 1
The model is a cloud storage of the non-uniform memory access architecture; access follows the following 3 rules:
(1) adjacent memory nodes must not be written at random;
(2) adjacent memory nodes must not be read at random;
(3) adjacent memory nodes must not be accessed asynchronously.
Experiments show that violating any of these rules sharply degrades performance. Kingston memory performance test data show that combinations of identically clocked memory are optimal. Otherwise, the memory may operate in single-channel or single-bandwidth mode, and memory access speed drops sharply.
In the memory cloud, research on optimizing sort-merge join algorithms concentrates on NUMA and SIMD hardware environments. A parallel sort-merge join algorithm under the non-uniform memory access architecture can be divided into three phases: a sorting phase, a partitioning phase and a joining phase. Accordingly, the present invention merges same-type memory and finds the access node with the smallest cost; that node is interconnected directly through the processor bus, such as AMD HT (HyperTransport) or Intel QPI (QuickPath Interconnect).
We define the following rules:
Rule 1: to obtain optimal performance, sort the enabled nodes by frequency and merge their memory; the sorting yields the corresponding set of memory blocks, each memory block being a set, denoted {Mbi};
Rule 2: find the host node to serve as the connecting interface of the control processor, the host node being the node with the minimum total cost to the other nodes; the interior of a merged memory block is not changed logically;
Rule 3: from the second memory block onward, order the nodes in each block by ascending cost from the host node to the node; the resulting group is denoted {Ai}.
Following these rules, non-identical, non-uniform memory can be sorted and merged quickly and effectively, and the node connecting to the control processor can be found; the control processor then assigns global addresses and constructs the memory cloud for application programs to access.
Taking the model shown in Fig. 2 as an example, the algorithm has three stages: sort-merge, partitioning and connection.
(1) Sort-merge: initialization
According to the data shown in Table 1, we sort and merge the node memory, logically connecting memory of the same frequency. Four memory blocks {Mbi} = {6, 9, 6, 2} are obtained, as shown in Fig. 3.
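The sort-merge step can be sketched as follows. Table 1 itself is not reproduced in the text, so the node table here is hypothetical, chosen so that the block capacities come out to {6, 9, 6, 2}.

```python
from collections import defaultdict

# Sketch of the sort-merge step: nodes with the same memory frequency are
# logically joined into one memory block. The node tuples (name,
# frequency in MHz, capacity in GB) are hypothetical, not Table 1 data.
def merge_by_frequency(nodes):
    """nodes: list of (name, frequency, capacity) tuples."""
    blocks = defaultdict(list)
    for name, freq, cap in nodes:
        blocks[freq].append((name, cap))
    # One merged block per frequency, with its total capacity.
    return {f: sum(c for _, c in members) for f, members in blocks.items()}

nodes = [("A", 1600, 2), ("B", 1600, 4), ("C", 2133, 9),
         ("D", 2400, 6), ("E", 2666, 2)]
print(merge_by_frequency(nodes))  # {1600: 6, 2133: 9, 2400: 6, 2666: 2}
```

With these assumed inputs, the merged capacities reproduce the {6, 9, 6, 2} blocks of Fig. 3.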
(2) Partitioning: window-block simulated annealing
According to the data shown in Table 2, we initialize the system and obtain the data in Table 3: the minimum-cost path from any node to every other node. If the path detail is 0, the two nodes are connected directly; otherwise the entry is a string giving the routed path from one node to the other. The associated data are illustrated in Table 3.
Table 3: Minimum cost and corresponding connection path from server to server
According to Table 3, the current total overhead is
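A sketch of how Table 3 could be derived from Table 2: an all-pairs minimum-cost computation over the directed connection costs. The patent does not name the algorithm, so Floyd-Warshall is an assumption here, and only the five edges listed in Table 2 are used.

```python
# Sketch of building Table 3: all-pairs minimum link cost between nodes,
# computed via Floyd-Warshall over the directed costs of Table 2.
# INF marks node pairs with no connecting path.
INF = float("inf")

def all_pairs_min_cost(nodes, edges):
    d = {u: {v: (0 if u == v else INF) for v in nodes} for u in nodes}
    for u, v, c in edges:
        d[u][v] = min(d[u][v], c)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

edges = [("A", "B", 2), ("B", "A", 1), ("A", "D", 3), ("D", "A", 1), ("D", "B", 1)]
d = all_pairs_min_cost(["A", "B", "D"], edges)
print(d["D"]["B"], d["A"]["D"])  # 1 3
```

Note that costs are directed: B reaches D only through A (cost 1 + 3 = 4), while D reaches B directly at cost 1.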
The present invention uses the idea of simulated annealing, a method of approximate global optimization in a large search space.
According to rule 1, the merged memory blocks cannot be broken apart in this case. The present invention uses window blocks, each window block being treated as one memory unit; inside a window block, the nodes can be reordered. During the process, the total cost of the current annealing state is computed and annealed. By moving window blocks and sorting the nodes inside them, the best solution obtainable within a finite time budget is found.
In Fig. 4, (a) is one possible solution: the host node is F, the coprocessor accesses the other nodes from F, and the total cost is 65.
In Fig. 4, (b) is a better solution: the host node is B, the total cost from B to the other nodes is 27, and from the second window block onward the node order follows rule 3.
(3) Connection: assembling the memory cloud
When a best solution is obtained, as shown in Fig. 4(b), the coprocessor is connected to node B, and the routing table (similar to Table 3) is copied to and stored in the coprocessor. The coordinator assigns a global address to each cluster.
For details not covered in this embodiment, refer to the description of embodiment 1; they are not repeated here.
The simulated annealing used in this embodiment improves on the traditional algorithm: it not only orders the memory blocks by cost but also orders the nodes inside each memory block. Simulated annealing is flexible and efficient; when a new node joins the memory cloud, the memory blocks and corresponding nodes in the memory cloud can be adjusted quickly, thereby constructing a high-quality non-uniform-access memory cloud storage.
The memory block combination method based on the non-uniform memory access architecture of this embodiment is illustrated below with a concrete application scenario, as follows:
(4) Algorithm description
According to rule 1, the algorithm first initializes: Init() sorts and merges the nodes, producing the initial state S0 (see Table 3). Then, following the window-block simulated annealing of rule 2, Cost() computes and returns the cost of the current solution, and Neighbor(), as in traditional simulated annealing, generates a randomly selected neighbour of the given state. Finally, the best solution is obtained. The function Connect() connects the coprocessor to the host node and copies the routing information table (Table 3). The function AssignGlobalAddress() has the coordinator allocate the global addresses of the cluster memory according to the block order.
Parameter S0 is the initial solution, Sbest the best solution so far, T0 the initial temperature, α the cooling rate, β a constant, and M the time until the next parameter update; the maximum time limit is the total annealing schedule.
The following pseudocode gives the described memory block combination method for the non-uniform memory access architecture.
In the algorithm, the most important function is Neighbor(), which generates a randomly selected neighbour of the given state. Inside a window block, the nodes are rearranged according to rule 3; outside the window blocks, the window blocks themselves are rearranged.
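The pseudocode itself is not reproduced in the text; the following Python sketch shows one way the Cost()/Neighbor()/annealing loop just described could look. The state representation, the cost function and all data values are assumptions for illustration, not the patent's definitive implementation.

```python
import math
import random

# Hedged sketch of the window-block simulated annealing described above.
# A state is an ordering of window blocks (each a merged memory block
# that must stay intact); neighbor() either swaps two whole window
# blocks or swaps two nodes inside one block; cost() is an assumed
# stand-in: the summed link cost along the flattened node order.
def cost(state, link_cost):
    flat = [n for block in state for n in block]
    return sum(link_cost[(a, b)] for a, b in zip(flat, flat[1:]))

def neighbor(state):
    s = [list(b) for b in state]
    multi = [b for b in s if len(b) > 1]
    if multi and random.random() < 0.5:
        b = random.choice(multi)                 # reorder nodes inside a block (rule 3)
        i, j = random.sample(range(len(b)), 2)
        b[i], b[j] = b[j], b[i]
    else:
        i, j = random.sample(range(len(s)), 2)   # reorder whole window blocks
        s[i], s[j] = s[j], s[i]
    return s

def anneal(state, link_cost, t0=10.0, alpha=0.95, steps=2000):
    best, best_c = state, cost(state, link_cost)
    cur, cur_c, t = state, best_c, t0
    for _ in range(steps):
        cand = neighbor(cur)
        c = cost(cand, link_cost)
        # Accept improvements always, worsenings with Boltzmann probability.
        if c < cur_c or random.random() < math.exp((cur_c - c) / t):
            cur, cur_c = cand, c
        if cur_c < best_c:
            best, best_c = cur, cur_c
        t *= alpha
    return best, best_c

# Illustrative data: four nodes in two window blocks, with a made-up cost.
link_cost = {(a, b): abs(ord(a) - ord(b)) for a in "ABCD" for b in "ABCD"}
best, best_c = anneal([["A", "C"], ["B", "D"]], link_cost)
print(best_c)  # 5 (this tiny starting arrangement is already optimal)
```

Because the merged blocks stay intact, the search space is the product of block permutations and intra-block node permutations, exactly the two moves Neighbor() makes above.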
In this model, there are 8 nodes and 4 window blocks. Over many experimental runs, the overhead of the best solution finally converges to 27; the best case takes 3 runs and the worst 15, as shown in Fig. 5.

Claims (3)

1. A memory block combination method based on a non-uniform memory access architecture, characterized by comprising the following steps:
Step 1: according to node frequency, logically connecting the memory provided by enabled nodes of the same frequency into one memory block, the frequency indicating the connection speed of a node;
Step 2: treating each memory block as a window block, adjusting the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement, and recording the logical arrangement result in a routing table;
Step 3: storing the routing table in the control processor connected to the host node, and having the control processor assign each memory block a global address, thereby constructing the memory cloud;
wherein step 2 comprises:
first selecting, by simulated annealing, one enabled node from the enabled nodes as the host node, the host node being the connecting interface of the control processor;
arranging the window blocks in ascending order of the link cost from the host node to each window block, and arranging the enabled nodes within each window block in ascending order of the link cost from the host node to each enabled node.
2. The memory block combination method based on a non-uniform memory access architecture according to claim 1, characterized in that step 3 comprises connecting the host node to the control processor by a bus.
3. A memory block combination device based on a non-uniform memory access architecture, characterized in that the device comprises:
a division module for logically connecting, according to node frequency, the memory provided by enabled nodes of the same frequency into one memory block, the frequency indicating the connection speed of a node;
a processing module for treating each memory block as a window block, adjusting the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement, and recording the logical arrangement result in a routing table;
a construction module for storing the routing table in the control processor connected to the host node, the control processor assigning each memory block a global address to construct the memory cloud;
wherein the processing module is further configured to first select, by simulated annealing, one enabled node from the enabled nodes as the host node, the host node being the connecting interface of the control processor;
and the processing module is further configured to arrange the window blocks in ascending order of the link cost from the host node to each window block, and to arrange the enabled nodes within each window block in ascending order of the link cost from the host node to each enabled node.
CN201610844237.7A 2016-09-23 2016-09-23 A memory block combination method and device based on non-uniform memory access architecture Active CN106383791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610844237.7A CN106383791B (en) 2016-09-23 2016-09-23 A memory block combination method and device based on non-uniform memory access architecture


Publications (2)

Publication Number Publication Date
CN106383791A CN106383791A (en) 2017-02-08
CN106383791B true CN106383791B (en) 2019-07-12

Family

ID=57936804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610844237.7A Active CN106383791B (en) 2016-09-23 2016-09-23 A memory block combination method and device based on non-uniform memory access architecture

Country Status (1)

Country Link
CN (1) CN106383791B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112558869A (en) * 2020-12-11 2021-03-26 北京航天世景信息技术有限公司 Remote sensing image caching method based on big data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104144194A (en) * 2013-05-10 2014-11-12 中国移动通信集团公司 Data processing method and device for cloud storage system
CN104199718A (en) * 2014-08-22 2014-12-10 上海交通大学 Dispatching method of virtual processor based on NUMA high-performance network cache resource affinity
CN104506362A (en) * 2014-12-29 2015-04-08 浪潮电子信息产业股份有限公司 Method for system state switching and monitoring on CC-NUMA (cache coherent-non uniform memory access architecture) multi-node server
CN104657198A (en) * 2015-01-24 2015-05-27 深圳职业技术学院 Memory access optimization method and memory access optimization system for NUMA (Non-Uniform Memory Access) architecture system in virtual machine environment
CN104850461A (en) * 2015-05-12 2015-08-19 华中科技大学 NUMA-oriented virtual cpu (central processing unit) scheduling and optimizing method
CN105391590A (en) * 2015-12-26 2016-03-09 深圳职业技术学院 Method and system for automatically obtaining system routing table of NUMA

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6754782B2 (en) * 2001-06-21 2004-06-22 International Business Machines Corporation Decentralized global coherency management in a multi-node computer system

Also Published As

Publication number Publication date
CN106383791A (en) 2017-02-08

Similar Documents

Publication Publication Date Title
Zhang et al. GraphP: Reducing communication for PIM-based graph processing with efficient data partition
JP4857274B2 (en) Optimization of application layout on massively parallel supercomputer
Dally Express cubes: Improving the performance ofk-ary n-cube interconnection networks
Mamidala et al. MPI collectives on modern multicore clusters: Performance optimizations and communication characteristics
Siegel et al. Using the multistage cube network topology in parallel supercomputers
JPH0766718A (en) Wafer scale structure for programmable logic
US8447954B2 (en) Parallel pipelined vector reduction in a data processing system
Firuzan et al. Reconfigurable network-on-chip for 3D neural network accelerators
Wang et al. A message-passing multi-softcore architecture on FPGA for breadth-first search
Li et al. On data center network architectures for interconnecting dual-port servers
Chen et al. Topology-aware optimal data placement algorithm for network traffic optimization
Zhou et al. Cost-aware partitioning for efficient large graph processing in geo-distributed datacenters
Musha et al. Deep learning on high performance FPGA switching boards: Flow-in-cloud
CN106383791B (en) A kind of memory block combined method and device based on nonuniform memory access framework
Kobus et al. Gossip: Efficient communication primitives for multi-gpu systems
US20220121928A1 (en) Enhanced reconfigurable interconnect network
Xie et al. Mesh-of-Torus: a new topology for server-centric data center networks
Mirsadeghi et al. PTRAM: A parallel topology-and routing-aware mapping framework for large-scale HPC systems
Sun et al. Multi-node acceleration for large-scale GCNs
Balkan et al. An area-efficient high-throughput hybrid interconnection network for single-chip parallel processing
Lin et al. A distributed resource management mechanism for a partitionable multiprocessor system
Fernández et al. Efficient VLSI layouts for homogeneous product networks
Konstantinidou The selective extra stage butterfly
Mackenzie et al. Comparative modeling of network topologies and routing strategies in multicomputers
Lee Barrier synchronization over multistage interconnection networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant