CN106383791B - A memory block combination method and device based on non-uniform memory access architecture - Google Patents

A memory block combination method and device based on non-uniform memory access architecture

Info

Publication number
CN106383791B
CN106383791B (application number CN201610844237.7A)
Authority
CN
China
Prior art keywords
memory
node
block
enabled
window block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610844237.7A
Other languages
Chinese (zh)
Other versions
CN106383791A (en)
Inventor
Zhang Jian (张健)
Wang Mei (王梅)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Polytechnic
Original Assignee
Shenzhen Polytechnic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Polytechnic filed Critical Shenzhen Polytechnic
Priority to CN201610844237.7A priority Critical patent/CN106383791B/en
Publication of CN106383791A publication Critical patent/CN106383791A/en
Application granted granted Critical
Publication of CN106383791B publication Critical patent/CN106383791B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/1652Handling requests for interconnection or transfer for access to memory bus based on arbitration in a multiprocessor architecture
    • G06F13/1657Access to multiple memories
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/25Using a specific main memory architecture
    • G06F2212/254Distributed memory
    • G06F2212/2542Non-uniform memory access [NUMA] architecture

Abstract

The invention belongs to the field of cloud storage technology and relates to a memory block combination method and device based on a non-uniform memory access architecture. The method comprises three steps: 1) according to node frequency, the memory provided by enabled nodes of the same frequency is logically connected into one memory block; 2) treating each memory block as a window block, the order of the window blocks and the order of the enabled nodes within each window block are adjusted to determine the logical arrangement with the smallest link cost; this arrangement contains the host node, and the result is recorded in a routing table; 3) the routing table is stored in a control processor connected to the host node, and the control processor assigns each memory block a global address, thereby constructing a memory cloud. The invention overcomes the inefficiency of cluster interconnection networks caused by dissimilar and hybrid memory and constructs the highest-quality non-uniform-access memory cloud storage practical.

Description

A memory block combination method and device based on non-uniform memory access architecture
Technical field
The invention belongs to the field of cloud storage technology, and in particular relates to a memory block combination method and device based on a non-uniform memory access architecture.
Background technique
At present, cloud storage technology within cloud computing is developing rapidly, moving from disk arrays to SSD (Solid State Drive) arrays and now to RAM (Random Access Memory) cloud storage. RAM cloud storage keeps the data of an entire application in the RAM of up to hundreds or even thousands of servers; its throughput is hundreds to thousands of times higher than that of disk-based systems, while its latency is only a few hundredths to a few thousandths. A typical example is MapReduce, a technique popularized by Google in recent years with the aim of improving data access speed and eliminating latency. It solves large-scale problems, but for continuous data access the approach is limited to applications with random access patterns. The MapReduce distributed computing framework has two main limitations: first, writing a linear communication model with MapReduce is cumbersome; second, it is essentially a batch-mode framework. The RAMCloud project announced by Stanford University builds a memory array from memory of the same type and achieves more than 1 PB of storage, but its limitation is precisely that it uses only one type of memory.
The NUMA (Non-Uniform Memory Access) architecture makes it possible to combine different types of memory into a memory cloud storage. However, simply connecting the memory groups through boards, buses or networks does not yield an optimized memory cloud storage.
Summary of the invention
The purpose of the invention is to change the existing memory cloud architecture composed of same-type memory arrays and to address other related problems. It proposes a memory block combination method and device based on a non-uniform memory access architecture that can efficiently sort and merge non-identical, non-uniform-access memory and transfer the logical arrangement result to a control processor, constructing the highest-quality non-uniform-access memory cloud storage practical.
To achieve the above object, the present invention adopts the following technical scheme: a memory block combination method based on a non-uniform memory access architecture, comprising the following steps:
Step 1: according to node frequency, logically connect the memory provided by enabled nodes of the same frequency into one memory block;
Step 2: treating each memory block as a window block, adjust the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement; record the logical arrangement result in a routing table;
Step 3: store the routing table in the control processor connected to the host node, and have the control processor assign each memory block a global address, thereby constructing the memory cloud.
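Step 3 can be sketched in code. The following Python sketch shows one way a control processor might assign each memory block a contiguous global address range; the block names and capacities are illustrative (the capacities echo the {6, 9, 6, 2} blocks of the embodiment below), not a definitive implementation of the patent.

```python
# Hypothetical sketch: the control processor assigns each merged memory
# block a contiguous global address range. Block names and capacities
# (here in GB units) are illustrative, not taken from the patent.
def assign_global_addresses(block_capacities):
    """Map each memory block to a (start, end) global address range."""
    table = {}
    cursor = 0
    for block, capacity in block_capacities:
        table[block] = (cursor, cursor + capacity)
        cursor += capacity
    return table

routing = assign_global_addresses([("Mb1", 6), ("Mb2", 9), ("Mb3", 6), ("Mb4", 2)])
print(routing["Mb2"])  # (6, 15)
```

Each block's range starts where the previous one ends, so the routing table gives a flat global address space over the combined memory.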
The algorithm of the invention assumes a NUMA and SIMD hardware environment. A node in the invention is a network node; an enabled node is a node that can contribute part of its memory and is connected to the network through a NUMA card. Regarding node frequency: each server and connection in the model has different memory, CPU, mainboard and network interfaces, so connection speeds differ; the invention reduces all such speed-influencing factors to the frequency of the node's memory. The host node is the enabled node whose total cost to each of the other enabled nodes is smallest. Link cost: any factor affecting data transmission is regarded as link cost. The cost from the host node to a memory block is the cumulative cost from the host node to all nodes in that block.
Preferably, step 2 comprises:
first selecting, by simulated annealing, one enabled node from the enabled nodes as the host node, the host node being the connecting interface of the control processor;
arranging the window blocks in ascending order of the link cost from the host node to each window block, and arranging the enabled nodes within each window block in ascending order of the link cost from the host node to each enabled node.
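This ordering rule can be sketched as follows; per the description above, the cost of a window block is taken as the host node's cumulative cost to all nodes in the block. The node names and cost values are made up for illustration.

```python
# Sketch of the ordering rule: nodes inside each window block are sorted
# by the host node's link cost to the node, then whole window blocks are
# sorted by the host node's cumulative cost to the block. Costs below
# are illustrative values, not data from the patent.
def order_blocks(blocks, cost_from_host):
    """blocks: list of lists of node names; cost_from_host: node -> cost."""
    ordered = [sorted(b, key=cost_from_host.get) for b in blocks]
    # A block's cost is the host's cumulative cost to all of its nodes.
    ordered.sort(key=lambda b: sum(cost_from_host[n] for n in b))
    return ordered

cost = {"A": 1, "B": 0, "C": 4, "D": 2}
print(order_blocks([["C", "A"], ["D", "B"]], cost))  # [['B', 'D'], ['A', 'C']]
```

The block containing the host node (cost 0) naturally sorts first, matching the rule that ordering starts from the second memory block.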
Preferably, step 3 comprises connecting the host node to the control processor by a bus.
In another aspect, the invention also provides a memory block combination device based on a non-uniform memory access architecture, the device comprising:
a division module for logically connecting, according to node frequency, the memory provided by enabled nodes of the same frequency into one memory block;
a processing module for treating each memory block as a window block, adjusting the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement, and recording the logical arrangement result in a routing table;
a construction module for storing the routing table in the control processor connected to the host node, the control processor assigning each memory block a global address to construct the memory cloud.
Preferably, the processing module is further configured to first select, by simulated annealing, one enabled node from the enabled nodes as the host node, the host node being the connecting interface of the control processor;
the processing module is further configured to arrange the window blocks in ascending order of the link cost from the host node to each window block, and to arrange the enabled nodes within each window block in ascending order of the link cost from the host node to each enabled node.
With the memory block combination method and device based on the non-uniform memory access architecture of the invention, the algorithm, built on the non-uniform memory access architecture, can efficiently sort and merge non-identical, non-uniform-access memory and form an architecture in which processors and the operating system interconnect and share the memory bus. The invention can be applied to large-scale NUMA memory cloud storage platforms, overcoming the inefficiency of cluster interconnection networks caused by dissimilar and hybrid memory and constructing the highest-quality non-uniform-access memory cloud storage practical.
Detailed description of the invention
Fig. 1 shows the RAMCloud non-uniform memory access architecture in an embodiment of the invention;
Fig. 2 shows a potential data center node topology in an embodiment of the invention;
Fig. 3 shows the merged memory blocks in an embodiment of the invention;
Fig. 4 shows the window-block simulated annealing in an embodiment of the invention;
Fig. 5 shows the number of runs and the convergence state in an embodiment of the invention.
Specific embodiment
Embodiment 1:
A memory block combination method based on a non-uniform memory access architecture comprises the following steps:
Step 1: according to node frequency, logically connect the memory provided by enabled nodes of the same frequency into one memory block;
Step 2: treating each memory block as a window block, adjust the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement; record the logical arrangement result in a routing table;
Step 3: store the routing table in the control processor connected to the host node, and have the control processor assign each memory block a global address, thereby constructing the memory cloud.
A memory block combination device based on a non-uniform memory access architecture comprises:
a division module for logically connecting, according to node frequency, the memory provided by enabled nodes of the same frequency into one memory block;
a processing module for treating each memory block as a window block, adjusting the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement, and recording the logical arrangement result in a routing table;
a construction module for storing the routing table in the control processor connected to the host node, the control processor assigning each memory block a global address to construct the memory cloud.
The algorithm of the invention assumes a NUMA and SIMD hardware environment. A node in the invention is a network node; an enabled node is a node that can contribute part of its memory and is connected to the network through a NUMA card. Regarding node frequency: each server and connection in the model has different memory, CPU, mainboard and network interfaces, so connection speeds differ; the invention reduces all such speed-influencing factors to the frequency of the node's memory. The host node is the enabled node whose total cost to each of the other enabled nodes is smallest. Link cost: any factor affecting data transmission is regarded as link cost. The cost from the host node to a memory block is the cumulative cost from the host node to all nodes in that block.
This embodiment applies to large-scale NUMA memory cloud storage platforms, using an architecture in which processors and operating-system clusters interconnect and share the memory bus. This structure overcomes the inefficiency of cluster interconnection networks caused by dissimilar memory, improves availability markedly, and forms a more optimized memory cloud storage.
Embodiment 2:
A memory block combination method based on a non-uniform memory access architecture comprises the following steps:
Step 1: according to node frequency, logically connect the memory provided by enabled nodes of the same frequency into one memory block;
Step 2: treating each memory block as a window block, adjust the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement; record the logical arrangement result in a routing table;
Step 3: store the routing table in the control processor connected to the host node, and have the control processor assign each memory block a global address, thereby constructing the memory cloud.
Step 2 comprises:
first selecting, by simulated annealing, one enabled node from the enabled nodes as the host node, the host node being the connecting interface of the control processor;
arranging the window blocks in ascending order of the link cost from the host node to each window block, and arranging the enabled nodes within each window block in ascending order of the link cost from the host node to each enabled node.
Step 3 comprises connecting the host node to the control processor by a bus.
A memory block combination device based on a non-uniform memory access architecture comprises:
a division module for logically connecting, according to node frequency, the memory provided by enabled nodes of the same frequency into one memory block;
a processing module for treating each memory block as a window block, adjusting the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement, and recording the logical arrangement result in a routing table;
a construction module for storing the routing table in the control processor connected to the host node, the control processor assigning each memory block a global address to construct the memory cloud.
The processing module is further configured to first select, by simulated annealing, one enabled node from the enabled nodes as the host node, the host node being the connecting interface of the control processor;
the processing module is further configured to arrange the window blocks in ascending order of the link cost from the host node to each window block, and to arrange the enabled nodes within each window block in ascending order of the link cost from the host node to each enabled node.
As shown in Fig. 1, the memory cloud under the non-uniform memory access architecture comprises application library 1, application library 2, ..., application library n, a data center, and a control processor. The data center organizes the memory cloud according to the non-uniform memory access architecture, and the control processor manages the data center.
To realize the low-latency goal of the memory cloud, a high-performance network with the following characteristics is required: low latency, high bandwidth, and full-duplex bandwidth.
The algorithm of the invention is elaborated below through a model:
1. Model formulation
Assumption 1: each node has memory that may differ in type from that of other nodes, e.g. in frequency, bus, CPU model and running speed; in this model, all these aspects are reduced to different frequencies;
Assumption 2: according to the prior art, merging nodes sorted by frequency yields optimal performance;
Assumption 3: connecting nodes incurs different costs; any factor affecting data transmission is assumed to be a connection cost.
2. Model definition
As shown in Fig. 2, nodes A, B, C, ..., H form a connection topology, with different frequencies simulating non-identical, non-uniform memory. Each node contributes a certain amount of memory to the cloud, and the nodes are connected to each other at different costs.
3. Data model and initialization
Each of the aforementioned nodes has a memory capacity and a frequency. The relevant data are shown in Table 1.
Table 1: Node information
For any pair of connected nodes, the table lists node 1, node 2 and the corresponding cost. The relevant data are shown in Table 2.
Table 2: Node connection costs
Node 1 Node 2 Connection cost
A B 2
B A 1
A D 3
D A 1
D B 1
The model is a cloud storage of the non-uniform memory access architecture; access follows the following 3 rules:
(1) adjacent memory nodes must not be written at random;
(2) adjacent memory nodes must not be read at random;
(3) adjacent memory nodes must not be accessed asynchronously.
Experiments show that violating any of these rules sharply degrades performance. Kingston memory performance test data show that combinations of identically clocked memory are optimal. Otherwise, the memory may operate in single-channel or single-bandwidth mode, and memory access speed drops sharply.
In the memory cloud, research on optimizing sort-merge join algorithms concentrates on NUMA and SIMD hardware environments. A parallel sort-merge join algorithm under the non-uniform memory access architecture can be divided into three phases: a sorting phase, a partitioning phase and a joining phase. Accordingly, the present invention merges same-type memory and finds the access node with the smallest cost; that node is interconnected directly through the processor bus, such as AMD HT (HyperTransport) or Intel QPI (QuickPath Interconnect).
We define the following rules:
Rule 1: to obtain optimal performance, sort the enabled nodes by frequency and merge their memory; the sorting yields the corresponding set of memory blocks, each memory block being a set, denoted {Mbi};
Rule 2: find the host node to serve as the connecting interface of the control processor, the host node being the node with the minimum total cost to the other nodes; the interior of a merged memory block is not changed logically;
Rule 3: from the second memory block onward, order the nodes in each block by ascending cost from the host node to the node; the resulting group is denoted {Ai}.
Following these rules, non-identical, non-uniform memory can be sorted and merged quickly and effectively, and the node connecting to the control processor can be found; the control processor then assigns global addresses and constructs the memory cloud for application programs to access.
Taking the model shown in Fig. 2 as an example, the algorithm has three stages: sort-merge, partitioning and connection.
(1) Sort-merge: initialization
According to the data shown in Table 1, we sort and merge the node memory, logically connecting memory of the same frequency. Four memory blocks {Mbi} = {6, 9, 6, 2} are obtained, as shown in Fig. 3.
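The sort-merge step can be sketched as follows. Table 1 itself is not reproduced in the text, so the node table here is hypothetical, chosen so that the block capacities come out to {6, 9, 6, 2}.

```python
from collections import defaultdict

# Sketch of the sort-merge step: nodes with the same memory frequency are
# logically joined into one memory block. The node tuples (name,
# frequency in MHz, capacity in GB) are hypothetical, not Table 1 data.
def merge_by_frequency(nodes):
    """nodes: list of (name, frequency, capacity) tuples."""
    blocks = defaultdict(list)
    for name, freq, cap in nodes:
        blocks[freq].append((name, cap))
    # One merged block per frequency, with its total capacity.
    return {f: sum(c for _, c in members) for f, members in blocks.items()}

nodes = [("A", 1600, 2), ("B", 1600, 4), ("C", 2133, 9),
         ("D", 2400, 6), ("E", 2666, 2)]
print(merge_by_frequency(nodes))  # {1600: 6, 2133: 9, 2400: 6, 2666: 2}
```

With these assumed inputs, the merged capacities reproduce the {6, 9, 6, 2} blocks of Fig. 3.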
(2) Partitioning: window-block simulated annealing
According to the data shown in Table 2, we initialize the system and obtain the data in Table 3: the minimum-cost path from any node to every other node. If the path detail is 0, the two nodes are connected directly; otherwise the entry is a string giving the routed path from one node to the other. The associated data are illustrated in Table 3.
Table 3: Minimum cost and corresponding connection path from server to server
According to Table 3, the current total overhead is
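A sketch of how Table 3 could be derived from Table 2: an all-pairs minimum-cost computation over the directed connection costs. The patent does not name the algorithm, so Floyd-Warshall is an assumption here, and only the five edges listed in Table 2 are used.

```python
# Sketch of building Table 3: all-pairs minimum link cost between nodes,
# computed via Floyd-Warshall over the directed costs of Table 2.
# INF marks node pairs with no connecting path.
INF = float("inf")

def all_pairs_min_cost(nodes, edges):
    d = {u: {v: (0 if u == v else INF) for v in nodes} for u in nodes}
    for u, v, c in edges:
        d[u][v] = min(d[u][v], c)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

edges = [("A", "B", 2), ("B", "A", 1), ("A", "D", 3), ("D", "A", 1), ("D", "B", 1)]
d = all_pairs_min_cost(["A", "B", "D"], edges)
print(d["D"]["B"], d["A"]["D"])  # 1 3
```

Note that costs are directed: B reaches D only through A (cost 1 + 3 = 4), while D reaches B directly at cost 1.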
The present invention uses the idea of simulated annealing, a method of approximate global optimization in a large search space.
According to rule 1, the merged memory blocks cannot be broken apart in this case. The present invention uses window blocks, each window block being treated as one memory unit; inside a window block, the nodes can be reordered. During the process, the total cost of the current annealing state is computed and annealed. By moving window blocks and sorting the nodes inside them, the best solution obtainable within a finite time budget is found.
In Fig. 4, (a) is one possible solution: the host node is F, the coprocessor accesses the other nodes from F, and the total cost is 65.
In Fig. 4, (b) is a better solution: the host node is B, the total cost from B to the other nodes is 27, and from the second window block onward the node order follows rule 3.
(3) Connection: assembling the memory cloud
When a best solution is obtained, as shown in Fig. 4(b), the coprocessor is connected to node B, and the routing table (similar to Table 3) is copied to and stored in the coprocessor. The coordinator assigns a global address to each cluster.
For details not covered in this embodiment, refer to the description of embodiment 1; they are not repeated here.
The simulated annealing used in this embodiment improves on the traditional algorithm: it not only orders the memory blocks by cost but also orders the nodes inside each memory block. Simulated annealing is flexible and efficient; when a new node joins the memory cloud, the memory blocks and corresponding nodes in the memory cloud can be adjusted quickly, thereby constructing a high-quality non-uniform-access memory cloud storage.
The memory block combination method based on the non-uniform memory access architecture of this embodiment is illustrated below with a concrete application scenario, as follows:
(4) Algorithm description
According to rule 1, the algorithm first initializes: Init() sorts and merges the nodes, producing the initial state S0 (see Table 3). Then, following the window-block simulated annealing of rule 2, Cost() computes and returns the cost of the current solution, and Neighbor(), as in traditional simulated annealing, generates a randomly selected neighbour of the given state. Finally, the best solution is obtained. The function Connect() connects the coprocessor to the host node and copies the routing information table (Table 3). The function AssignGlobalAddress() has the coordinator allocate the global addresses of the cluster memory according to the block order.
Parameter S0 is the initial solution, Sbest the best solution so far, T0 the initial temperature, α the cooling rate, β a constant, and M the time until the next parameter update; the maximum time limit is the total annealing schedule.
The following pseudocode gives the described memory block combination method for the non-uniform memory access architecture.
In the algorithm, the most important function is Neighbor(), which generates a randomly selected neighbour of the given state. Inside a window block, the nodes are rearranged according to rule 3; outside the window blocks, the window blocks themselves are rearranged.
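The pseudocode itself is not reproduced in the text; the following Python sketch shows one way the Cost()/Neighbor()/annealing loop just described could look. The state representation, the cost function and all data values are assumptions for illustration, not the patent's definitive implementation.

```python
import math
import random

# Hedged sketch of the window-block simulated annealing described above.
# A state is an ordering of window blocks (each a merged memory block
# that must stay intact); neighbor() either swaps two whole window
# blocks or swaps two nodes inside one block; cost() is an assumed
# stand-in: the summed link cost along the flattened node order.
def cost(state, link_cost):
    flat = [n for block in state for n in block]
    return sum(link_cost[(a, b)] for a, b in zip(flat, flat[1:]))

def neighbor(state):
    s = [list(b) for b in state]
    multi = [b for b in s if len(b) > 1]
    if multi and random.random() < 0.5:
        b = random.choice(multi)                 # reorder nodes inside a block (rule 3)
        i, j = random.sample(range(len(b)), 2)
        b[i], b[j] = b[j], b[i]
    else:
        i, j = random.sample(range(len(s)), 2)   # reorder whole window blocks
        s[i], s[j] = s[j], s[i]
    return s

def anneal(state, link_cost, t0=10.0, alpha=0.95, steps=2000):
    best, best_c = state, cost(state, link_cost)
    cur, cur_c, t = state, best_c, t0
    for _ in range(steps):
        cand = neighbor(cur)
        c = cost(cand, link_cost)
        # Accept improvements always, worsenings with Boltzmann probability.
        if c < cur_c or random.random() < math.exp((cur_c - c) / t):
            cur, cur_c = cand, c
        if cur_c < best_c:
            best, best_c = cur, cur_c
        t *= alpha
    return best, best_c

# Illustrative data: four nodes in two window blocks, with a made-up cost.
link_cost = {(a, b): abs(ord(a) - ord(b)) for a in "ABCD" for b in "ABCD"}
best, best_c = anneal([["A", "C"], ["B", "D"]], link_cost)
print(best_c)  # 5 (this tiny starting arrangement is already optimal)
```

Because the merged blocks stay intact, the search space is the product of block permutations and intra-block node permutations, exactly the two moves Neighbor() makes above.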
In this model, there are 8 nodes and 4 window blocks. Over many experimental runs, the overhead of the best solution finally converges to 27; the best case takes 3 runs and the worst 15, as shown in Fig. 5.

Claims (3)

1. A memory block combination method based on a non-uniform memory access architecture, characterized by comprising the following steps:
Step 1: according to node frequency, logically connecting the memory provided by enabled nodes of the same frequency into one memory block, the frequency indicating the connection speed of a node;
Step 2: treating each memory block as a window block, adjusting the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement, and recording the logical arrangement result in a routing table;
Step 3: storing the routing table in the control processor connected to the host node, and having the control processor assign each memory block a global address, thereby constructing the memory cloud;
wherein step 2 comprises:
first selecting, by simulated annealing, one enabled node from the enabled nodes as the host node, the host node being the connecting interface of the control processor;
arranging the window blocks in ascending order of the link cost from the host node to each window block, and arranging the enabled nodes within each window block in ascending order of the link cost from the host node to each enabled node.
2. The memory block combination method based on a non-uniform memory access architecture according to claim 1, characterized in that step 3 comprises connecting the host node to the control processor by a bus.
3. A memory block combination device based on a non-uniform memory access architecture, characterized in that the device comprises:
a division module for logically connecting, according to node frequency, the memory provided by enabled nodes of the same frequency into one memory block, the frequency indicating the connection speed of a node;
a processing module for treating each memory block as a window block, adjusting the order of the window blocks and the order of the enabled nodes within each window block to determine the logical arrangement result with the smallest link cost, the logical arrangement result containing the host node of that arrangement, and recording the logical arrangement result in a routing table;
a construction module for storing the routing table in the control processor connected to the host node, the control processor assigning each memory block a global address to construct the memory cloud;
wherein the processing module is further configured to first select, by simulated annealing, one enabled node from the enabled nodes as the host node, the host node being the connecting interface of the control processor;
and the processing module is further configured to arrange the window blocks in ascending order of the link cost from the host node to each window block, and to arrange the enabled nodes within each window block in ascending order of the link cost from the host node to each enabled node.
CN201610844237.7A 2016-09-23 2016-09-23 A memory block combination method and device based on non-uniform memory access architecture Active CN106383791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610844237.7A CN106383791B (en) 2016-09-23 2016-09-23 A memory block combination method and device based on non-uniform memory access architecture


Publications (2)

Publication Number Publication Date
CN106383791A CN106383791A (en) 2017-02-08
CN106383791B true CN106383791B (en) 2019-07-12

Family

ID=57936804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610844237.7A Active CN106383791B (en) 2016-09-23 2016-09-23 A memory block combination method and device based on non-uniform memory access architecture

Country Status (1)

Country Link
CN (1) CN106383791B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112558869A (en) * 2020-12-11 2021-03-26 北京航天世景信息技术有限公司 Remote sensing image caching method based on big data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104144194A (en) * 2013-05-10 2014-11-12 中国移动通信集团公司 Data processing method and device for cloud storage system
CN104199718A (en) * 2014-08-22 2014-12-10 上海交通大学 Dispatching method of virtual processor based on NUMA high-performance network cache resource affinity
CN104506362A (en) * 2014-12-29 2015-04-08 浪潮电子信息产业股份有限公司 Method for system state switching and monitoring on CC-NUMA (cache coherent-non uniform memory access architecture) multi-node server
CN104657198A (en) * 2015-01-24 2015-05-27 深圳职业技术学院 Memory access optimization method and memory access optimization system for NUMA (Non-Uniform Memory Access) architecture system in virtual machine environment
CN104850461A (en) * 2015-05-12 2015-08-19 华中科技大学 NUMA-oriented virtual cpu (central processing unit) scheduling and optimizing method
CN105391590A (en) * 2015-12-26 2016-03-09 深圳职业技术学院 Method and system for automatically obtaining system routing table of NUMA

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6754782B2 (en) * 2001-06-21 2004-06-22 International Business Machines Corporation Decentralized global coherency management in a multi-node computer system

Also Published As

Publication number Publication date
CN106383791A (en) 2017-02-08

Similar Documents

Publication Publication Date Title
Zhang et al. GraphP: Reducing communication for PIM-based graph processing with efficient data partition
JP4857274B2 (en) Optimization of application layout on massively parallel supercomputer
Dally Express cubes: Improving the performance ofk-ary n-cube interconnection networks
Mamidala et al. MPI collectives on modern multicore clusters: Performance optimizations and communication characteristics
Siegel et al. Using the multistage cube network topology in parallel supercomputers
JPH0766718A (en) Wafer scale structure for programmable logic
US8447954B2 (en) Parallel pipelined vector reduction in a data processing system
Firuzan et al. Reconfigurable network-on-chip for 3D neural network accelerators
Wang et al. A message-passing multi-softcore architecture on FPGA for breadth-first search
Li et al. On data center network architectures for interconnecting dual-port servers
Chen et al. Topology-aware optimal data placement algorithm for network traffic optimization
Zhou et al. Cost-aware partitioning for efficient large graph processing in geo-distributed datacenters
Musha et al. Deep learning on high performance FPGA switching boards: Flow-in-cloud
CN106383791B (en) A kind of memory block combined method and device based on nonuniform memory access framework
Kobus et al. Gossip: Efficient communication primitives for multi-gpu systems
US20220121928A1 (en) Enhanced reconfigurable interconnect network
Xie et al. Mesh-of-Torus: a new topology for server-centric data center networks
Mirsadeghi et al. PTRAM: A parallel topology-and routing-aware mapping framework for large-scale HPC systems
Sun et al. Multi-node acceleration for large-scale GCNs
Balkan et al. An area-efficient high-throughput hybrid interconnection network for single-chip parallel processing
Lin et al. A distributed resource management mechanism for a partitionable multiprocessor system
Fernández et al. Efficient VLSI layouts for homogeneous product networks
Konstantinidou The selective extra stage butterfly
Mackenzie et al. Comparative modeling of network topologies and routing strategies in multicomputers
Lee Barrier synchronization over multistage interconnection networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant