CN107894914A - Cache coherence processing method and apparatus - Google Patents
- Publication number: CN107894914A
- Application number: CN201610878873.1A
- Authority: CN (China)
- Prior art keywords: processor, mark, consistency maintenance, core, processor core
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F12/0813: Multiuser, multiprocessor or multiprocessing cache systems with a network or matrix configuration
- G06F15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F9/455: Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45558: Hypervisor-specific management and integration aspects
- G06F2009/45562: Creating, deleting, cloning virtual machine instances
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Multi Processors (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The present application relates to the field of cloud computing, and in particular to cache coherence (Cache Coherency, CC) processing in chip multi-core processors. The cache coherence processing method includes: a first router receives a coherence maintenance request sent by a directly connected first processor core, the request carrying the identifier of the first processor core; the first router queries a preset coherence maintenance entry according to that identifier and generates, from the entry, a coherence maintenance command for the coherence domain; and the command is sent over the network-on-chip to the other processor cores of the coherence domain. This scheme maintains cache coherence effectively within a bounded region while reducing the communication and area overhead that prior-art chip multi-core processors incur.
Description
Technical field
The present invention relates to the field of cloud computing, and in particular to cache coherence (Cache Coherency) processing in chip multi-core processors.
Background
The chip multi-core processor (Chip Multi-processors, CMPs) architecture has become the mainstream architecture in processor design. A CMP consists of many processor cores (Cores) connected by a network-on-chip (Network on Chip, NoC) formed of routers (Routers). On-chip caches are organized in levels. The level-1 cache (L1 cache) is a private cache built inside a processor core, and the data blocks it stores can be accessed only by that core. The level-2 cache (L2 cache) is a shared cache: the data blocks it stores are shared and can be accessed by multiple processor cores. The L2 cache is typically either centralized outside the processor cores or distributed, with slices placed physically near each core, interconnected through the network-on-chip, and logically shared.
In chip multi-core processor applications, some data blocks are accessed by one or more processor cores. In that scenario the data block is usually stored in the shared cache so that those cores can reach it. To speed up access, a copy of the data block is created in the private cache of each core that has accessed it, so that when one of these cores accesses the block again it only needs to read it from its own private cache. Because copies of the block exist in the private caches of several cores, consistency between those copies must be maintained; this is known as the cache coherence (Cache Coherence) problem. The basic principle for solving it is that when the copy of a data block in one core is modified, every other core that stores a copy must perform a coherence operation (updating its copy of the block, or deleting its copy). This in turn requires determining which cores of the multi-core processor hold a copy of the block.
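The two coherence operations named above (update the other copies, or delete them) can be illustrated with a minimal Python sketch. The representation of each private cache as a dictionary, and the names `write_block` and `policy`, are hypothetical and chosen only for illustration; the patent does not specify them.

```python
from typing import Dict

# Hypothetical model: core number -> {block name: value} held in that core's private cache.
PrivateCaches = Dict[int, Dict[str, int]]


def write_block(caches: PrivateCaches, writer: int, block: str, value: int,
                policy: str = "invalidate") -> None:
    """Write `block` in the writer's private cache, then fix up every other copy."""
    caches[writer][block] = value
    for core, cache in caches.items():
        if core == writer or block not in cache:
            continue  # cores without a copy need no coherence operation
        if policy == "update":
            cache[block] = value  # update the stale copy in place
        else:
            del cache[block]      # delete (invalidate) the stale copy


caches = {0: {"a": 1}, 1: {"a": 1}, 2: {}}
write_block(caches, writer=1, block="a", value=2)
print(caches)  # {0: {}, 1: {'a': 2}, 2: {}}
```

The sketch already "knows" every copy by scanning all caches; a real system must find the holders of a copy, which is the question the two protocols below answer.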
The main approaches to cache coherence are directory-based (Directory) coherence protocols and snooping-based (Snooping) coherence protocols. A directory-based protocol uses a directory to record which processor cores have accessed each data block; when coherence maintenance is needed, the directory entry is queried to determine which cores hold a copy of the block, and the coherence operation is then performed on the block in those cores. A snooping-based protocol connects all processor cores to a shared bus: after a core modifies a data block in its private cache, it broadcasts a coherence maintenance message on the bus so that the other cores holding a copy of the block perform the coherence operation.
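The difference between the two approaches is essentially which cores receive the coherence message. The following sketch contrasts them; the directory is modeled as a hypothetical mapping from block name to the set of sharer cores, and both helper names are illustrative assumptions.

```python
from typing import Dict, Set


def directory_targets(directory: Dict[str, Set[int]], block: str, writer: int) -> Set[int]:
    """Directory-based: consult the directory and message only the recorded sharers."""
    return directory.get(block, set()) - {writer}


def snooping_targets(num_cores: int, writer: int) -> Set[int]:
    """Snooping-based: the message is broadcast on the bus, so every other core sees it
    and checks its own private cache for a copy."""
    return set(range(num_cores)) - {writer}


directory = {"x": {1, 3, 7}}  # hypothetical entry: block "x" is cached by cores 1, 3 and 7
print(sorted(directory_targets(directory, "x", writer=1)))  # [3, 7]
print(len(snooping_targets(16, writer=1)))                 # 15
```

The directory keeps traffic proportional to the number of sharers at the cost of storing the directory; snooping needs no directory but every write reaches every core.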
With the development of cloud computing, the concept of a resource pool has emerged: the resources of a physical server (processors, memory, network I/O ports, etc.) are assigned to one or more virtual machines, and resource management software allocates these resources to the virtual machines. In cloud computing scenarios, how to achieve cache coherence for the processor resources allocated to a virtual machine is a problem the industry faces.
Summary of the invention
This document describes a cache coherence processing method and apparatus that balance cache coherence processing against the communication and area overhead of the chip multi-core processor.
In one aspect, an embodiment of this application provides a cache coherence processing method applied to a chip multi-core processor. A first router receives a coherence maintenance request sent by the directly connected first processor core, the request carrying the identifier of the first processor core; the first router queries a preset coherence maintenance entry according to that identifier and generates a coherence maintenance command for the coherence domain to which the first processor core belongs; and the generated command is sent over the network-on-chip to the other processor cores of that coherence domain. In this way, coherence maintenance is performed only within the coherence domain in which the write to the cached data block occurred, avoiding the overhead of global coherence maintenance.
In one possible design, the first router queries the preset coherence maintenance entry according to the identifier of the first processor core to obtain the identifier of the coherence domain to which the first processor core belongs and the identifiers of the other processor cores in that domain. The first router then generates the coherence maintenance command for the coherence domain from the domain identifier and the identifiers of the other processor cores in the domain.
In one possible design, the first router determines at least one routing transmission path according to the network-on-chip topology and the identifiers of the other processor cores in the coherence domain, where each path consists of the routers connected to other processor cores in the domain. For each path, the first router reconstructs the coherence maintenance command to generate a command for that path, and sends it along the path to the processor cores on it. Directed routing of this kind avoids the broadcast-storm risk that broadcasting would create on the network-on-chip and significantly reduces the load on the network-on-chip routers.
In one possible design, the first router determines, from the identifiers of the other processor cores in the coherence domain, the identifiers of the routers connected to those cores. Using these router identifiers and the network-on-chip topology, the first router performs route discovery with the XY routing algorithm and determines at least one routing transmission path. This route discovery determines the paths within the coherence domain quickly and simply.
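XY (dimension-order) routing can be sketched as follows: a packet first moves along the X dimension until its column matches the destination, then along Y. The sketch assumes a 4x4 mesh in which router id = y * 4 + x; the patent does not state the numbering of Fig. 2, so this layout is an assumption (it is consistent with router 1's neighbours being routers 0, 2 and 5 in the embodiment). `xy_route` is a hypothetical helper name.

```python
from typing import List, Tuple

MESH_WIDTH = 4  # assumption: 4x4 mesh, router id = y * MESH_WIDTH + x


def coords(router_id: int) -> Tuple[int, int]:
    """Return the (x, y) position of a router in the mesh."""
    return router_id % MESH_WIDTH, router_id // MESH_WIDTH


def xy_route(src: int, dst: int) -> List[int]:
    """Dimension-order routing: correct X first, then Y; returns the routers visited."""
    x, y = coords(src)
    dx, dy = coords(dst)
    path = [src]
    while x != dx:
        x += 1 if dx > x else -1
        path.append(y * MESH_WIDTH + x)
    while y != dy:
        y += 1 if dy > y else -1
        path.append(y * MESH_WIDTH + x)
    return path


print(xy_route(1, 12))  # [1, 0, 4, 8, 12]
```

XY routing is deterministic and deadlock-free on a mesh, which makes it cheap to evaluate inside a router.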
In another possible design, the preset coherence maintenance entry is generated as follows: a resource manager receives a processor resource allocation request sent by a virtual machine, asking the resource manager to allocate to the virtual machine at least two processor cores, including the first processor core; according to the request, the resource manager generates a coherence maintenance entry for the virtual machine, which includes the identifier of the coherence domain and the identifiers of the processor cores in the domain.
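A minimal sketch of how a resource manager might turn an allocation into an entry. The field names follow the Valid / CDID / Core IDs structure described in the embodiment below; encoding Core IDs as an integer bit mask and the names `CoherenceEntry` and `create_entry` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CoherenceEntry:
    valid: bool    # Valid: whether the entry is in use
    cdid: int      # CDID: coherence domain identifier
    core_ids: int  # Core IDs: bit i (counting from the right) is 1 if core i is in the domain


def create_entry(cdid: int, cores: Iterable[int]) -> CoherenceEntry:
    """Build the entry for a VM from the cores granted by its allocation request."""
    core_list = list(cores)
    if len(core_list) < 2:
        raise ValueError("an allocation request covers at least two processor cores")
    mask = 0
    for core in core_list:
        mask |= 1 << core  # set the membership bit of each allocated core
    return CoherenceEntry(valid=True, cdid=cdid, core_ids=mask)


vm1 = create_entry(1, list(range(10)) + [12, 13])
print(format(vm1.core_ids, "016b"))  # 0011001111111111
```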
In another possible design, after generating the coherence maintenance entry for the virtual machine according to the allocation request, the resource manager may also receive a processor resource adjustment request from the virtual machine, asking it to adjust the processor cores allocated to the virtual machine; the resource manager then adjusts the virtual machine's coherence maintenance entry according to that request.
When the adjustment removes processor cores, the resource manager first writes the cached data in the cores to be removed back to memory, then clears the data of those cores, and deletes their identifiers from the coherence maintenance entry. When the adjustment adds processor cores, the resource manager records the identifiers of the added cores in the coherence maintenance entry.
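The shrink and grow adjustments can be sketched on a Core IDs bit mask. This is a simplification under stated assumptions: the private cache is a dictionary, `write_back` is a hypothetical callback standing in for the memory write, and every cached block is written back (no dirty-bit tracking), mirroring the description above.

```python
from typing import Callable, Dict


def remove_core(core_ids: int, core: int, private_cache: Dict[str, int],
                write_back: Callable[[str, int], None]) -> int:
    """Shrink a domain: write back, empty the cache, then clear the core's bit."""
    for block, value in private_cache.items():
        write_back(block, value)      # 1. write cached data back to memory
    private_cache.clear()             # 2. empty the removed core's data
    return core_ids & ~(1 << core)    # 3. delete the core's identifier from Core IDs


def add_core(core_ids: int, core: int) -> int:
    """Grow a domain: record the new core's identifier in Core IDs."""
    return core_ids | (1 << core)


memory: Dict[str, int] = {}
cache_13 = {"a": 2}
mask = remove_core(0b0011001111111111, 13, cache_13, memory.__setitem__)
print(format(mask, "016b"), memory, cache_13)  # 0001001111111111 {'a': 2} {}
```

Writing back before clearing the membership bit matters: once the core leaves the domain it no longer receives coherence commands, so any data it still held would silently go stale.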
In another aspect, an embodiment of the present invention provides a processor chip, including: multiple processor cores, including a first processor core; and a network-on-chip formed by multiple routers, including a first router directly connected to the first processor core. The first router includes: a processor core port, connected to the first processor core; at least one output port, connected to at least one router of the network-on-chip; a cache module, configured to store the preset coherence maintenance entry; and a processing module, configured to receive, through the processor core port, the coherence maintenance request sent by the first processor core, the request carrying the identifier of the first processor core; to query the coherence maintenance entry stored in the cache module according to that identifier and generate, from the entry, a coherence maintenance command for the coherence domain; and to send the command through the output port to the other processor cores of the coherence domain.
In one possible design, the processing module is further configured to query the coherence maintenance entry stored in the cache module according to the identifier of the first processor core to obtain the identifier of the coherence domain to which the first processor core belongs and the identifiers of the other processor cores in that domain, and to generate the coherence maintenance command for the domain from the domain identifier and the identifiers of the other processor cores.
In one possible design, the processing module is further configured to: determine at least one routing transmission path according to the network-on-chip topology and the identifiers of the other processor cores in the coherence domain, each path consisting of the routers connected to other processor cores in the domain; for each path, reconstruct the coherence maintenance command to generate a command for that path; and send each path's command through the output port to the processor cores on that path.
In one possible design, the processing module is further configured to determine, from the identifiers of the other processor cores in the coherence domain, the identifiers of the routers connected to them, and, using those router identifiers and the network-on-chip topology, to perform route discovery with the XY routing algorithm and determine at least one routing transmission path.
In another aspect, an embodiment of the present invention provides a computer system, including the processor chip of the previous aspect and a resource manager. The resource manager receives a processor resource allocation request sent by a virtual machine, asking it to allocate at least two processor cores to the virtual machine, and generates from the request a coherence maintenance entry for the virtual machine that includes the identifier of the coherence domain and the identifiers of the processor cores in the domain. The functions of the resource manager may be implemented in hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to those functions, and each module may be software and/or hardware.
In one possible design, the resource manager is further configured to receive a processor resource adjustment request sent by the virtual machine, asking it to adjust the processor cores allocated to the virtual machine, and to adjust the virtual machine's coherence maintenance entry according to that request.
In another possible design, the resource manager is further configured so that, when the adjustment request reduces the processor resources, it first writes the cached data in the cores to be removed back to memory, then clears the data of those cores, and deletes their identifiers from the coherence maintenance entry.
In yet another possible design, the resource manager is further configured to record, when the adjustment adds processor cores, the identifiers of the added cores in the coherence maintenance entry.
In yet another aspect, an embodiment of the present invention provides a computer storage medium for storing the computer software instructions used by the above router, including a program designed to perform the above aspects.
In yet another aspect, an embodiment of the present invention provides a computer storage medium for storing the computer software instructions used by the above resource manager, including a program designed to perform the above aspects.
Compared with the prior art, the solution provided by the present invention manages coherence maintenance domains more flexibly, balancing cache coherence processing against the communication and area overhead of the chip multi-core processor.
Brief description of the drawings
The accompanying drawings used in describing the embodiments or the prior art are briefly introduced below.
Fig. 1 is a diagram of the cloud computing architecture to which the present invention applies;
Fig. 2 is a schematic diagram of a processor chip according to the present invention;
Fig. 3 is a flowchart of cache coherence processing according to an embodiment of the present invention;
Fig. 4 is a structural diagram of a router module according to an embodiment of the present invention.
Embodiment
The technical solutions in the embodiments of the present invention are described clearly below with reference to the accompanying drawings.
The system architecture and service scenarios described in the embodiments are intended to explain the technical solutions of the embodiments more clearly and do not limit them. A person of ordinary skill in the art will understand that, as system architectures evolve and new service scenarios emerge, the technical solutions provided in the embodiments are equally applicable to similar technical problems.
As shown in Fig. 1, the cloud computing architecture to which this application applies is divided into four layers, from top to bottom:
Application layer 100: runs various applications that provide corresponding services to users.
Operating system layer 200: includes the operating system (Operating System, OS), which allocates hardware resources (processor, memory, and network I/O) to the applications running on it. An operating system together with the applications running on it forms the software architecture of a virtual machine (Virtual Machine, VM). The operating system of a virtual machine allocates hardware resources (e.g. processor, memory, network I/O) to its applications within the range of hardware resources that the virtual machine owns. Fig. 1 shows two virtual machines: virtual machine 1 includes operating system 1 and applications 1 and 2, and virtual machine 2 includes operating system 2 and applications 3 and 4.
Resource management layer 300: runs the resource manager (Resource Manager, RM). In a cloud computing system, multiple virtual machines can be created on a physical machine (Physical Machine, PM). When a virtual machine is created, the resource manager must allocate hardware resources to it, including processors, memory, network I/O, etc. In practice, the resource manager is also called a virtual machine monitor (Virtual Machine Monitor, VMM) or hypervisor.
Resource layer 400: includes the hardware resources that resource management layer 300 can manage; as an example, Fig. 1 shows processors, memory, network I/O, etc.
As an example of a processor, Fig. 2 shows a chip multi-core processor composed of 16 processor cores (Cores) connected by a network-on-chip of 16 routers. Each core is directly connected to one router, and the network-on-chip formed by the routers provides inter-core communication. A core and its router can be connected by wire (electrically, e.g. with copper, or optically, e.g. with optical fiber) or wirelessly.
The embodiments of the present invention address the cache coherence problem in cloud computing scenarios. In such scenarios, at least two virtual machines can be created on one server. When a virtual machine is created, the resource manager must allocate hardware resources to it; in particular, it can allocate a number of processor cores as the virtual machine's processing resources. The hardware resources owned by any two virtual machines are logically fully isolated. In this scenario, cache coherence processing in the chip multi-core processor therefore does not need to span all processor cores in a "global coherence" manner; local coherence processing is sufficient.
Taking Fig. 2 as an example, the chip multi-core processor includes 16 processor cores; assume the resource manager allocates it to two virtual machines. Processor cores 0-9 and 12-13 are allocated to virtual machine 1, and processor cores 10-11 and 14-15 to virtual machine 2. Because processor cores allocated to different virtual machines are logically isolated, the processor cores in Fig. 2 are divided into two coherence domains, as shown in Table one:
Coherence domain | Processor cores included |
Coherence domain 1 | Cores 0-9, and cores 12-13 |
Coherence domain 2 | Cores 10-11, and cores 14-15 |
Table one
For virtual machine 1, take processor core 1 as an example: after core 1 performs a write operation on a cached data block in its private cache, a coherence maintenance operation must be performed for that block within coherence domain 1, to which processor core 1 belongs. The process is as follows:
Step 310: The first router receives a coherence maintenance request sent by the first processor core, the request carrying the identifier of the first processor core.
As an example of the implementation, referring to Fig. 2, the coherence maintenance request is sent by the cache controller in processor core 1, which completed the data block write operation, and carries the identifier of processor core 1. The identifier of a processor core can be the number the system assigns to that core.
Step 320: The first router queries the preset coherence maintenance entry according to the identifier of the first processor core to obtain the identifier of the coherence domain to which the first processor core belongs and the identifiers of the other processor cores in that domain.
The coherence maintenance entry is generated when the virtual machine is created: the resource manager receives the processor resource allocation request sent by the virtual machine and generates the entry according to that request. The request asks the resource manager to allocate at least two processor cores to the virtual machine. As an example, the entry structure includes a valid bit (Valid), a coherence domain identifier (Coherence Domain Identifier, CDID), and the core membership identifier of the coherence domain (Core IDs), where: the valid bit indicates whether the entry is valid; the CDID identifies the coherence domain; and Core IDs is a bit mask (Bit Mask) indicating whether each processor core belongs to the coherence domain (counting bits from right to left, a bit of 1 means the corresponding core belongs to the domain and a bit of 0 means it does not). Taking coherence domains 1 and 2 in Fig. 2 as examples: the chip multi-core processor in Fig. 2 has 16 cores, so Core IDs is a 16-bit vector whose bits, from right to left, represent core 0 through core 15. The entries for the two domains are shown in Table two:
Valid | CDID | Core IDs |
1 | 1 | 0011001111111111 |
1 | 2 | 1100110000000000 |
Table two
After creating the coherence maintenance entries, the resource manager sends each entry to the routers in the corresponding coherence domain. As an example, referring to Fig. 2, the resource manager sends the entry of coherence domain 1 to the routers included in coherence domain 1, and these routers can store the entry in their own cache or in router registers.
In this step, the first router queries the coherence maintenance entry according to the identifier of the first processor core: it checks the number of the first processor core against the Core IDs field of the entries to determine the identifier of the coherence domain to which the first processor core belongs and the identifiers of the other processor cores in that domain.
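The Core IDs lookup in steps 320 and 330 can be sketched as a scan over the router's stored entries, testing the requesting core's bit. The entries below encode the two domains of Table one; modeling the router table as a Python list and the names `Entry`, `TABLE` and `lookup` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    valid: bool
    cdid: int
    core_ids: int  # bit mask, bit i = core i


# Hypothetical router-local table for the two domains of Table one (16 cores).
TABLE = [Entry(True, 1, 0b0011001111111111), Entry(True, 2, 0b1100110000000000)]


def lookup(table: List[Entry], core: int) -> Optional[Tuple[int, List[int]]]:
    """Return (CDID, other cores in the domain) for the requesting core, or None."""
    for entry in table:
        if entry.valid and (entry.core_ids >> core) & 1:
            others = [c for c in range(entry.core_ids.bit_length())
                      if (entry.core_ids >> c) & 1 and c != core]
            return entry.cdid, others
    return None


print(lookup(TABLE, 1))  # (1, [0, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13])
```

In hardware the same test is a single AND of the mask with a one-hot core vector, so the lookup is cheap even when a router holds several entries.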
Step 330: The first router generates a coherence maintenance command for the coherence domain according to the identifier of the coherence domain containing the first processor core and the identifiers of the other processor cores in that domain.
In this step, the first router generates the coherence maintenance command for the domain according to the identifiers of the other processor cores in the domain. As an example, the command extends the routing packet structure with fields such as a CDID field and a Core IDs field; the meaning of each field is shown in Table three:
Flit Type | Src Addr | Dst Addr | CDID | Core IDs | Payload |
0x99 | 1 | - | 1 | 0011001111111111 | Set a=2 |
Table three
Flit Type: defines the coherence command type, e.g. 0x99;
Src Addr: source address, i.e. the identifier of the core that initiated the coherence maintenance request (in this example, processor core 1 initiates the request, so its number is 1);
Dst Addr: destination address (reserved here);
CDID: the coherence domain identifier from the coherence maintenance entry (1 in this example);
Core IDs: the core membership identifier of the coherence domain from the entry (in this example, it corresponds to cores 0-9 and cores 12-13);
Payload: the concrete coherence operation carried by the command (e.g. updating the data of a variable to a certain value; in this example, setting variable a to 2).
Step 340: The first router sends the coherence maintenance command through the network-on-chip to the other processor cores in the coherence domain.
There are two ways to implement this step:
Mode one: broadcast
As an example, in Fig. 2, router 1 broadcasts the generated consistency maintenance command to the other processor cores of the consistency domain (cores 0 to 9 and 12 to 13, excluding core 1 itself). Over the network-on-chip, router 1 forwards the command through the other routers (router 0 and routers 2 to 15) to the other processor cores on the chip multiprocessor (core 0 and cores 2 to 15). Each processor core that receives the command compares its own identifier against the Core IDs field of the command. If the bit corresponding to the core is 1, the core belongs to the consistency domain and must perform consistency processing according to the command; if the bit is 0, the core does not belong to the consistency domain, so it discards the command without further processing.
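The per-core filtering in Mode 1 amounts to testing one bit of the Core IDs field. A minimal sketch, assuming the same bit order as Table 3 (rightmost bit = core 0); `core_accepts` and `on_broadcast` are hypothetical names:

```python
def core_accepts(core_id: int, core_ids: str) -> bool:
    """True if core_id's bit is set; the rightmost bit is core 0."""
    if not 0 <= core_id < len(core_ids):
        raise ValueError(f"core {core_id} outside the {len(core_ids)}-bit field")
    return bool((int(core_ids, 2) >> core_id) & 1)


def on_broadcast(core_id: int, core_ids: str, payload: str) -> str:
    """Hypothetical receive handler: a core outside the domain drops the command."""
    if core_accepts(core_id, core_ids):
        return f"core {core_id}: apply '{payload}'"
    return f"core {core_id}: discard"


table3_core_ids = "0011001111111111"
for core in (0, 10, 13):
    print(on_broadcast(core, table3_core_ids, "Set a=2"))
```

Every core on the chip still receives and inspects the packet, which is the forwarding cost that Mode 2 is meant to avoid.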
Mode 2: directed routing
As an example, in Fig. 2, router 1 obtains the topology state of the network-on-chip and the identifiers of the other processor cores in the consistency domain, and determines at least one routing transmission path from this information. Specifically, router 1 is aware of the topology of the network-on-chip; it computes its possible next hops and identifies the next-hop nodes, i.e., routers 0, 2, and 5. For example, in a 2D mesh network using XY routing, the neighbors (x+1, y), (x-1, y), (x, y+1), and (x, y-1) are computed, which determines that the first-hop nodes of router 1 are routers 0, 2, and 5. Using the identifiers of the other processor cores of the consistency domain obtained from the consistency maintenance table entry, router 1 determines that the processor cores attached to routers 0, 2, and 5 belong to consistency domain 1. The next hops are then selected in the same way: the second-hop nodes of router 1 are routers 3, 4, 6, and 9, and according to the identifiers of the other cores in consistency domain 1, the processor cores attached to routers 3, 4, 6, and 9 also belong to consistency domain 1. This continues until every routing node belonging to consistency domain 1 has been visited. The result is three routing transmission paths: 1 → 0 → 4 → 8 → 12 (westward), 1 → 5 → 9 → 13 (southward), and 1 → 2 → 3 (6) → 7 (eastward).
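The path determination in Mode 2 can be approximated in a few lines. This sketch assumes the 4x4 mesh of Fig. 2 with router n at column n mod 4 and row n div 4, and one core per router. Instead of the hop-by-hop expansion described above, it computes the XY route from router 1 to every other member of the domain and groups the members by first hop; for this example that yields the same three branches. All function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

MESH_WIDTH = 4  # assumed 4x4 mesh: router n sits at (x, y) = (n % 4, n // 4)


def coords(node: int) -> tuple[int, int]:
    return node % MESH_WIDTH, node // MESH_WIDTH


def xy_path(src: int, dst: int) -> list[int]:
    """Dimension-ordered XY route: travel along x first, then along y."""
    x, y = coords(src)
    dx, dy = coords(dst)
    path = [src]
    while x != dx:
        x += 1 if dx > x else -1
        path.append(y * MESH_WIDTH + x)
    while y != dy:
        y += 1 if dy > y else -1
        path.append(y * MESH_WIDTH + x)
    return path


def branches(src: int, members: set[int]) -> dict[int, list[int]]:
    """Group the other domain members by the first hop of their XY route."""
    groups: dict[int, set[int]] = defaultdict(set)
    for dst in members - {src}:
        groups[xy_path(src, dst)[1]].add(dst)
    return {hop: sorted(nodes) for hop, nodes in sorted(groups.items())}


domain_1 = set(range(10)) | {12, 13}
for hop, nodes in branches(1, domain_1).items():
    print(hop, nodes)
```

First hop 0 corresponds to the westward path, 2 to the eastward path, and 5 to the southward path.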
For each routing transmission path, the first router reconstructs the consistency maintenance command and generates a command specific to that path; that is, from the consistency maintenance command it generates a directed-routing instruction, as follows:
(1) The directed-routing instruction for the westward path is shown in Table 4:
Flit Type | Src Addr | Dst Addr | CDID | Core IDs | Payload |
---|---|---|---|---|---|
0x99 | 1 | - | 1 | 0001000100010001 | Set a=2 |

Table 4
(2) The directed-routing instruction for the eastward path is shown in Table 5:
Flit Type | Src Addr | Dst Addr | CDID | Core IDs | Payload |
---|---|---|---|---|---|
0x99 | 1 | - | 1 | 0000000011001100 | Set a=2 |

Table 5
(3) The directed-routing instruction for the southward path is shown in Table 6:
Flit Type | Src Addr | Dst Addr | CDID | Core IDs | Payload |
---|---|---|---|---|---|
0x99 | 1 | - | 1 | 0010001000100000 | Set a=2 |

Table 6
Note that when the eastward directed-routing instruction reaches router 2, the path splits into two branches, 1 → 2 → 3 → 7 and 1 → 2 → 6, so router 2 further decomposes the eastward instruction into the following two instructions:
(4) The directed-routing instruction for 2 → 3 → 7 is shown in Table 7:
Flit Type | Src Addr | Dst Addr | CDID | Core IDs | Payload |
---|---|---|---|---|---|
0x99 | 1 | - | 1 | 0000000010001000 | Set a=2 |

Table 7
(5) The directed-routing instruction for 2 → 6 is shown in Table 8:
Flit Type | Src Addr | Dst Addr | CDID | Core IDs | Payload |
---|---|---|---|---|---|
0x99 | 1 | - | 1 | 0000000001000000 | Set a=2 |

Table 8
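The reconstruction of per-path commands can be expressed as bitmap masking: each router clears its own core's bit, finds the XY next hop toward every remaining member, and emits one Core IDs value per next hop. This is a sketch under the same 4x4 mesh and bit-order assumptions used for the tables; `xy_next_hop` and `split_core_ids` are hypothetical helpers. Applied at router 1 it reproduces Tables 4 to 6, and applied at router 2 to the eastward bitmap it reproduces Tables 7 and 8 (Table 7 as the 16-bit value 0000000010001000).

```python
from __future__ import annotations

MESH_WIDTH = 4   # assumed 4x4 mesh, router n at (n % 4, n // 4)
NUM_CORES = 16   # assumed one core per router


def xy_next_hop(here: int, dst: int) -> int:
    """Next router on the XY route from `here` to `dst` (x first, then y)."""
    if here == dst:
        raise ValueError("destination is the local router")
    x, y = here % MESH_WIDTH, here // MESH_WIDTH
    dx, dy = dst % MESH_WIDTH, dst // MESH_WIDTH
    if x != dx:
        x += 1 if dx > x else -1
    else:
        y += 1 if dy > y else -1
    return y * MESH_WIDTH + x


def split_core_ids(here: int, core_ids: str) -> dict[int, str]:
    """Split a Core IDs bitmap into one bitmap per outgoing next hop.

    The core attached to `here` is removed because it is served locally.
    """
    mask = int(core_ids, 2) & ~(1 << here)
    per_hop: dict[int, int] = {}
    for core in range(NUM_CORES):
        if (mask >> core) & 1:
            hop = xy_next_hop(here, core)
            per_hop[hop] = per_hop.get(hop, 0) | (1 << core)
    return {hop: format(m, f"0{NUM_CORES}b") for hop, m in sorted(per_hop.items())}


at_router_1 = split_core_ids(1, "0011001111111111")
at_router_2 = split_core_ids(2, at_router_1[2])
print(at_router_1)
print(at_router_2)
```

Because each router only ever narrows the bitmap it received, the same function can be applied again at every branching router.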
After the directed-routing instruction has been reconstructed for each routing transmission path, the consistency maintenance command for each path is sent along that path to the processor cores on it.
Directed routing avoids the broadcast-storm risk that broadcasting poses to the network-on-chip and effectively reduces the packet-forwarding load on each routing node of the network-on-chip.
As an extension of the above embodiment, after the resource manager receives a processor resource adjustment request sent by a virtual machine, where the request asks the resource manager to adjust the processor cores allocated to the virtual machine, the resource manager adjusts the consistency maintenance table entry for that virtual machine according to the request.
Adjustment requests are of two types:
(1) Adding processor cores
When the processor resource adjustment request asks to add processor resources, the resource manager first writes the cached data in the processor cores to be added back to memory (also called main memory), then clears the data of those cores, and records the identifiers of the cores to be added in the consistency maintenance table entry.
(2) Removing processor cores
When the processor resource adjustment request asks to reduce processor resources, the resource manager first writes the cached data in the processor cores to be removed back to memory, then clears the data of those cores, and deletes their identifiers from the consistency maintenance table entry.
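Both adjustment types follow the same order: write back, clear, then update the table entry. The sketch below models this with dictionaries, assuming a toy cache in which every cached line is treated as dirty and is written back. `Core`, `ResourceManager`, `add_cores`, and `remove_cores` are hypothetical names, not an interface defined by the patent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Core:
    """Toy model of a core's private cache: address -> value (hypothetical)."""
    core_id: int
    cache: dict[int, int] = field(default_factory=dict)


@dataclass
class ResourceManager:
    memory: dict[int, int] = field(default_factory=dict)      # main memory
    table: dict[int, set[int]] = field(default_factory=dict)  # CDID -> core IDs

    def _write_back_and_clear(self, core: Core) -> None:
        self.memory.update(core.cache)  # write cached data back to memory
        core.cache.clear()              # then empty the core's cached data

    def add_cores(self, cdid: int, cores: list[Core]) -> None:
        for core in cores:
            self._write_back_and_clear(core)
            self.table[cdid].add(core.core_id)      # record the new member

    def remove_cores(self, cdid: int, cores: list[Core]) -> None:
        for core in cores:
            self._write_back_and_clear(core)
            self.table[cdid].discard(core.core_id)  # delete the member


rm = ResourceManager(table={1: set(range(10)) | {12, 13}})
c14 = Core(14, {0x100: 7})
c12 = Core(12, {0x200: 3})
rm.add_cores(1, [c14])
rm.remove_cores(1, [c12])
print(sorted(rm.table[1]), rm.memory)
```

Flushing before the table changes means a core never joins or leaves a domain while still holding data that other members cannot see.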
Fig. 2 shows a design block diagram of a processor chip 200 involved in the above embodiment (for reasons of space, only part of the processor chip is drawn). The processor chip 200 includes:
multiple processor cores, including a first processor core 210 (as an example, core 1 serves as the first processor core);
a network-on-chip composed of multiple interconnected routers, including a first router 220 that is directly connected to the first processor core 210.
The first router 220 includes:
a processor core connection port 221, for connecting to the first processor core 210;
at least one output port 222, connected to at least one router of the network-on-chip;
a cache module 223, for storing the preset consistency maintenance table entry;
a processing module 224, for receiving, through the processor core connection port 221, the consistency maintenance request sent by the first processor core 210, the request carrying the identifier of the first processor core 210; querying the consistency maintenance table entry stored in the cache module 223 according to the identifier of the first processor core, to obtain the identifier of the consistency domain to which the first processor core 210 belongs and the identifiers of the other processor cores in that domain; generating, from the identifier of the consistency domain of the first processor core 210 and the identifiers of the other processor cores in it, a consistency maintenance command for the consistency domain; and sending the consistency maintenance command through the output port 222 to the other processor cores of the consistency domain.
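The lookup-and-generate step of processing module 224 can be sketched as follows. The table layout (CDID mapped to a set of core IDs), the second domain, and the function name `handle_request` are illustrative assumptions; the first domain matches the example of Table 3.

```python
from __future__ import annotations

# Hypothetical consistency maintenance table held in cache module 223:
# CDID -> member core IDs. Domain 2 is invented purely for illustration.
TABLE: dict[int, frozenset[int]] = {
    1: frozenset(range(10)) | {12, 13},
    2: frozenset({10, 11, 14, 15}),
}


def handle_request(src_core: int, payload: str, table=TABLE) -> dict:
    """Find the requester's consistency domain and build a command for it."""
    for cdid, members in table.items():
        if src_core in members:
            mask = sum(1 << c for c in members)
            return {
                "flit_type": 0x99,
                "src_addr": src_core,
                "dst_addr": None,                            # reserved
                "cdid": cdid,
                "core_ids": format(mask, "016b"),            # rightmost bit = core 0
                "targets": sorted(members - {src_core}),     # other cores to reach
                "payload": payload,
            }
    raise LookupError(f"core {src_core} is not in any consistency domain")


cmd = handle_request(1, "Set a=2")
print(cmd["cdid"], cmd["core_ids"], cmd["targets"])
```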
Fig. 1 also shows a computer system that includes a resource manager, a virtual machine running on the resource manager, and the processor chip provided in the foregoing embodiment.
The resource manager is configured to receive a processor resource allocation request sent by the virtual machine, the request asking the resource manager to allocate at least two processors to the virtual machine; and to generate, according to the request, a consistency maintenance table entry for the virtual machine, the entry including the identifier of the consistency domain and the identifiers of the processor cores contained in that domain.
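Generating a table entry from an allocation request can be sketched as below, assuming a simple policy that hands out the lowest-numbered free cores and assigns CDIDs sequentially. The patent does not specify an allocation policy; `ResourceManager.allocate` is a hypothetical name.

```python
from __future__ import annotations

from itertools import count


class ResourceManager:
    """Hypothetical allocator that creates one consistency domain per VM."""

    def __init__(self, num_cores: int = 16) -> None:
        self.free = set(range(num_cores))
        self.table: dict[int, dict] = {}  # CDID -> consistency maintenance entry
        self._next_cdid = count(1)

    def allocate(self, vm: str, n: int) -> dict:
        if n < 2:
            raise ValueError("the request must ask for at least two processors")
        if n > len(self.free):
            raise RuntimeError("not enough free cores")
        cores = sorted(self.free)[:n]  # assumed policy: lowest free core IDs
        self.free -= set(cores)
        cdid = next(self._next_cdid)
        entry = {"vm": vm, "cdid": cdid, "core_ids": cores}
        self.table[cdid] = entry
        return entry


rm = ResourceManager()
e1 = rm.allocate("vm-a", 4)
e2 = rm.allocate("vm-b", 2)
print(e1)
print(e2)
```

The two-processor minimum mirrors the request described above: a domain with a single core would have no other members to keep consistent.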
Further, the resource manager is configured to receive a processor resource adjustment request sent by the virtual machine, the request asking the resource manager to adjust the processor cores allocated to the virtual machine; and to adjust, according to the request, the consistency maintenance table entry for the virtual machine.
Further, the resource manager is configured to: when the processor resource adjustment request is to remove processor cores, write the cached data in the processor cores to be removed back to memory, then clear the data of those cores, and delete their identifiers from the consistency maintenance table entry; and
when the processor resource adjustment is to add processor cores, record the identifiers of the processor cores to be added in the consistency maintenance table entry.
The resource manager may be implemented in software, or in a combination of hardware and software. When implemented in hardware, the resource manager of the present invention may be executed by a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, transistor logic, hardware components, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with the disclosure of the present invention. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The steps of the methods or algorithms described in connection with the disclosure of the present invention may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in RAM (random access memory), flash memory, ROM (read-only memory), EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be part of the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in user equipment. The processor and the storage medium may also exist in user equipment as discrete components.
Those skilled in the art should be aware that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium accessible to a general-purpose or special-purpose computer.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent substitution, or improvement made on the basis of the technical solution of the present invention shall fall within the scope of protection of the present invention.
Claims (14)
- 1. A cache consistency processing method, applied to a chip multiprocessor, characterized in that the method comprises: a first router receives a consistency maintenance request sent by a directly connected first processor core, the consistency maintenance request carrying the identifier of the first processor core; the first router queries a preset consistency maintenance table entry according to the identifier of the first processor core, and generates, according to the consistency maintenance table entry, a consistency maintenance command for the consistency domain; and the first router sends the consistency maintenance command through the network-on-chip to the other processor cores of the consistency domain.
- 2. The method according to claim 1, characterized in that the first router querying the preset consistency maintenance table entry according to the identifier of the first processor core, and generating the consistency maintenance command for the consistency domain according to the table entry, comprises: the first router queries the preset consistency maintenance table entry according to the identifier of the first processor core, to obtain the identifier of the consistency domain to which the first processor core belongs and the identifiers of the other processor cores in the consistency domain; and the first router generates the consistency maintenance command for the consistency domain according to the identifier of the consistency domain where the first processor core is located and the identifiers of the other processor cores in the consistency domain.
- 3. The method according to claim 1 or 2, characterized in that the first router sending the consistency maintenance command through the network-on-chip to the other processor cores of the consistency domain comprises: the first router determines at least one routing transmission path according to the topology state of the network-on-chip and the identifiers of the other processor cores in the consistency domain, wherein each routing transmission path consists of routers connected to the other processor cores in the consistency domain; for each routing transmission path, the first router reconstructs the consistency maintenance command to generate a consistency maintenance command for that routing transmission path; and, using each routing transmission path, the consistency maintenance command for that path is sent to the processor cores on the routing transmission path.
- 4. The method according to claim 3, characterized in that the first router determining at least one routing transmission path according to the topology state of the network-on-chip and the identifiers of the other processor cores in the consistency domain comprises: the first router determines, according to the identifiers of the other processor cores in the consistency domain, the identifiers of the routers connected to the other processor cores in the consistency domain; and the first router performs route discovery according to the XY routing algorithm, based on the identifiers of the routers connected to the other processor cores in the consistency domain and the topology state of the network-on-chip, and determines the at least one routing transmission path.
- 5. The method according to any one of claims 1 to 4, characterized in that, before the first router queries the preset consistency maintenance table entry according to the identifier of the first processor core, the method further comprises: a resource manager receives a processor resource allocation request sent by a virtual machine, the processor resource allocation request being used to ask the resource manager to allocate to the virtual machine at least two processors including the first processor core; and the resource manager generates, according to the processor resource allocation request, a consistency maintenance table entry for the virtual machine, the consistency maintenance table entry comprising the identifier of the consistency domain and the identifiers of the processor cores contained in the consistency domain.
- 6. The method according to claim 5, characterized in that, after the resource manager generates the consistency maintenance table entry for the virtual machine according to the processor resource allocation request, the method further comprises: the resource manager receives a processor resource adjustment request sent by the virtual machine, the processor resource adjustment request being used to ask the resource manager to adjust the processor cores allocated to the virtual machine; and the resource manager adjusts the consistency maintenance table entry for the virtual machine according to the processor resource adjustment request.
- 7. The method according to claim 6, characterized in that the processor resource management unit adjusting the consistency maintenance table entry for the virtual machine according to the processor resource adjustment request comprises: when the processor resource adjustment is to remove processor cores, the resource manager writes the cached data in the processor cores to be removed back to memory, then clears the data of the processor cores to be removed, and deletes the identifiers of the processor cores to be removed from the consistency maintenance table entry.
- 8. A processor chip, characterized by comprising: multiple processor cores, including a first processor core; and a network-on-chip composed of multiple interconnected routers, including a first router directly connected to the first processor core; the first router comprising: a processor core connection port, for connecting to the first processor core; at least one output port, connected to at least one router of the network-on-chip; a cache module, for storing a preset consistency maintenance table entry; and a processing module, for receiving, through the processor core connection port, a consistency maintenance request sent by the first processor core, the consistency maintenance request carrying the identifier of the first processor core; querying the consistency maintenance table entry in the cache memory according to the identifier of the first processor core, and generating, according to the consistency maintenance table entry, a consistency maintenance command for the consistency domain; and sending the consistency maintenance command through the output port to the other processor cores of the consistency domain.
- 9. The processor chip according to claim 8, characterized in that the processing module is further configured to: query the consistency maintenance table entry stored in the cache module according to the identifier of the first processor core, to obtain the identifier of the consistency domain to which the first processor core belongs and the identifiers of the other processor cores in the consistency domain; and generate the consistency maintenance command for the consistency domain according to the identifier of the consistency domain where the first processor core is located and the identifiers of the other processor cores in the consistency domain.
- 10. The processor chip according to claim 8 or 9, characterized in that the processing module is further configured to: determine at least one routing transmission path according to the topology state of the network-on-chip and the identifiers of the other processor cores in the consistency domain, wherein each routing transmission path consists of routers connected to the other processor cores in the consistency domain; for each routing transmission path, reconstruct the consistency maintenance command to generate a consistency maintenance command for that routing transmission path; and, using each routing transmission path, send the consistency maintenance command for that path through the output port to the processor cores on the routing transmission path.
- 11. The processor chip according to claim 10, characterized in that the processing module is further configured to: determine, according to the identifiers of the other processor cores in the consistency domain, the identifiers of the routers connected to the other processor cores in the consistency domain; and perform route discovery according to the XY routing algorithm, based on the identifiers of the routers connected to the other processor cores in the consistency domain and the topology state of the network-on-chip, and determine the at least one routing transmission path.
- 12. A computer system, characterized in that the system comprises: a resource manager, configured to receive a processor resource allocation request sent by a virtual machine, the processor resource allocation request being used to ask the resource manager to allocate at least two processors to the virtual machine, and to generate, according to the processor resource allocation request, a consistency maintenance table entry for the virtual machine, the consistency maintenance table entry comprising the identifier of the consistency domain and the identifiers of the processor cores contained in the consistency domain; and the processor chip according to any one of claims 7 to 9.
- 13. The computer system according to claim 12, characterized in that the resource manager is further configured to: receive a processor resource adjustment request sent by the virtual machine, the processor resource adjustment request being used to ask the resource manager to adjust the processor cores allocated to the virtual machine; and adjust, according to the processor resource adjustment request, the consistency maintenance table entry for the virtual machine.
- 14. The computer system according to claim 13, characterized in that the resource manager is further configured to: when the processor resource adjustment request is to remove processor cores, write the cached data in the processor cores to be removed back to memory, then clear the data of the processor cores to be removed, and delete the identifiers of the processor cores to be removed from the consistency maintenance table entry.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610878873.1A CN107894914A (en) | 2016-09-30 | 2016-09-30 | Buffer consistency treating method and apparatus |
PCT/CN2017/104021 WO2018059497A1 (en) | 2016-09-30 | 2017-09-28 | Cache consistency processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610878873.1A CN107894914A (en) | 2016-09-30 | 2016-09-30 | Buffer consistency treating method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107894914A true CN107894914A (en) | 2018-04-10 |
Family
ID=61763220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610878873.1A Pending CN107894914A (en) | 2016-09-30 | 2016-09-30 | Buffer consistency treating method and apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107894914A (en) |
WO (1) | WO2018059497A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112131174A (en) * | 2019-06-25 | 2020-12-25 | 北京百度网讯科技有限公司 | Method, apparatus, electronic device, and computer storage medium supporting communication between multiple chips |
CN113661485A (en) * | 2019-04-10 | 2021-11-16 | 赛灵思公司 | Domain assisted processor peering for coherency acceleration |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210409265A1 (en) * | 2021-01-28 | 2021-12-30 | Intel Corporation | In-network multicast operations |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1577294A (en) * | 2003-06-25 | 2005-02-09 | 国际商业机器公司 | Multiprocessor computer system and method having multiple coherency regions |
CN102270180A (en) * | 2011-08-09 | 2011-12-07 | 清华大学 | Multicore processor cache and management method thereof |
CN103440223A (en) * | 2013-08-29 | 2013-12-11 | 西安电子科技大学 | Layering system for achieving caching consistency protocol and method thereof |
CN103858111A (en) * | 2013-10-08 | 2014-06-11 | 华为技术有限公司 | Methods, equipment and system for realizing memory sharing in aggregation virtualization |
CN104991868A (en) * | 2015-06-09 | 2015-10-21 | 浪潮(北京)电子信息产业有限公司 | Multi-core processor system and cache coherency processing method |
US20160147658A1 (en) * | 2014-11-20 | 2016-05-26 | International Business Machines Corp | Configuration based cache coherency protocol selection |
CN105740164A (en) * | 2014-12-10 | 2016-07-06 | 阿里巴巴集团控股有限公司 | Multi-core processor supporting cache consistency, reading and writing methods and apparatuses as well as device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101859281A (en) * | 2009-04-13 | 2010-10-13 | 廖鑫 | Method for embedded multi-core buffer consistency based on centralized directory |
WO2014065802A2 (en) * | 2012-10-25 | 2014-05-01 | Empire Technology Development Llc | Multi-granular cache coherence |
CN104239270A (en) * | 2014-07-25 | 2014-12-24 | 浪潮(北京)电子信息产业有限公司 | High-speed cache synchronization method and high-speed cache synchronization device |
- 2016-09-30 CN CN201610878873.1A (CN107894914A): active, Pending
- 2017-09-28 WO PCT/CN2017/104021 (WO2018059497A1): active, Application Filing
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113661485A (en) * | 2019-04-10 | 2021-11-16 | 赛灵思公司 | Domain assisted processor peering for coherency acceleration |
CN113661485B (en) * | 2019-04-10 | 2024-05-07 | 赛灵思公司 | Domain assisted processor peering for coherency acceleration |
CN112131174A (en) * | 2019-06-25 | 2020-12-25 | 北京百度网讯科技有限公司 | Method, apparatus, electronic device, and computer storage medium supporting communication between multiple chips |
Also Published As
Publication number | Publication date |
---|---|
WO2018059497A1 (en) | 2018-04-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180410 |
RJ01 | Rejection of invention patent application after publication |