CN107894914A - Buffer consistency treating method and apparatus - Google Patents

Buffer consistency treating method and apparatus

Info

Publication number
CN107894914A
Authority
CN
China
Prior art keywords
processor
mark
consistency maintenance
core
processor core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610878873.1A
Other languages
Chinese (zh)
Inventor
崔晓松
陈云
蔡毅
黄勤业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201610878873.1A (patent CN107894914A)
Priority to PCT/CN2017/104021 (patent WO2018059497A1)
Publication of CN107894914A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0813 Multiuser, multiprocessor or multiprocessing cache systems with a network or matrix configuration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Multi Processors (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

This application relates to the field of cloud computing, and in particular to cache coherence (Cache Coherency, CC) processing in a chip multiprocessor. The cache coherence method includes: a first router receives a coherence maintenance request sent by a first processor core that is directly connected to it, the request carrying the identifier of the first processor core; the first router queries a preset coherence maintenance table entry according to the identifier of the first processor core and, according to that entry, generates a coherence maintenance command for the coherence domain to which the first processor core belongs; and the first router sends the coherence maintenance command over the network-on-chip to the other processor cores of the coherence domain. This scheme maintains cache coherence effectively within a defined region while reducing the communication and area overhead that chip multiprocessors incur in the prior art.

Description

Buffer consistency treating method and apparatus
Technical field
The present invention relates to the field of cloud computing, and in particular to cache coherence (Cache Coherency) processing in a chip multiprocessor.
Background
The chip multiprocessor (Chip Multi-processors, CMPs) architecture is the mainstream architecture in processor design. It consists of many processor cores (Core) connected by a network-on-chip (Network on Chip, NoC) made up of routers (Router). The on-chip cache (Cache) is designed hierarchically. The level-1 (L1) cache is a private cache: it sits inside a processor core, and the data blocks stored in it can be accessed only by that core. The level-2 (L2) cache is a shared cache: the data blocks stored in it are shared data blocks that multiple processor cores can access. The L2 cache is typically either centralized, placed outside the processor cores, or distributed, with slices placed physically near each processor core, interconnected by the network-on-chip, and shared logically.
In chip multiprocessor applications, a data block is often accessed by one or more processor cores. In that case the data block is usually stored in the shared cache so that those cores can access it. To speed up access, a copy of the data block is created in the private cache of each processor core that has accessed it, so that when such a core needs the data block again it only has to read the copy from its own private cache. Because copies of the data block now exist in the private caches of several cores, the copies must be kept consistent with one another. This is the cache coherence (Cache Coherence) problem. The basic principle for solving it is that when the copy of a data block in one core is modified, the other processor cores that hold a copy must perform a coherence operation (updating their copy of the data block or deleting it). This in turn requires determining which cores of the multi-core processor hold a copy of the data block.
There are two main approaches to cache coherence: directory-based (Directory) coherence protocols and snooping-based (Snooping) coherence protocols. A directory-based protocol uses a directory to record which processor cores have accessed each data block. When coherence maintenance is needed, the directory entry is queried to determine which processor cores hold a copy of the data block, and the coherence operation is then performed on the data block in those cores. A snooping-based protocol connects all processor cores to a shared bus: after a processor core modifies a data block in its private cache, it broadcasts a coherence maintenance message on the bus so that the other processor cores holding a copy of the data block perform the coherence operation.
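The difference between the two approaches can be sketched in a few lines of Python. This is an illustrative model only: the 16-core count, the block address, the sharer set, and all function names are assumptions made for the sketch and do not come from the patent.

```python
# Illustrative model only; names and values are assumptions, not taken from the patent.
ALL_CORES = set(range(16))

# Directory: data block address -> cores that hold a copy in their private cache.
directory = {0x1000: {1, 4, 9}}

def directory_targets(block, writer):
    """Directory-based protocol: notify only the cores recorded as sharers."""
    return directory.get(block, set()) - {writer}

def snooping_targets(writer):
    """Snooping-based protocol: the message is broadcast to every other core."""
    return ALL_CORES - {writer}

print(sorted(directory_targets(0x1000, writer=1)))  # [4, 9]
print(len(snooping_targets(writer=1)))              # 15
```

The directory keeps the number of messages small at the cost of storing sharer information, while snooping needs no such storage but reaches every core.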
With the development of cloud computing, the concept of the resource pool has emerged. The resources of a physical server (processors, memory, network I/O ports, and so on) are assigned to one or more virtual machines, and resource management software allocates these resources to the virtual machines. In a cloud computing scenario, how to achieve cache coherence among the processor resources allocated to a virtual machine is a problem the industry faces.
Summary of the invention
This document describes a cache coherence processing method and apparatus that balance cache coherence processing against the communication and area overhead of a chip multiprocessor.
In one aspect, an embodiment of this application provides a cache coherence processing method applied to a chip multiprocessor. In the method, a first router receives a coherence maintenance request sent by a first processor core that is directly connected to it, the request carrying the identifier of the first processor core; the first router queries a preset coherence maintenance table entry according to that identifier and generates a coherence maintenance command for the coherence domain to which the first processor core belongs; and the generated coherence maintenance command is sent over the network-on-chip to the other processor cores of that coherence domain. In this way, coherence maintenance is completed only within the coherence domain in which the write to the cached data block occurred, which avoids the overhead of global coherence maintenance.
In one possible design, the first router queries the preset coherence maintenance table entry according to the identifier of the first processor core to obtain the identifier of the coherence domain to which the first processor core belongs and the identifiers of the other processor cores in that coherence domain. The first router then generates the coherence maintenance command for the coherence domain from the coherence domain identifier and the identifiers of the other processor cores in the domain.
In one possible design, the first router determines at least one routing path according to the topology of the network-on-chip and the identifiers of the other processor cores in the coherence domain, where each routing path consists of routers connected to other processor cores in the coherence domain. For each routing path, the first router restructures the coherence maintenance command to generate a coherence maintenance command specific to that path, and then sends that command along the path to the processor cores on it. This directed routing avoids the risk of a broadcast storm in the network-on-chip that broadcasting would create, and significantly reduces the load on the routers of the network-on-chip.
In one possible design, the first router determines, from the identifiers of the other processor cores in the coherence domain, the identifiers of the routers connected to those cores. Using those router identifiers and the topology of the network-on-chip, the first router performs route discovery with the XY routing algorithm and determines at least one routing path. This route discovery process determines the routing paths within the coherence domain quickly and conveniently.
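As a minimal sketch of XY routing, the function below walks along the X dimension first and then along the Y dimension. It assumes a 4x4 2D mesh whose routers are numbered row by row (router id = 4 * row + column); that numbering, the mesh width, and the function name `xy_route` are assumptions for illustration, not details stated in the text above.

```python
MESH_WIDTH = 4  # assumption: 4x4 mesh, routers numbered row by row

def xy_route(src, dst, width=MESH_WIDTH):
    """Return the list of router ids visited when routing X first, then Y."""
    x, y = src % width, src // width
    dst_x, dst_y = dst % width, dst // width
    path = [src]
    while x != dst_x:              # first correct the column (X dimension)
        x += 1 if dst_x > x else -1
        path.append(y * width + x)
    while y != dst_y:              # then correct the row (Y dimension)
        y += 1 if dst_y > y else -1
        path.append(y * width + x)
    return path

print(xy_route(1, 12))  # [1, 0, 4, 8, 12]
print(xy_route(1, 13))  # [1, 5, 9, 13]
print(xy_route(1, 7))   # [1, 2, 3, 7]
```

The sketch does not check that every router on a path is attached to a core of the coherence domain; a router implementing the design above would also apply that membership check.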
In another possible design, the preset coherence maintenance table entry can be generated as follows. A resource manager receives a processor resource allocation request sent by a virtual machine; the request asks the resource manager to allocate at least two processors, including the first processor core, to the virtual machine. According to the processor resource allocation request, the resource manager generates a coherence maintenance table entry for the virtual machine. The entry includes the identifier of the coherence domain and the identifiers of the processor cores that the coherence domain contains.
In another possible design, after the resource manager generates the coherence maintenance table entry for the virtual machine according to the processor resource allocation request, the method further includes: the resource manager receives a processor resource adjustment request sent by the virtual machine, which asks the resource manager to adjust the processor cores allocated to the virtual machine; and the resource manager adjusts the coherence maintenance table entry for the virtual machine according to that request.
When the adjustment removes a processor core, the resource manager must first write the cached data in the core to be removed back to memory, then clear the data of that core, and finally delete the identifier of that core from the coherence maintenance table entry. When the adjustment adds a processor core, the resource manager records the identifier of the core to be added in the coherence maintenance table entry.
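The ordering of the adjustment steps can be sketched as follows. The class `CoherenceEntry`, the helper `write_back_and_clear`, and the log list are hypothetical names introduced for this sketch; the hardware write-back and clear operations are represented only by log records.

```python
class CoherenceEntry:
    """Hypothetical coherence maintenance table entry for one virtual machine."""
    def __init__(self, cdid, cores):
        self.cdid = cdid          # coherence domain identifier
        self.cores = set(cores)   # identifiers of the cores in the domain

def write_back_and_clear(core_id, log):
    # Stand-in for the hardware steps: write cached data back to memory,
    # then clear the data held by the core.
    log.append(("write_back", core_id))
    log.append(("clear", core_id))

def remove_core(entry, core_id, log):
    """Removing a core: write back and clear first, then drop its identifier."""
    write_back_and_clear(core_id, log)
    entry.cores.discard(core_id)

def add_core(entry, core_id):
    """Adding a core: record its identifier in the entry."""
    entry.cores.add(core_id)

entry = CoherenceEntry(cdid=1, cores=[0, 1, 2, 3])
log = []
remove_core(entry, 3, log)
add_core(entry, 4)
print(sorted(entry.cores))  # [0, 1, 2, 4]
print(log)                  # [('write_back', 3), ('clear', 3)]
```

Performing the write-back before the identifier is deleted matters because, once the core leaves the entry, it no longer receives coherence maintenance commands for the domain.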
In another aspect, an embodiment of the present invention provides a processor chip. The processor chip includes: multiple processor cores, including a first processor core; and a network-on-chip formed by multiple interconnected routers, including a first router that is directly connected to the first processor core. The first router includes: a processor core connection port, for connecting to the first processor core; at least one output port, for connecting to at least one router of the network-on-chip; a cache module, for storing the preset coherence maintenance table entry; and a processing module, configured to receive through the processor core connection port the coherence maintenance request sent by the first processor core, the request carrying the identifier of the first processor core; to query the coherence maintenance table entry stored in the cache module according to the identifier of the first processor core and generate, according to the entry, a coherence maintenance command for the coherence domain; and to send the coherence maintenance command through the output port to the other processor cores of the coherence domain.
In one possible design, the processing module is further configured to: query the coherence maintenance table entry stored in the cache module according to the identifier of the first processor core, to obtain the identifier of the coherence domain to which the first processor core belongs and the identifiers of the other processor cores in that domain; and generate the coherence maintenance command for the coherence domain from the coherence domain identifier and the identifiers of the other processor cores in the domain.
In one possible design, the processing module is further configured to: determine at least one routing path according to the topology of the network-on-chip and the identifiers of the other processor cores in the coherence domain, where each routing path consists of routers connected to other processor cores in the coherence domain; for each routing path, restructure the coherence maintenance command to generate a coherence maintenance command specific to that path; and send the command for each routing path through the output port to the processor cores on that path.
In one possible design, the processing module is further configured to: determine, from the identifiers of the other processor cores in the coherence domain, the identifiers of the routers connected to those cores; and, using those router identifiers and the topology of the network-on-chip, perform route discovery with the XY routing algorithm and determine at least one routing path.
In yet another aspect, an embodiment of the present invention provides a computer system that includes the processor chip disclosed in the preceding aspect and a resource manager. The resource manager is configured to receive a processor resource allocation request sent by a virtual machine, where the request asks the resource manager to allocate at least two processors to the virtual machine, and to generate, according to the request, a coherence maintenance table entry for the virtual machine. The entry includes the identifier of the coherence domain and the identifiers of the processor cores that the coherence domain contains. The functions of the resource manager can be implemented in hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to these functions. A module can be software and/or hardware.
In one possible design, the resource manager is further configured to: receive a processor resource adjustment request sent by the virtual machine, which asks the resource manager to adjust the processor cores allocated to the virtual machine; and adjust the coherence maintenance table entry for the virtual machine according to that request.
In another possible design, the resource manager is further configured to: when the processor resource adjustment request reduces the processor resources, first write the cached data in the processor core to be removed back to memory, then clear the data of that core, and delete the identifier of that core from the coherence maintenance table entry.
In another possible design, the resource manager is further configured to: when the processor resource adjustment adds a processor core, record the identifier of the core to be added in the coherence maintenance table entry.
In a further aspect, an embodiment of the present invention provides a computer storage medium for storing the computer software instructions used by the above router, including a program designed to perform the above aspects.
In a further aspect, an embodiment of the present invention provides a computer storage medium for storing the computer software instructions used by the above resource manager, including a program designed to perform the above aspects.
Compared with the prior art, the scheme provided by the present invention manages coherence maintenance regions more flexibly, balancing cache coherence processing against the communication and area overhead of the chip multiprocessor.
Brief description of the drawings
The drawings used in describing the embodiments or the prior art are briefly introduced below.
Fig. 1 is a schematic diagram of the cloud computing architecture to which the present invention applies;
Fig. 2 is a schematic diagram of a processor chip of the present invention;
Fig. 3 is a schematic flowchart of cache coherence processing provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a router module provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly below with reference to the drawings.
The system architecture and service scenarios described in the embodiments of the present invention are intended to explain the technical solutions of the embodiments more clearly and do not limit them. A person of ordinary skill in the art will understand that, as system architectures evolve and new service scenarios appear, the technical solutions provided by the embodiments apply equally to similar technical problems.
As shown in Fig. 1, the cloud computing architecture to which this application applies is divided into four layers from top to bottom:
Application layer (Application Layer) 100: this layer runs various applications (Application) and provides the corresponding services to users.
Operating system layer (Operating System Layer) 200: this layer contains the operating system (Operating System, OS), which is responsible for allocating hardware resources (processors, memory, and network I/O) to the applications running on it. An operating system together with the applications running on it forms the software architecture of a virtual machine (Virtual Machine, VM). The operating system of a virtual machine allocates hardware resources (for example processors, memory, and network I/O) to the applications running on it, within the range of hardware resources that the virtual machine owns. Fig. 1 shows two virtual machines: virtual machine 1 includes operating system 1, application 1, and application 2; virtual machine 2 includes operating system 2, application 3, and application 4.
Resource management layer (Resource Management Layer) 300: this layer runs the resource manager (Resource Manager, RM). In a cloud computing system, multiple virtual machines can be created on one physical machine (Physical Machine, PM). When a virtual machine is created, the resource manager must allocate hardware resources to it, including processors, memory, and network I/O. In practice, the resource manager is also called a virtual machine monitor (Virtual Machine Monitor, VMM) or hypervisor.
Resource layer (Processor Layer) 400: this layer contains the hardware resources that resource management layer 300 can manage. As examples, Fig. 1 shows processors, memory, and network I/O.
As an example of a processor, Fig. 2 shows a chip multiprocessor in which 16 processor cores (Core) are connected by a network-on-chip made up of 16 routers. Each core is directly connected to one router, and the network-on-chip formed by the routers provides inter-core communication. Note that a core and its router can be connected by wire (electrically, for example with copper wire, or optically, for example with optical fiber) or wirelessly.
The embodiments of the present invention address the cache coherence problem in cloud computing scenarios. In such a scenario, at least two virtual machines can be created on one server. When a virtual machine is created, the resource manager must allocate hardware resources to it, and can allocate a certain number of processor cores as the virtual machine's processing resources. The hardware resources of any two virtual machines are completely isolated from each other logically. Given this, cache coherence processing in the chip multiprocessor does not need a "global coherence" approach that performs cache coherence processing across all processor cores; local coherence processing is sufficient.
Taking Fig. 2 as a concrete example, the chip multiprocessor includes 16 processor cores. Assume the resource manager has allocated the chip multiprocessor to two virtual machines: processor cores 0~9 and 12~13 are assigned to virtual machine 1, and processor cores 10~11 and 14~15 are assigned to virtual machine 2. Because the processor cores allocated to different virtual machines are logically isolated, the processor cores in Fig. 2 are divided into two coherence domains, as shown in Table one:
Coherence domain | Processor cores included
Coherence domain 1 | Cores 0~9 and cores 12~13
Coherence domain 2 | Cores 10~11 and cores 14~15
Table one
For virtual machine 1, take processor core 1 as an example. After core 1 writes to a cached data block in its private cache, a coherence maintenance operation for that data block must be performed within coherence domain 1, to which processor core 1 belongs. The process is as follows:
Step 310: the first router receives the coherence maintenance request sent by the first processor core. The request carries the identifier of the first processor core.
As an example implementation, referring to Fig. 2, the coherence maintenance request is sent by the cache controller in processor core 1, which has completed the data block write, and carries the identifier of processor core 1. The identifier of a processor core can be the number that the system assigns to that core.
Step 320: the first router queries the preset coherence maintenance table entry according to the identifier of the first processor core, and obtains the identifier of the coherence domain to which the first processor core belongs and the identifiers of the other processor cores in that coherence domain.
The coherence maintenance table entry is generated by the resource manager when the virtual machine is created: after receiving the processor resource allocation request sent by the virtual machine, the resource manager generates the entry according to that request. The processor resource allocation request asks the resource manager to allocate at least two processors to the virtual machine. As an example, the structure of a coherence maintenance table entry includes a valid bit (Valid), a coherence domain identifier (Coherence Domain Identifier, CDID), and the core membership identifier of the coherence domain (Core IDs). The valid bit indicates whether the entry is valid. The CDID identifies the coherence domain. The core membership identifier is a bit mask (Bit Mask) in which, reading from right to left, each bit indicates whether the corresponding processor core belongs to the coherence domain: 1 means the core belongs to the domain, and 0 means it does not. Taking coherence domain 1 and coherence domain 2 in Fig. 2 as examples, the maintenance table entries of the coherence domains are as follows. The chip multiprocessor shown in Fig. 2 has 16 cores, so the mask is a vector of 16 bits that represents core 0 through core 15 in order from right to left, as shown in Table two:
Table two
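As a sketch, the 16-bit Core IDs masks implied by the domain membership in Table one can be computed as follows. The dictionary layout, the helper name `core_mask`, and the assumption that both entries are valid (Valid = 1) are illustrative choices, not values quoted from the patent.

```python
def core_mask(cores):
    """Build a bit mask in which bit i (counted from the right) is 1 if core i is a member."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask

domain1 = list(range(0, 10)) + [12, 13]   # cores 0~9 and 12~13
domain2 = [10, 11, 14, 15]                # cores 10~11 and 14~15

# Assumption: both entries are marked valid.
entries = [
    {"valid": 1, "cdid": 1, "core_ids": core_mask(domain1)},
    {"valid": 1, "cdid": 2, "core_ids": core_mask(domain2)},
]

for e in entries:
    print(e["valid"], e["cdid"], format(e["core_ids"], "016b"))
# 1 1 0011001111111111
# 1 2 1100110000000000
```

The mask computed for coherence domain 1 agrees with the Core IDs value 0011001111111111 that appears in the example command in Table three.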
After creating the coherence maintenance table entries, the resource manager sends each entry to the routers in the corresponding coherence domain. As an example, referring to Fig. 2, the resource manager sends the coherence maintenance table entry of coherence domain 1 to the routers that coherence domain 1 contains, and these routers can store the entry in their own cache or in a router register.
In this step, the first router queries the coherence maintenance table entry according to the identifier of the first processor core. Specifically, it can use the number of the first processor core to look up the core membership identifier of the coherence domain in the entry, and thereby determine the identifier of the coherence domain to which the first processor core belongs and the identifiers of the other processor cores in that coherence domain.
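The lookup in step 320 can be sketched as a bit test on the Core IDs mask. The entry values below follow the example of Fig. 2; the dictionary layout and the function name `lookup` are assumptions for illustration, and the sketch scans a list of entries even though a router may hold only the entry of its own domain.

```python
# Entries as a router might hold them; layout and names are assumptions.
ENTRIES = [
    {"valid": 1, "cdid": 1, "core_ids": 0b0011001111111111},
    {"valid": 1, "cdid": 2, "core_ids": 0b1100110000000000},
]

def lookup(core_id, entries=ENTRIES, num_cores=16):
    """Return (CDID, other member cores) for the domain containing core_id, or None."""
    for entry in entries:
        if entry["valid"] and (entry["core_ids"] >> core_id) & 1:
            others = [c for c in range(num_cores)
                      if c != core_id and (entry["core_ids"] >> c) & 1]
            return entry["cdid"], others
    return None

print(lookup(1))   # (1, [0, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13])
print(lookup(14))  # (2, [10, 11, 15])
```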
The mark of step 330, the first router Uniform Domains according to where first processor core, and positioned at uniformity The mark of other processor cores in region, consistency maintenance order of the generation for the Uniform Domains.
In this step, the first router is according to the mark positioned at other processor cores of Uniform Domains, and generation is for being somebody's turn to do The consistency maintenance order of Uniform Domains.As an example, consistency maintenance order is the increase on the basis of pack arrangement is route The field such as CDID domains and Core IDs domains, wherein the meaning in each domain is as shown in Table 3:
Flit Type Src Addr Dst Addr CDID Core IDs Payload
0x99 1 - 1 0011001111111111 Set a=2
Table three
Flit Type: defines the consistency command type, for example 0x99;
Src Addr: source address; in this scheme it is the identifier of the core that initiates the consistency maintenance request (in the example of this embodiment, the request is initiated by processor core 1, whose number is 1);
Dst Addr: destination address (reserved here);
CDID: the consistency domain identifier from the consistency maintenance entry (in the example of this embodiment, the consistency domain identifier is 1);
Core IDs: the core membership identifier of the consistency domain from the consistency maintenance entry (in the example of this embodiment, the processor cores corresponding to the core membership identifier are cores 0~9 and cores 12~13);
Payload: the consistency operation carried by the consistency maintenance command (for example, updating the data of a variable to a given value; in the example of this embodiment, setting the value of variable a to 2).
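The field layout above can be illustrated with a small data structure. The following Python sketch is only an illustration under stated assumptions: the class name `ConsistencyCommand`, the helper `core_mask` and the 16-core width are hypothetical, and the Core IDs field is assumed to be printed with the highest-numbered core on the left, which is the reading that makes the example value cover cores 0~9 and 12~13.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


def core_mask(cores: Iterable[int]) -> int:
    """Build a Core IDs bitmap with bit i set for every core i given."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


@dataclass
class ConsistencyCommand:
    """Hypothetical in-memory form of the command fields listed above."""
    flit_type: int            # consistency command type, e.g. 0x99
    src_addr: int             # core that initiated the request
    dst_addr: Optional[int]   # reserved in the example, so None here
    cdid: int                 # consistency domain identifier
    core_ids: int             # bitmap of member cores
    payload: str              # consistency operation, e.g. "Set a=2"

    def core_ids_bits(self, num_cores: int = 16) -> str:
        """Render Core IDs as a fixed-width bit string, core 0 rightmost."""
        return format(self.core_ids, f"0{num_cores}b")


cmd = ConsistencyCommand(
    flit_type=0x99,
    src_addr=1,
    dst_addr=None,
    cdid=1,
    core_ids=core_mask(list(range(10)) + [12, 13]),
    payload="Set a=2",
)
print(cmd.core_ids_bits())
```

Under these assumptions the printed bit string is the Core IDs value of the example command.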
Step 340: The first router sends the above consistency maintenance command to the other processor cores of the consistency domain through the network-on-chip.
This step can be implemented in two ways:
Mode one: broadcast
As an example, in Fig. 2, router 1 uses broadcast to send the generated consistency maintenance command to the other processor cores of the consistency domain (that is, cores 0~9 and cores 12~13). Through the network-on-chip, router 1 sends the command via the other routers (router 0 and routers 2~15) to the other processor cores on the chip multi-core processor (core 0 and cores 2~15). Each processor core that receives the command compares its own identifier with the Core IDs field of the command. If the bit corresponding to the core in Core IDs is 1, the processor core belongs to the consistency domain and must perform consistency processing according to the command. If the corresponding bit is 0, the processor core does not belong to the consistency domain, so it discards the command without any processing.
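The receiver-side check in broadcast mode can be sketched as below. This is a hedged Python illustration rather than the patent's logic: `should_process` and `broadcast` are hypothetical helper names, a 16-core chip is assumed, and bit i of Core IDs is assumed to stand for core i.

```python
from typing import List, Tuple


def should_process(core_id: int, core_ids: int) -> bool:
    """A core acts on the command only if its own bit in Core IDs is 1."""
    return bool((core_ids >> core_id) & 1)


def broadcast(src: int, core_ids: int, num_cores: int = 16) -> Tuple[List[int], List[int]]:
    """Deliver the command to every core except the sender and sort the outcome."""
    accepted, dropped = [], []
    for core in range(num_cores):
        if core == src:
            continue  # the initiating core does not receive its own command
        if should_process(core, core_ids):
            accepted.append(core)
        else:
            dropped.append(core)
    return accepted, dropped


accepted, dropped = broadcast(src=1, core_ids=0b0011001111111111)
print("process:", accepted)
print("discard:", dropped)
```

In this example all 15 other cores receive the command, but cores 10, 11, 14 and 15 discard it.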
Mode two: directed routing
As an example, in Fig. 2, router 1 obtains the topology state of the network-on-chip and the identifiers of the other processor cores in the consistency domain, and determines at least one routing transmission path from this information. Specifically, router 1 can sense the topology state of the network-on-chip, compute its possible next hops, and confirm the next-hop nodes, namely routers (0, 2, 5). For example, in a 2D mesh network using the XY routing method, router 1 evaluates (x+1, y), (x-1, y), (x, y+1) and (x, y-1) and determines that its first-hop nodes are routers (0, 2, 5). From the identifiers of the other processor cores of the consistency domain obtained from the consistency maintenance entry, router 1 judges that the processor cores connected to routers (0, 2, 5) belong to consistency domain 1. The next-hop nodes are then selected in a similar way: the second-hop nodes of router 1 are routers (3, 4, 6, 9), and, from the identifiers of the other processor cores in consistency domain 1, the processor cores connected to routers (3, 4, 6, 9) are determined to belong to consistency domain 1 as well. This judgment continues until all routing nodes belonging to consistency domain 1 have been traversed. Three routing transmission paths are thus obtained: 1 → 0 → 4 → 8 → 12 (routing west), 1 → 5 → 9 → 13 (routing south), and 1 → 2 → 3 (6) → 7 (routing east).
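The path determination above can be reproduced with a short sketch. The Python code below is an illustration under assumptions that are not stated in the text: a 4x4 mesh in which router id = 4 * y + x, each router attached to the core with the same number, and dimension-order XY routing that moves along X first and then along Y. The function names are hypothetical.

```python
from typing import Dict, Iterable, List

WIDTH = 4  # assumption: 4x4 2D mesh, router id = WIDTH * y + x


def xy_route(src: int, dst: int, width: int = WIDTH) -> List[int]:
    """Return the routers visited from src to dst, moving along X first, then Y."""
    x, y = src % width, src // width
    dst_x, dst_y = dst % width, dst // width
    path = [src]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append(y * width + x)
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append(y * width + x)
    return path


def group_by_first_hop(src: int, members: Iterable[int]) -> Dict[int, List[int]]:
    """Group the other domain members by the first router on their XY route."""
    groups: Dict[int, List[int]] = {}
    for dst in sorted(members):
        if dst != src:
            groups.setdefault(xy_route(src, dst)[1], []).append(dst)
    return groups


domain_1 = list(range(10)) + [12, 13]  # cores 0~9 and 12~13
groups = group_by_first_hop(1, domain_1)
print(groups)
print(xy_route(1, 12), xy_route(1, 13), xy_route(1, 7), xy_route(1, 6))
```

With these assumptions the groups behind first hops 0, 5 and 2 correspond to the west, south and east paths of the example. The sketch does not check that every intermediate router is attached to a core in the domain, which happens to hold for this example.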
For each routing transmission path, the first router reconstructs the above consistency maintenance command to generate a consistency maintenance command for that path; that is, it generates directed routing send instructions from the consistency maintenance command, as follows:
(1) The directed routing send instruction (routing west) is shown in Table 4:
Flit Type | Src Addr | Dst Addr | CDID | Core IDs | Payload
0x99 | 1 | - | 1 | 0001000100010001 | Set a=2
Table 4
(2) The directed routing send instruction (routing east) is shown in Table 5:
Flit Type | Src Addr | Dst Addr | CDID | Core IDs | Payload
0x99 | 1 | - | 1 | 0000000011001100 | Set a=2
Table 5
(3) The directed routing send instruction (routing south) is shown in Table 6:
Flit Type | Src Addr | Dst Addr | CDID | Core IDs | Payload
0x99 | 1 | - | 1 | 0010001000100000 | Set a=2
Table 6
Note that when the eastward directed routing send instruction reaches router 2, there are two branches, 1 → 2 → 3 → 7 and 1 → 2 → 6, so router 2 can further split the eastward instruction into the following two:
(4) The directed routing send instruction for 2 → 3 → 7 is shown in Table 7:
Flit Type | Src Addr | Dst Addr | CDID | Core IDs | Payload
0x99 | 1 | - | 1 | 0000000010001000 | Set a=2
Table 7
(5) The directed routing send instruction for 2 → 6 is shown in Table 8:
Flit Type | Src Addr | Dst Addr | CDID | Core IDs | Payload
0x99 | 1 | - | 1 | 0000000001000000 | Set a=2
Table 8
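The reconstruction of the Core IDs field per path, including the further split at router 2, can be sketched as follows. This Python code is a sketch under assumptions, not the patent's algorithm: it assumes a 4x4 mesh with router id = 4 * y + x, XY routing (X first, then Y), bit i of Core IDs standing for core i, and that a router leaves its own bit out of the bitmaps it forwards. `next_hop`, `split_mask` and `bits` are hypothetical helper names.

```python
from typing import Dict

WIDTH = 4  # assumption: 4x4 2D mesh, router id = WIDTH * y + x


def next_hop(cur: int, dst: int) -> int:
    """XY routing: step along X until the column matches, then step along Y."""
    cx, cy = cur % WIDTH, cur // WIDTH
    dx, dy = dst % WIDTH, dst // WIDTH
    if cx != dx:
        cx += 1 if dx > cx else -1
    else:
        cy += 1 if dy > cy else -1
    return cy * WIDTH + cx


def split_mask(router: int, mask: int) -> Dict[int, int]:
    """Split a Core IDs bitmap into one sub-bitmap per next-hop router."""
    out: Dict[int, int] = {}
    for core in range(WIDTH * WIDTH):
        if core != router and (mask >> core) & 1:
            hop = next_hop(router, core)
            out[hop] = out.get(hop, 0) | (1 << core)
    return out


def bits(mask: int) -> str:
    """Render a bitmap as 16 characters with core 0 on the right."""
    return format(mask, "016b")


domain_1 = 0b0011001111111111
at_router_1 = split_mask(1, domain_1)        # west, east and south commands
at_router_2 = split_mask(2, at_router_1[2])  # eastward command split again

for hop, mask in at_router_1.items():
    print("router 1 ->", hop, bits(mask))
for hop, mask in at_router_2.items():
    print("router 2 ->", hop, bits(mask))
```

Under these assumptions, the bitmaps produced at router 1 cover cores {0, 4, 8, 12}, {2, 3, 6, 7} and {5, 9, 13}, and the split at router 2 covers {3, 7} and {6}, so each core in the domain is named in exactly one forwarded command.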
After the directed routing send instruction for each routing transmission path has been reconstructed, the consistency maintenance command for each path is sent along that path to the processor cores on it.
Directed routing avoids the broadcast storm risk in the network-on-chip and effectively reduces the packet forwarding pressure on each routing node of the network-on-chip.
As an extension of the above embodiment, the resource manager may receive a processor resource adjustment request sent by the virtual machine, which asks the resource manager to adjust the processor cores allocated to the virtual machine. The resource manager then adjusts the consistency maintenance entry for that virtual machine according to the processor resource adjustment request.
The adjustment request has two types:
(1) Adding processor cores
When the processor resource adjustment request asks to add processor resources, the resource manager must first write the cached data in the processor core to be added back to memory (also called main memory), then clear the data of the processor core to be added, and record the identifier of that processor core in the consistency maintenance entry.
(2) Removing processor cores
When the processor resource adjustment request asks to reduce processor resources, the resource manager must first write the cached data in the processor core to be removed back to memory, then clear the data of that processor core, and delete its identifier from the consistency maintenance entry.
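The two adjustment cases can be sketched as follows. This Python code is illustrative only: `DomainEntry`, `add_core`, `remove_core` and the `write_back` and `clear_cache` callbacks are hypothetical stand-ins for whatever mechanism the resource manager uses, and the Core IDs bitmap encoding (bit i for core i) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DomainEntry:
    """Hypothetical consistency maintenance entry held by the resource manager."""
    cdid: int
    core_ids: int  # assumed encoding: bit i is 1 when core i is in the domain


def add_core(entry: DomainEntry, core: int,
             write_back: Callable[[int], None],
             clear_cache: Callable[[int], None]) -> None:
    """Write back, clear, then record the new core in the entry."""
    write_back(core)
    clear_cache(core)
    entry.core_ids |= 1 << core


def remove_core(entry: DomainEntry, core: int,
                write_back: Callable[[int], None],
                clear_cache: Callable[[int], None]) -> None:
    """Write back, clear, then delete the core from the entry."""
    write_back(core)
    clear_cache(core)
    entry.core_ids &= ~(1 << core)


log: List[str] = []
entry = DomainEntry(cdid=1, core_ids=0b0011001111111111)
add_core(entry, 10, lambda c: log.append(f"write back {c}"),
         lambda c: log.append(f"clear {c}"))
remove_core(entry, 13, lambda c: log.append(f"write back {c}"),
            lambda c: log.append(f"clear {c}"))
print(format(entry.core_ids, "016b"), log)
```

In both cases the cache is written back and cleared before the entry changes, so a core never joins or leaves the domain while holding stale data.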
Fig. 2 shows a block diagram of a processor chip 200 involved in the above embodiment (owing to size limits, only part of the processor chip is drawn). The processor chip 200 includes:
Multiple processor cores, including a first processor core 210 (as an example, core 1 serves as the first processor core);
A network-on-chip formed by multiple interconnected routers, including a first router 220, where the first router 220 is directly connected to the first processor core 210;
The first router 220 includes:
A processor core connection port 221, for connecting to the first processor core 210;
At least one output port 222, for connecting to at least one router of the network-on-chip;
A cache module 223, for storing preset consistency maintenance entries;
A processing module 224, for: receiving, through the processor core connection port 221, the consistency maintenance request sent by the first processor core 210, the request carrying the identifier of the first processor core 210; querying the consistency maintenance entry stored in the cache module 223 according to the identifier of the first processor core to obtain the identifier of the consistency domain to which the first processor core 210 belongs and the identifiers of the other processor cores in that consistency domain; generating a consistency maintenance command for the consistency domain according to the identifier of the consistency domain in which the first processor core 210 is located and the identifiers of the other processor cores in that domain; and sending the consistency maintenance command through the output port 222 to the other processor cores of the consistency domain.
Fig. 1 also shows a computer system, which includes a resource manager, a virtual machine running on the resource manager, and a processor chip as provided in the foregoing embodiment.
The resource manager is used to receive a processor resource allocation request sent by the virtual machine, which asks the resource manager to allocate at least two processors to the virtual machine, and to generate, according to the processor resource allocation request, a consistency maintenance entry for the virtual machine. The consistency maintenance entry includes the identifier of the consistency domain and the identifiers of the processor cores included in the consistency domain.
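How a resource manager might turn an allocation request into a consistency maintenance entry can be sketched as below. This Python code is a sketch under assumptions that the text does not specify: the virtual machine names the cores it wants, domain identifiers are handed out sequentially starting at 1, and the entry stores the member cores as a bitmap. The `ResourceManager` class and its method names are hypothetical.

```python
from itertools import count
from typing import Dict, Iterable, Set, Tuple


class ResourceManager:
    """Hypothetical resource manager that tracks free cores and per-VM entries."""

    def __init__(self, num_cores: int = 16) -> None:
        self.free: Set[int] = set(range(num_cores))
        self.entries: Dict[str, Tuple[int, int]] = {}  # vm -> (CDID, Core IDs)
        self._next_cdid = count(1)  # assumption: CDIDs are assigned sequentially

    def allocate(self, vm: str, cores: Iterable[int]) -> Tuple[int, int]:
        """Allocate the requested cores and generate the entry for the VM."""
        wanted = set(cores)
        if len(wanted) < 2:
            raise ValueError("a request must ask for at least two processors")
        if not wanted <= self.free:
            raise ValueError("some requested cores are already allocated")
        self.free -= wanted
        core_ids = sum(1 << core for core in wanted)
        entry = (next(self._next_cdid), core_ids)
        self.entries[vm] = entry
        return entry


rm = ResourceManager()
cdid, core_ids = rm.allocate("vm1", list(range(10)) + [12, 13])
print(cdid, format(core_ids, "016b"))
```

A copy of the returned entry would then be stored in each router attached to a member core; that distribution step is left out of the sketch.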
Further, the resource manager is also used to receive a processor resource adjustment request sent by the virtual machine, which asks the resource manager to adjust the processor cores allocated to the virtual machine, and to adjust the consistency maintenance entry for the virtual machine according to the processor resource adjustment request.
Further, the resource manager is also used to: when the processor resource adjustment request is to reduce processor cores, write the cached data in the processor core to be removed back to memory, then clear the data of that processor core, and delete its identifier from the consistency maintenance entry; and
when the processor resource adjustment request is to add processor cores, record the identifier of the processor core to be added in the consistency maintenance entry.
The resource manager can be implemented in software, or in a combination of hardware and software. When implemented in hardware, the processor that executes the above resource manager of the present invention may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, transistor logic, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules and circuits described in connection with the disclosure of the present invention. The processor can also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The steps of the method or algorithm described in connection with the disclosure of the present invention can be implemented in hardware, or by a processor executing software instructions. The software instructions can consist of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium can also be a component of the processor. The processor and the storage medium can be located in an ASIC, and the ASIC can be located in user equipment. The processor and the storage medium can also exist in user equipment as discrete components.
Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the present invention can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium can be any available medium that a general-purpose or special-purpose computer can access.
The specific embodiments described above further explain the purpose, technical solutions and beneficial effects of the present invention in detail. It should be understood that the foregoing are only specific embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement or improvement made on the basis of the technical solutions of the present invention shall fall within the protection scope of the present invention.

Claims (14)

  1. A cache consistency processing method, applied to a chip multi-core processor, characterized in that the method includes:
    A first router receives a consistency maintenance request sent by a first processor core directly connected to it, the consistency maintenance request carrying the identifier of the first processor core;
    The first router queries a preset consistency maintenance entry according to the identifier of the first processor core, and generates a consistency maintenance command for the consistency domain according to the consistency maintenance entry;
    The first router sends the consistency maintenance command through the network-on-chip to the other processor cores of the consistency domain.
  2. The method according to claim 1, characterized in that the first router querying a preset consistency maintenance entry according to the identifier of the first processor core, and generating a consistency maintenance command for the consistency domain according to the consistency maintenance entry, includes:
    The first router queries the preset consistency maintenance entry according to the identifier of the first processor core, and obtains the identifier of the consistency domain to which the first processor core belongs and the identifiers of the other processor cores in the consistency domain;
    The first router generates the consistency maintenance command for the consistency domain according to the identifier of the consistency domain in which the first processor core is located and the identifiers of the other processor cores in the consistency domain.
  3. The method according to claim 1 or 2, characterized in that the first router sending the consistency maintenance command through the network-on-chip to the other processor cores of the consistency domain includes:
    The first router determines at least one routing transmission path according to the topology state of the network-on-chip and the identifiers of the other processor cores in the consistency domain, where each routing transmission path consists of routers connected to the other processor cores in the consistency domain;
    For each routing transmission path, the first router reconstructs the consistency maintenance command to generate a consistency maintenance command for that routing transmission path;
    Using each routing transmission path, the consistency maintenance command for that routing transmission path is sent to the processor cores on the routing transmission path.
  4. The method according to claim 3, characterized in that the first router determining at least one routing transmission path according to the topology state of the network-on-chip and the identifiers of the other processor cores in the consistency domain includes:
    According to the identifiers of the other processor cores in the consistency domain, the first router determines the identifiers of the routers connected to the other processor cores in the consistency domain;
    According to the identifiers of the routers connected to the other processor cores in the consistency domain and the topology state of the network-on-chip, the first router performs route discovery using the XY routing algorithm and determines the at least one routing transmission path.
  5. The method according to any one of claims 1-4, characterized in that, before the first router queries the preset consistency maintenance entry according to the identifier of the first processor core, the method further includes:
    A resource manager receives a processor resource allocation request sent by a virtual machine, the processor resource allocation request being used to ask the resource manager to allocate to the virtual machine at least two processors including the first processor core;
    The resource manager generates, according to the processor resource allocation request, the consistency maintenance entry for the virtual machine, the consistency maintenance entry including the identifier of the consistency domain and the identifiers of the processor cores included in the consistency domain.
  6. The method according to claim 5, characterized in that, after the resource manager generates the consistency maintenance entry for the virtual machine according to the processor resource allocation request, the method further includes:
    The resource manager receives a processor resource adjustment request sent by the virtual machine, the processor resource adjustment request being used to ask the resource manager to adjust the processor cores allocated to the virtual machine;
    The resource manager adjusts the consistency maintenance entry for the virtual machine according to the processor resource adjustment request.
  7. The method according to claim 6, characterized in that the processor resource management unit adjusting the consistency maintenance entry for the virtual machine according to the processor resource adjustment request includes:
    When the processor resource adjustment is to reduce processor cores, the resource manager writes the cached data in the processor core to be removed back to memory, then clears the data of that processor core, and deletes its identifier from the consistency maintenance entry.
  8. A processor chip, characterized by including:
    Multiple processor cores, including a first processor core;
    A network-on-chip formed by multiple interconnected routers, including a first router, the first router being directly connected to the first processor core;
    The first router includes:
    A processor core connection port, for connecting to the first processor core;
    At least one output port, for connecting to at least one router of the network-on-chip;
    A cache module, for storing a preset consistency maintenance entry;
    A processing module, for: receiving, through the processor core connection port, a consistency maintenance request sent by the first processor core, the consistency maintenance request carrying the identifier of the first processor core; querying the consistency maintenance entry stored in the cache module according to the identifier of the first processor core, and generating a consistency maintenance command for the consistency domain according to the consistency maintenance entry; and sending the consistency maintenance command through the output port to the other processor cores of the consistency domain.
  9. The processor chip according to claim 8, characterized in that the processing module is further used to:
    Query the consistency maintenance entry stored in the cache module according to the identifier of the first processor core, and obtain the identifier of the consistency domain to which the first processor core belongs and the identifiers of the other processor cores in the consistency domain; and
    Generate the consistency maintenance command for the consistency domain according to the identifier of the consistency domain in which the first processor core is located and the identifiers of the other processor cores in the consistency domain.
  10. The processor chip according to claim 8 or 9, characterized in that the processing module is further used to:
    According to the topology state of the network-on-chip and the identifiers of the other processor cores in the consistency domain, the first router determines at least one routing transmission path, where each routing transmission path consists of routers connected to the other processor cores in the consistency domain;
    For each routing transmission path, reconstruct the consistency maintenance command to generate a consistency maintenance command for that routing transmission path;
    Using each routing transmission path, send the consistency maintenance command for that routing transmission path through the output port to the processor cores on the routing transmission path.
  11. The processor chip according to claim 10, characterized in that the processing module is further used to:
    Determine, according to the identifiers of the other processor cores in the consistency domain, the identifiers of the routers connected to the other processor cores in the consistency domain;
    Perform route discovery using the XY routing algorithm according to the identifiers of the routers connected to the other processor cores in the consistency domain and the topology state of the network-on-chip, and determine the at least one routing transmission path.
  12. A computer system, characterized in that the system includes:
    A resource manager, used to receive a processor resource allocation request sent by a virtual machine, the processor resource allocation request being used to ask the resource manager to allocate at least two processors to the virtual machine, and to generate, according to the processor resource allocation request, a consistency maintenance entry for the virtual machine, the consistency maintenance entry including the identifier of the consistency domain and the identifiers of the processor cores included in the consistency domain; and the processor chip according to any one of claims 7-9.
  13. The computer system according to claim 12, characterized in that the resource manager is further used to: receive a processor resource adjustment request sent by the virtual machine, the processor resource adjustment request being used to ask the resource manager to adjust the processor cores allocated to the virtual machine; and adjust the consistency maintenance entry for the virtual machine according to the processor resource adjustment request.
  14. The computer system according to claim 13, characterized in that the resource manager is further used to: when the processor resource adjustment request is to reduce processor cores, write the cached data in the processor core to be removed back to memory, then clear the data of that processor core, and delete its identifier from the consistency maintenance entry.
CN201610878873.1A 2016-09-30 2016-09-30 Buffer consistency treating method and apparatus Pending CN107894914A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610878873.1A CN107894914A (en) 2016-09-30 2016-09-30 Buffer consistency treating method and apparatus
PCT/CN2017/104021 WO2018059497A1 (en) 2016-09-30 2017-09-28 Cache consistency processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610878873.1A CN107894914A (en) 2016-09-30 2016-09-30 Buffer consistency treating method and apparatus

Publications (1)

Publication Number Publication Date
CN107894914A true CN107894914A (en) 2018-04-10

Family

ID=61763220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610878873.1A Pending CN107894914A (en) 2016-09-30 2016-09-30 Buffer consistency treating method and apparatus

Country Status (2)

Country Link
CN (1) CN107894914A (en)
WO (1) WO2018059497A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210409265A1 (en) * 2021-01-28 2021-12-30 Intel Corporation In-network multicast operations

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1577294A (en) * 2003-06-25 2005-02-09 国际商业机器公司 Multiprocessor computer system and method having multiple coherency regions
CN102270180A (en) * 2011-08-09 2011-12-07 清华大学 Multicore processor cache and management method thereof
CN103440223A (en) * 2013-08-29 2013-12-11 西安电子科技大学 Layering system for achieving caching consistency protocol and method thereof
CN103858111A (en) * 2013-10-08 2014-06-11 华为技术有限公司 Methods, equipment and system for realizing memory sharing in aggregation virtualization
CN104991868A (en) * 2015-06-09 2015-10-21 浪潮(北京)电子信息产业有限公司 Multi-core processor system and cache coherency processing method
US20160147658A1 (en) * 2014-11-20 2016-05-26 International Business Machines Corp Configuration based cache coherency protocol selection
CN105740164A (en) * 2014-12-10 2016-07-06 阿里巴巴集团控股有限公司 Multi-core processor supporting cache consistency, reading and writing methods and apparatuses as well as device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101859281A (en) * 2009-04-13 2010-10-13 廖鑫 Method for embedded multi-core buffer consistency based on centralized directory
WO2014065802A2 (en) * 2012-10-25 2014-05-01 Empire Technology Development Llc Multi-granular cache coherence
CN104239270A (en) * 2014-07-25 2014-12-24 浪潮(北京)电子信息产业有限公司 High-speed cache synchronization method and high-speed cache synchronization device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113661485A (en) * 2019-04-10 2021-11-16 赛灵思公司 Domain assisted processor peering for coherency acceleration
CN113661485B (en) * 2019-04-10 2024-05-07 赛灵思公司 Domain assisted processor peering for coherency acceleration
CN112131174A (en) * 2019-06-25 2020-12-25 北京百度网讯科技有限公司 Method, apparatus, electronic device, and computer storage medium supporting communication between multiple chips

Also Published As

Publication number Publication date
WO2018059497A1 (en) 2018-04-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180410)