CN107577616A - Method and system for partitioning a last-level shared cache - Google Patents
Method and system for partitioning a last-level shared cache
- Publication number: CN107577616A (application CN201710791546.7A)
- Authority
- CN
- China
- Prior art keywords
- page
- cache
- processor core
- page-color count
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The present application provides a method for partitioning a last-level shared cache. The method determines the best-fit cache size for each processor core at run time; from that size it derives the number of page colors to allocate to each core; from the best-fit size and the page-color count it computes the number of cache lines to allocate to each core; and it then partitions the last-level shared cache in descending order of each core's page-color count and cache-line count. By partitioning the last-level cache along the two dimensions of page colors and cache lines, the partitioning granularity is refined and scalability is improved. The application also provides a system for partitioning a last-level shared cache, which has the same benefits.
Description
Technical field
The present application relates to the server field, and in particular to a method and system for partitioning a last-level shared cache.
Background
Server chips today are mostly chip multiprocessors (CMPs): the Intel E5-2600 v4 has 22 cores, the Cavium ThunderX has 48, and so on. To guarantee the quality of service of applications running in parallel on different cores, along with related metrics such as bandwidth, measures are generally taken in system software or hardware to reduce interference between applications. Among these, partitioning the on-chip last-level shared cache is one of the most effective: it makes rational use of last-level cache resources and provides better quality of service for latency-sensitive programs. Existing strategies for partitioning a shared cache fall broadly into two kinds: partitioning by cache line and partitioning by cache set.
On-chip last-level shared caches are generally set-associative: the cache is divided into a number of sets, each containing the same number of cache lines. Caches are typically managed with a least-recently-used (LRU) policy, which can be described as three sub-policies:
1) Insertion policy: data accessed for the first time is inserted into the highest-priority line (MRU, most recently used) of the corresponding set;
2) Promotion policy: when a cache line in the cache is hit, it is promoted to the highest-priority (MRU) position of its set;
3) Replacement policy: when all lines of a set are filled with data and new data must be inserted, the line in the lowest-priority position is evicted from the cache.
Line-based partitioning divides the cache "vertically": a processor core's data may only be placed in a corresponding small subset of the cache lines, so data from different cores does not interfere. This is the most common way to mitigate interference between applications, but its scalability is poor and it is unsuited to scenarios with large numbers of programs running in parallel. For example, the Cavium ThunderX has 48 cores, but each set of its on-chip last-level cache has only 16 lines; likewise the Intel E5-2600 v4 has 22 cores, but its last-level cache allows only 20 distinct partitions.
The other strategy, set-based partitioning, divides the cache "horizontally": each processor core is allocated memory pages at particular addresses (page coloring), so that its data can only be placed in the corresponding cache sets, achieving the purpose of partitioning the cache. This strategy also has scalability limits. Take the 48-core Cavium ThunderX as an example: physical addresses are 48 bits and the page size is 64 KB, so the page offset is 16 bits and the page number is 32 bits. The cache line size is 128 B, each set has 16 lines, and the cache size is 16 MB, so the line offset is 7 bits and the set index is 13 bits. The page number and set index overlap in 4 bits, which are the color bits, so only 16 colors can be allocated. The number of colors could in fact be increased by shrinking the page size, but larger pages have proven more beneficial in practice. Moreover, if the page size is reduced to increase the number of colors, the number of pages per color shrinks, i.e. each processor core has fewer allocatable pages; this can leave no pages of a given color to allocate and cause memory allocation failures, even while pages of other colors sit idle in the system. In addition, the programs run by each core change constantly, so to allocate cache space reasonably the number of page colors assigned to a core must also grow and shrink. Adjusting a core's page-color count is an expensive process: reducing by one color, for example, requires the pages of that color held by the core to be migrated, and the corresponding TLB (translation lookaside buffer) entries and cache lines must be flushed and written back, with no small impact on system performance.
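The color-bit arithmetic quoted above can be sketched directly. The function below is illustrative (its name and structure are not from the patent); it reproduces the ThunderX figures: a 64 KB page, 128 B lines, a 16 MB 16-way cache, and hence 4 color bits, i.e. 16 colors.

```python
import math

def page_color_count(page_size, line_size, cache_size, ways):
    """Number of page colors = 2^(overlap between the page-number bits
    and the set-index bits of the physical address)."""
    page_offset_bits = int(math.log2(page_size))   # 64 KB page -> 16 bits
    line_offset_bits = int(math.log2(line_size))   # 128 B line -> 7 bits
    sets = cache_size // (line_size * ways)        # 16 MB / (128 B * 16) -> 8192 sets
    set_index_bits = int(math.log2(sets))          # -> 13 bits
    # The set-index field spans bits [line_offset, line_offset + set_index_bits);
    # the color bits are the part of that field above the page offset.
    color_bits = line_offset_bits + set_index_bits - page_offset_bits
    return 2 ** max(color_bits, 0)

# ThunderX-like geometry from the text: only 16 colors for 48 cores.
print(page_color_count(64 * 1024, 128, 16 * 1024 * 1024, 16))  # -> 16
```

Shrinking the page to 4 KB in the same geometry would yield 8 color bits (256 colors), illustrating the trade-off the background describes.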
Summary of the invention
The purpose of the present application is to provide a method and system for partitioning a last-level shared cache, solving the poor scalability of existing partitioning techniques.
To solve the above technical problem, the application provides a method for partitioning a last-level shared cache, whose technical solution is as follows:
determining the best-fit cache size for each processor core at run time;
determining, from the best-fit cache size, the number of page colors to allocate to each processor core;
computing, from the best-fit cache size and the page-color count, the number of cache lines to allocate to each processor core;
partitioning the last-level shared cache in descending order of each processor core's page-color count and cache-line count.
Determining the best-fit cache size for each processor core at run time includes:
combining different line counts with different page-color counts pairwise;
collecting, for each combination, the last-level cache misses per thousand instructions of each processor core;
determining each processor core's best-fit cache size from the misses per thousand instructions.
Determining, from the best-fit cache size, the number of page colors to allocate to each processor core includes:
judging whether the best-fit cache size M of the processor core satisfies M ≥ S/4 and, if so, setting the core's page-color count to K, where S is the last-level shared cache size of the processor and K is the processor's total number of page colors;
if not, determining from M the parameter n satisfying M ∈ [S/2^(n+1), S/2^n); the core's page-color count is then K/2^(n-1), where n ≥ 2 and K/2^(n-1) ≥ 2.
Partitioning the last-level shared cache in descending order of the processor cores' page-color counts and cache-line counts includes:
sorting the processor cores by page-color count in descending order;
sorting processor cores with equal page-color counts by cache-line count in descending order to obtain the final order;
partitioning the last-level shared cache according to the final order.
The method may further include:
when data is inserted, determining the insertion position of the data according to the cache-line count of the processor core.
The method may further include:
when a cache line is accessed, promoting the cache line's priority position with a preset probability.
The application also provides a system for partitioning a last-level shared cache, including:
a cache determining module for determining the best-fit cache size for each processor core at run time;
a page-color determining module for determining, from the best-fit cache size, the number of page colors to allocate to each processor core;
a cache-line determining module for computing, from the best-fit cache size and the page-color count, the number of cache lines to allocate to each processor core;
a partitioning module for partitioning the last-level shared cache in descending order of each processor core's page-color count and cache-line count.
The cache determining module includes:
a combination unit for combining different line counts with different page-color counts pairwise;
a collection unit for collecting, for each combination, the last-level cache misses per thousand instructions of each processor core;
a determining unit for determining each processor core's best-fit cache size from the misses per thousand instructions.
The page-color determining module includes:
a judging unit for judging whether the best-fit cache size M of the processor core satisfies M ≥ S/4 and, if so, setting the core's page-color count to K, where S is the last-level shared cache size of the processor and K is the processor's total number of page colors;
a traversal unit for, when M does not satisfy M ≥ S/4, determining from M the parameter n satisfying M ∈ [S/2^(n+1), S/2^n) and setting the core's page-color count to K/2^(n-1), where n ≥ 2 and K/2^(n-1) ≥ 2.
The partitioning module includes:
a first sorting unit for sorting the processor cores by page-color count in descending order;
a second sorting unit for sorting processor cores with equal page-color counts by cache-line count in descending order to obtain the final order;
a partitioning unit for partitioning the last-level shared cache according to the final order.
The method for partitioning a last-level shared cache provided by the application determines the best-fit cache size for each processor core at run time; determines, from that size, the number of page colors to allocate to each core; computes, from the best-fit size and the page-color count, the number of cache lines to allocate to each core; and partitions the last-level shared cache in descending order of each core's page-color count and cache-line count. By partitioning the last-level cache along the two dimensions of page colors and cache lines, the partitioning granularity is refined and scalability is improved. The application also provides a system for partitioning a last-level shared cache with the same benefits, which is not repeated here.
Brief description of the drawings
To explain the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the application; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for partitioning a last-level shared cache provided by an embodiment of the application;
Fig. 2 is a schematic diagram of a cache provided by an embodiment of the application;
Fig. 3 is a schematic diagram of allocation to processor core P1 provided by an embodiment of the application;
Fig. 4 is a schematic diagram of allocation to processor core P2 provided by an embodiment of the application;
Fig. 5 is a schematic diagram of allocation to processor core P3 provided by an embodiment of the application;
Fig. 6 is a schematic diagram of a system for partitioning a last-level shared cache provided by an embodiment of the application.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the embodiments of the application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the application.
Referring to Fig. 1, a flowchart of a method for partitioning a last-level shared cache provided by an embodiment of the application, the method may include the following steps:
S101: determine the best-fit cache size for each processor core at run time.
When a server runs, the cache required by the program on each processor core differs, so the best-fit cache size for each core must be determined. The application does not limit how the best-fit cache size of each core is obtained; one preferred method is provided here:
combine different line counts with different page-color counts pairwise; for each combination, collect the last-level cache misses per thousand instructions of each processor core; determine each core's best-fit cache size from the misses per thousand instructions.
Specifically, different line-count and page-color-count partitionings are applied to a processor core, e.g. line counts {1, 2, 4, 8, 16} and page-color counts {2, 4, 8, 12, 16} combined pairwise, and the last-level cache MPKI (misses per kilo-instructions) of the corresponding core under each combination is collected as the index for determining the core's suitable cache size. Other methods of determining a core's best-fit cache size are of course possible, and all fall within the scope of protection of the application.
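The MPKI sweep above can be sketched as follows. The patent only says that MPKI is collected for each (line count, page-color count) combination and used as the index; the selection rule below (smallest allocation within a tolerance of the best observed MPKI) and all names are assumptions for illustration.

```python
from itertools import product

LINE_OPTIONS  = [1, 2, 4, 8, 16]    # per-set line counts from the text
COLOR_OPTIONS = [2, 4, 8, 12, 16]   # page-color counts from the text

def best_fit_cache(mpki, tolerance=0.05):
    """Pick the smallest allocation whose MPKI is within `tolerance` of the
    best MPKI over all combinations. `mpki` maps (lines, colors) to the
    measured misses per thousand instructions for that combination."""
    best = min(mpki.values())
    candidates = [(l * c, l, c)
                  for l, c in product(LINE_OPTIONS, COLOR_OPTIONS)
                  if mpki[(l, c)] <= best * (1 + tolerance)]
    size, lines, colors = min(candidates)   # smallest size, ties broken arbitrarily
    return size, lines, colors
```

With a synthetic MPKI curve that flattens once the allocation reaches 64 lines, the sketch returns a best-fit size of 64 lines rather than the maximum configuration.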
S102: determine, from the best-fit cache size, the number of page colors to allocate to each processor core.
The page-color count can be computed from the best-fit cache size. The purpose of this step is to determine the page-color count, and any method that determines a core's page-color count falls within the scope of protection of the application. One preferred method of deriving the page-color count from the best-fit cache size is provided here:
Let the last-level shared cache size of the processor be S and the total number of page colors be K. Judge whether the best-fit cache size M of the core satisfies M ≥ S/4; if so, the core's page-color count is K. If not, determine the parameter n satisfying M ∈ [S/2^(n+1), S/2^n); the core's page-color count is then K/2^(n-1), where n ≥ 2 and K/2^(n-1) ≥ 2.
The benefit of this method is that it mitigates the performance cost of re-coloring a core's pages. Under this classification rule, when the cache size allocated to a core changes but remains within the same class, its allocated page colors are unchanged, avoiding re-coloring of the pages. Other methods of determining a core's page-color count are of course possible; the application is not limited to this one.
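The classification rule can be sketched directly, with the figures from the later embodiment (S = 64 lines, K = 8 colors, best-fit sizes 40, 12 and 10) as a check. The inequality direction follows that worked example (40 > 64/4 gives all 8 colors); the function name and loop structure are mine.

```python
def page_colors(m, s, k):
    """Page-color count for a core with best-fit cache size m, total LLC
    size s, and k total page colors (all sizes in cache lines).
    m >= s/4 gets all k colors; otherwise m in [s/2^(n+1), s/2^n) with
    n >= 2 gets k/2^(n-1) colors, bounded below by 2."""
    if m >= s / 4:
        return k
    n = 2
    while not (s / 2 ** (n + 1) <= m < s / 2 ** n):
        n += 1                       # scan n = 2, 3, ... for the matching class
    return max(k // 2 ** (n - 1), 2)

# Figures from the embodiment: s = 64 lines, k = 8 colors.
print(page_colors(40, 64, 8))  # P1 -> 8
print(page_colors(12, 64, 8))  # P2 -> 4
print(page_colors(10, 64, 8))  # P3 -> 4
```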
S103: compute, from the best-fit cache size and the page-color count, the number of cache lines to allocate to each processor core.
The computation here is the best-fit cache size divided by the page-color count, rounded down. For example, if the best-fit size is 22 cache lines and the page-color count is 4, the allocated line count is ⌊22/4⌋ = 5.
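This step is a single floor division; the sketch below (function name mine) reproduces the worked numbers from the text.

```python
def lines_per_set(best_fit, colors):
    """Cache lines allocated per set: the best-fit size (in cache lines)
    divided by the page-color count, rounded down."""
    return best_fit // colors

print(lines_per_set(22, 4))  # -> 5, the example above
print(lines_per_set(40, 8))  # P1 in the embodiment -> 5
print(lines_per_set(10, 4))  # P3 in the embodiment -> 2
```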
S104: partition the last-level shared cache in descending order of each processor core's page-color count and cache-line count.
Once each core's page-color count and cache-line count are obtained, the cores can be ordered in preparation for partitioning the last-level shared cache. Based on the allocated cache space, the cache is partitioned from largest to smallest. For each allocation, the contiguous cache sets with the largest free space are chosen first, so that the partitioned regions are adjacent to one another and cache space is not wasted. Because the page-color count is determined by the best-fit cache size, and the cache-line count by the best-fit size together with the page-color count, the page-color count takes priority over the cache-line count: a core with a larger best-fit cache size necessarily has a page-color count no smaller than that of a core with a smaller best-fit size, and cores with larger page-color counts are allocated memory before cores with smaller ones.
Specifically, the method can be: sort the processor cores by page-color count in descending order; sort cores with equal page-color counts by cache-line count in descending order to obtain the final order; partition the last-level shared cache according to the final order.
Alternatively, the last-level shared cache can be partitioned for the cores directly in descending order of best-fit cache size, with the page-color count and cache-line count serving as parameters of the partitioning.
It is worth noting that when the best-fit cache size is not an integer multiple of the page-color count, the remainder can be ignored: in the example above with 22 cache lines and a page-color count of 4, the allocated line count is 5, so the allocated cache is a 4 × 5 region and the remaining two cache lines are ignored. The advantage of this is that the last-level cache is used as fully as possible. Contiguous cache sets are of course chosen for a core's allocation wherever possible.
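The ordering rule above amounts to a descending sort on the (page-color count, cache-line count) pair; a minimal sketch, with names of my own choosing:

```python
def allocation_order(cores):
    """Order cores for partitioning: larger page-color count first,
    then larger cache-line count. `cores` maps name -> (colors, lines)."""
    return sorted(cores, key=lambda c: cores[c], reverse=True)

# P1..P3 from the embodiment: (colors, lines) = (8, 5), (4, 3), (4, 2).
order = allocation_order({"P1": (8, 5), "P2": (4, 3), "P3": (4, 2)})
print(order)  # -> ['P1', 'P2', 'P3']
```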
The embodiment of the application thus provides a method for partitioning a last-level shared cache in two dimensions. It solves both the problem that line-based partitioning in the prior art does not suit large numbers of parallel programs and the problem that set-based partitioning easily causes memory allocation failures. Partitioning the last-level shared cache along the two dimensions of cache sets and cache lines improves its utilization, effectively addresses the scalability of cache partitioning, and better reduces interference between applications.
Based on the above embodiment, the application also provides a management strategy for a cache set after the last-level cache has been partitioned, described in detail below:
Suppose the line partitioning of a cache set is {A1, A2, ..., An}, where Ai is the number of cache lines allocated to core i in this set; then Σᵢ Ai ≤ W, where W is the total number of cache lines in the set. The set management method can be described as three sub-policies:
Insertion policy: the number of cache lines allocated to a core determines the position at which its data is inserted. For example, if a core has been allocated 5 lines in a set containing 8 lines, its data is inserted at priority position 5 of that set.
Promotion policy: when a cache line is hit, it is promoted one priority position with a certain probability P. The probability P is a preset parameter that can be chosen manually; the method of determining it is not limited here. It can, for example, be determined from the associativity of the last-level cache together with saturation counters. Different promotion policies can of course also be set according to the kind of line hit (e.g. in a hybrid-memory host, whether the line belongs to DRAM or non-volatile memory) and the kind of access (demand access, write-back access, etc.).
Replacement policy: consistent with LRU, the data in the lowest-priority position is selected and evicted from the set.
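The three sub-policies can be sketched for one set as follows. This is a simplified model under my own conventions (position 0 is the lowest priority, the last position is MRU; class and method names are not from the patent), not the patent's implementation:

```python
import random

class PartitionedSet:
    """One cache set under the per-core policy described above: a core
    allocated `alloc` lines inserts new data at priority position `alloc`,
    a hit promotes the line one position with probability p, and
    replacement evicts the lowest-priority line."""
    def __init__(self, ways, p=0.75):
        self.lines = [None] * ways          # index i = priority position i + 1
        self.p = p

    def insert(self, tag, alloc):
        self.lines.pop(0)                   # replacement: evict lowest priority
        self.lines.insert(alloc - 1, tag)   # insertion at the core's position

    def hit(self, tag):
        i = self.lines.index(tag)
        if i + 1 < len(self.lines) and random.random() < self.p:
            # promotion: move up one position with probability p (3/4 in the text)
            self.lines[i], self.lines[i + 1] = self.lines[i + 1], self.lines[i]
```

With `ways=8` and `alloc=5` this reproduces the example above: the core's data enters at priority position 5 rather than at MRU.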
Based on the above embodiment, this embodiment provides a concrete method of partitioning the last-level cache, as follows:
As shown in Fig. 2, a cache contains 8 page-colored cache sets, each with 8 cache lines, for a total size of 64 cache lines. Three processor cores currently need cache partitions: core P1 is computed to need 40 cache lines, and cores P2 and P3 need 12 and 10 cache lines respectively. Since 40 > 64/4 = 16, P1's page-color count is 8. Since 64/2³ = 8 ≤ 12 < 16 = 64/2² and 64/2³ = 8 ≤ 10 < 16 = 64/2², i.e. n = 2, the corresponding page-color count for P2 and P3 is 8/2 = 4. Further, 40/8 = 5, 12/4 = 3 and ⌊10/4⌋ = 2, so the cache-line counts allocated to P1, P2 and P3 are 5, 3 and 2 respectively. P1, with the larger page-color count, is allocated first, followed by P2 and P3.
Figs. 3 to 5 show the allocation order of the processor cores and the allocation process. Before allocation, each set's partitioned line count is 0. Fig. 3 shows the state after allocating P1, with each set's allocated line count updated to 5. Fig. 4 shows the state after allocating P2: contiguous sets are selected and the corresponding allocated line count is updated to 8. Cache lines are similarly allocated to P3, as shown in Fig. 5.
As mentioned earlier, data belonging to P1 is inserted at priority position 5 of its set, and data of P2 and P3 is inserted at priority positions 3 and 2 respectively. On a hit, data is promoted one priority position forward with a certain probability P, which can be taken as 3/4. On replacement, the lowest-priority position is selected for eviction.
A system for partitioning a last-level shared cache provided by an embodiment of the application is introduced below; the system described below and the method described above may be referred to in correspondence with each other.
Referring to Fig. 6, a schematic diagram of a system for partitioning a last-level shared cache provided by an embodiment of the application, the system can include:
a cache determining module 100 for determining the best-fit cache size for each processor core at run time;
a page-color determining module 200 for determining, from the best-fit cache size, the number of page colors to allocate to each processor core;
a cache-line determining module 300 for computing, from the best-fit cache size and the page-color count, the number of cache lines to allocate to each processor core;
a partitioning module 400 for partitioning the last-level shared cache in descending order of each processor core's page-color count and cache-line count.
Based on the above embodiment, as a preferred embodiment, the cache determining module can include:
a combination unit for combining different line counts with different page-color counts pairwise;
a collection unit for collecting, for each combination, the last-level cache misses per thousand instructions of each processor core;
a determining unit for determining each processor core's best-fit cache size from the misses per thousand instructions.
Based on the above embodiment, as a preferred embodiment, the page-color determining module can include:
a judging unit for judging whether the best-fit cache size M of the processor core satisfies M ≥ S/4 and, if so, setting the core's page-color count to K, where S is the last-level shared cache size of the processor and K is the processor's total number of page colors;
a traversal unit for, when M does not satisfy M ≥ S/4, determining the parameter n satisfying M ∈ [S/2^(n+1), S/2^n) and setting the core's page-color count to K/2^(n-1), where n ≥ 2 and K/2^(n-1) ≥ 2.
Based on the above embodiment, as a preferred embodiment, the partitioning module can include:
a first sorting unit for sorting the processor cores by page-color count in descending order;
a second sorting unit for sorting processor cores with equal page-color counts by cache-line count in descending order to obtain the final order;
a partitioning unit for partitioning the last-level shared cache according to the final order.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be referred to one another. As for the system provided by an embodiment, since it corresponds to the method provided by an embodiment, its description is relatively brief; for relevant details, see the description of the method.
The method and system for partitioning a last-level shared cache provided by the application have been described in detail above. Specific examples are used herein to set forth the principles and implementations of the application; the description of the above embodiments is only intended to help in understanding the method of the application and its core ideas. It should be pointed out that those of ordinary skill in the art can make improvements and modifications to the application without departing from its principles, and these improvements and modifications also fall within the scope of protection of the claims of the application.
It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between them. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
Claims (10)
- 1. A method for partitioning a last-level shared cache, characterized by comprising: determining a best-fit cache size for each processor core at run time; determining, according to the best-fit cache size, the number of page colors to be allocated to each processor core; calculating, according to the best-fit cache size and the number of page colors, the number of cache lines to be allocated to each processor core; and partitioning the last-level shared cache in descending order of the number of page colors and the number of cache lines corresponding to each processor core.
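The claims rely on page coloring without spelling out the mechanism. As background, a page's "color" is the overlap between its physical page-frame-number bits and the cache's set-index bits, so that all pages of one color map to a fixed slice of the cache. The sketch below is not from the patent; the cache geometry (8 MiB, 16-way, 64 B lines, 4 KiB pages) is an illustrative assumption.

```python
# Background sketch of page coloring (assumed geometry, not the claimed method).
CACHE_SIZE = 8 * 1024 * 1024   # 8 MiB last-level cache
WAYS = 16                      # associativity
LINE = 64                      # cache line size in bytes
PAGE = 4096                    # OS page size in bytes

SETS = CACHE_SIZE // (WAYS * LINE)   # number of cache sets (8192 here)
SETS_PER_PAGE = PAGE // LINE         # consecutive sets covered by one page (64)
NUM_COLORS = SETS // SETS_PER_PAGE   # total page colors K (128 here)

def page_color(phys_addr):
    """Color = the set-index bits lying above the page offset."""
    return (phys_addr // PAGE) % NUM_COLORS
```

Restricting a core's allocator to a subset of colors confines that core's data to the corresponding subset of cache sets, which is what makes a software-only partition of the last-level shared cache possible.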
- 2. The method according to claim 1, characterized in that determining the best-fit cache size for each processor core at run time comprises: combining different cache-line counts and different page-color counts pairwise; collecting, for each combination, the misses per thousand instructions of the last-level cache of each processor core; and determining the best-fit cache size for each processor core at run time according to the misses per thousand instructions.
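The selection step of claim 2 can be sketched as follows: enumerate the (cache-line count, page-color count) pairs and pick the one with the lowest measured misses per thousand instructions (MPKI). The `mpki` table here is fabricated sample data used only to exercise the selection logic; in practice these numbers would come from hardware performance counters.

```python
# Hedged sketch of claim 2: pairwise combinations scored by measured MPKI.
from itertools import product

def best_combination(line_counts, color_counts, mpki):
    """mpki maps (lines, colors) -> measured misses per 1000 instructions."""
    combos = product(line_counts, color_counts)   # combine pairwise
    return min(combos, key=lambda c: mpki[c])     # fewest misses wins

# Fabricated measurements for one core.
mpki = {(4, 8): 3.1, (4, 16): 2.2, (8, 8): 2.4, (8, 16): 1.9}
assert best_combination([4, 8], [8, 16], mpki) == (8, 16)
```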
- 3. The method according to claim 1 or 2, characterized in that determining, according to the best-fit cache size, the number of page colors to be allocated to each processor core comprises: judging whether the best-fit cache size M of the processor core satisfies M ≥ S/4, and if so, setting the page-color count of the corresponding processor core to K, where S is the last-level shared cache size of the processor and K is the total number of page colors of the processor; if not, determining, according to the best-fit cache size M, the parameter n satisfying M ∈ [S/2^(n+1), S/2^n), and setting the page-color count of the corresponding processor core to K/2^(n-1), where n ≥ 2 and K/2^(n-1) ≥ 2.
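The mapping of claim 3 from a core's best-fit cache size M to its page-color count can be sketched directly: K colors when M ≥ S/4, otherwise K/2^(n-1) for the n ≥ 2 whose interval [S/2^(n+1), S/2^n) contains M. The concrete values of S and K below are illustrative assumptions.

```python
# Hedged sketch of claim 3's page-color assignment.
def page_colors(M, S, K):
    """Page colors for a core with best-fit cache size M; S = LLC size,
    K = total page colors."""
    if M >= S / 4:
        return K
    n = 2
    while M < S / 2 ** (n + 1):        # find n with S/2^(n+1) <= M < S/2^n
        n += 1
    count = K // 2 ** (n - 1)
    return count if count >= 2 else 2  # claim requires K/2^(n-1) >= 2

S, K = 8192, 128                       # assumed: 8 MiB LLC (in KiB), 128 colors
assert page_colors(4096, S, K) == 128  # M = S/2 >= S/4  -> all K colors
assert page_colors(1024, S, K) == 64   # M in [S/8, S/4) -> n=2 -> K/2
assert page_colors(512, S, K) == 32    # M in [S/16, S/8) -> n=3 -> K/4
```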
- 4. The method according to claim 3, characterized in that partitioning the last-level shared cache in descending order of the page-color count and the cache-line count corresponding to each processor core comprises: sorting the processor cores by page-color count from largest to smallest; further sorting processor cores with the same page-color count by cache-line count from largest to smallest to obtain a final order; and partitioning the last-level shared cache according to the final order.
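The two-level ordering of claim 4 reduces to a sort with a compound key: page-color count descending, ties broken by cache-line count descending. The core names and allocation numbers below are illustrative.

```python
# Hedged sketch of claim 4's final ordering before the cache is carved up.
def division_order(alloc):
    """alloc: core -> (page_colors, cache_lines); returns cores in the
    descending order in which the LLC is partitioned."""
    return sorted(alloc, key=lambda c: (-alloc[c][0], -alloc[c][1]))

alloc = {"core0": (64, 8), "core1": (128, 4), "core2": (64, 12)}
# core1 has the most colors; core2 beats core0 on cache lines at equal colors.
assert division_order(alloc) == ["core1", "core2", "core0"]
```

Sorting with negated keys keeps the sort stable, so cores tied on both counts retain their original relative order.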
- 5. The method according to claim 4, characterized by further comprising: when data is to be inserted, determining the insertion position of the data according to the cache-line count of the processor core.
- 6. The method according to claim 5, characterized by further comprising: when a cache line is accessed, promoting the priority position of the cache line with a preset probability.
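Claims 5 and 6 together describe an insertion/promotion policy in the style of BIP/RRIP replacement schemes: a new block's starting position within a set is bounded by its core's cache-line allocation, and a hit promotes the line toward the most-recently-used end only with a preset probability. The list model of a set and the probability value below are illustrative assumptions, not details taken from the patent.

```python
# Hedged sketch of claims 5-6; index 0 models the MRU end of a set.
import random

PROMOTE_PROB = 0.5          # assumed preset probability

def insert(cache_set, block, core_lines):
    """Claim 5: insertion position bounded by the core's line allocation."""
    pos = min(core_lines - 1, len(cache_set))
    cache_set.insert(pos, block)

def on_access(cache_set, block, rng=random.random):
    """Claim 6: probabilistically promote an accessed line toward MRU."""
    i = cache_set.index(block)
    if i > 0 and rng() < PROMOTE_PROB:
        cache_set.insert(i - 1, cache_set.pop(i))

s = []
insert(s, "a", core_lines=4)          # empty set -> position 0
insert(s, "b", core_lines=2)          # position min(1, 1) = 1
assert s == ["a", "b"]
on_access(s, "b", rng=lambda: 0.0)    # rng below threshold -> promotion
assert s == ["b", "a"]
```

Promoting only probabilistically keeps streaming data from evicting a core's resident working set, which complements the spatial partition established by the earlier claims.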
- 7. A system for partitioning a last-level shared cache, characterized by comprising: a cache determining module, configured to determine a best-fit cache size for each processor core at run time; a page-color determining module, configured to determine, according to the best-fit cache size, the number of page colors to be allocated to each processor core; a cache-line determining module, configured to calculate, according to the best-fit cache size and the number of page colors, the number of cache lines to be allocated to each processor core; and a partitioning module, configured to partition the last-level shared cache in descending order of the page-color count and the cache-line count corresponding to each processor core.
- 8. The system according to claim 7, characterized in that the cache determining module comprises: a combining unit, configured to combine different cache-line counts and different page-color counts pairwise; a collecting unit, configured to collect, for each combination, the misses per thousand instructions of the last-level cache of each processor core; and a determining unit, configured to determine the best-fit cache size for each processor core at run time according to the misses per thousand instructions.
- 9. The system according to claim 7 or 8, characterized in that the page-color determining module comprises: a judging unit, configured to judge whether the best-fit cache size M of the processor core satisfies M ≥ S/4, and if so, set the page-color count of the corresponding processor core to K, where S is the last-level shared cache size of the processor and K is the total number of page colors of the processor; and a traversal unit, configured to, when the best-fit cache size M of the processor core does not satisfy M ≥ S/4, determine, according to the best-fit cache size M, the parameter n satisfying M ∈ [S/2^(n+1), S/2^n), and set the page-color count of the corresponding processor core to K/2^(n-1), where n ≥ 2 and K/2^(n-1) ≥ 2.
- 10. The system according to claim 9, characterized in that the partitioning module comprises: a first sorting unit, configured to sort the processor cores by page-color count from largest to smallest; a second sorting unit, configured to further sort processor cores with the same page-color count by cache-line count from largest to smallest to obtain a final order; and a partitioning unit, configured to partition the last-level shared cache according to the final order.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710791546.7A CN107577616B (en) | 2017-09-05 | 2017-09-05 | Method and system for dividing last-level shared cache |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107577616A true CN107577616A (en) | 2018-01-12 |
CN107577616B CN107577616B (en) | 2020-09-18 |
Family
ID=61029865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710791546.7A Active CN107577616B (en) | 2017-09-05 | 2017-09-05 | Method and system for dividing last-level shared cache |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107577616B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188026A (en) * | 2019-05-31 | 2019-08-30 | 龙芯中科技术有限公司 | The determination method and device of fast table default parameters |
CN110688072A (en) * | 2019-09-30 | 2020-01-14 | 上海兆芯集成电路有限公司 | Cache system and operation method thereof |
CN111258927A (en) * | 2019-11-13 | 2020-06-09 | 北京大学 | Application program CPU last-level cache miss rate curve prediction method based on sampling |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521150A (en) * | 2011-11-28 | 2012-06-27 | 华为技术有限公司 | Application program cache distribution method and device |
CN103077128A (en) * | 2012-12-29 | 2013-05-01 | 华中科技大学 | Method for dynamically partitioning shared cache in multi-core environment |
US20160170890A1 (en) * | 2013-11-01 | 2016-06-16 | Cisco Technology, Inc. | Bounded cache searches |
Non-Patent Citations (1)
Title |
---|
ZHANG Ludan et al.: "Page-Coloring-Based Dynamic Partitioning of Shared Cache on Multi-core Processors", Chinese Journal of Computers * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188026A (en) * | 2019-05-31 | 2019-08-30 | 龙芯中科技术有限公司 | The determination method and device of fast table default parameters |
CN110188026B (en) * | 2019-05-31 | 2023-05-12 | 龙芯中科技术股份有限公司 | Method and device for determining missing parameters of fast table |
CN110688072A (en) * | 2019-09-30 | 2020-01-14 | 上海兆芯集成电路有限公司 | Cache system and operation method thereof |
CN111258927A (en) * | 2019-11-13 | 2020-06-09 | 北京大学 | Application program CPU last-level cache miss rate curve prediction method based on sampling |
Also Published As
Publication number | Publication date |
---|---|
CN107577616B (en) | 2020-09-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7899994B2 (en) | Providing quality of service (QoS) for cache architectures using priority information | |
Jeannot et al. | Near-optimal placement of MPI processes on hierarchical NUMA architectures | |
CN105491138B (en) | Distributed load scheduling method based on load rate graded triggering | |
CN107577616A (en) | Method and system for partitioning a last-level shared cache | |
CN108845960B (en) | Memory resource optimization method and device | |
US20180373635A1 (en) | Managing cache partitions based on cache usage information | |
CN103455443B (en) | Buffer management method and device | |
CN105391654A (en) | Account activeness-based system resource allocation method and device | |
CN103023963B (en) | A kind of method for cloud storage resources configuration optimization | |
Liu et al. | Going vertical in memory management: Handling multiplicity by multi-policy | |
CN109582600B (en) | Data processing method and device | |
CN102663115B (en) | Main memory database access optimization method on basis of page coloring technology | |
US20100250890A1 (en) | Managing working set use of a cache via page coloring | |
CN102567077B (en) | Virtualized resource distribution method based on game theory | |
US20130191605A1 (en) | Managing addressable memory in heterogeneous multicore processors | |
CN107870871B (en) | Method and device for allocating cache | |
CN104572501B (en) | Access trace locality analysis-based shared buffer optimization method in multi-core environment | |
CN112148665A (en) | Cache allocation method and device | |
DE112016004367T5 (en) | Technologies for automatic processor core allocation management and communication using direct data placement in private buffers | |
CN107729267A (en) | The scattered distribution of resource and the interconnection structure for support by multiple engine execute instruction sequences | |
CN104346404A (en) | Method, equipment and system for accessing data | |
CN106126434B (en) | The replacement method and its device of the cache lines of the buffer area of central processing unit | |
CN112540934B (en) | Method and system for ensuring service quality when multiple delay key programs are executed together | |
US20170160959A1 (en) | Computer memory management method and system | |
CN110308965A (en) | The rule-based heuristic virtual machine distribution method and system of cloud data center |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20200821 Address after: 215100 No. 1 Guanpu Road, Guoxiang Street, Wuzhong Economic Development Zone, Suzhou City, Jiangsu Province Applicant after: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd. Address before: 450018 Henan province Zheng Dong New District of Zhengzhou City Xinyi Road No. 278 16 floor room 1601 Applicant before: ZHENGZHOU YUNHAI INFORMATION TECHNOLOGY Co.,Ltd. |
|
TA01 | Transfer of patent application right | ||
GR01 | Patent grant | ||
GR01 | Patent grant |