CN103345451B - Data buffering method in multi-core processor - Google Patents

Data buffering method in multi-core processor

Info

Publication number
CN103345451B
Authority
CN
China
Prior art keywords
buffer memory
memory
buffer
processor core
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310301037.3A
Other languages
Chinese (zh)
Other versions
CN103345451A (en)
Inventor
毛力
容强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan qianhang Technology Co., Ltd
Original Assignee
SICHUAN JIUCHENG INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SICHUAN JIUCHENG INFORMATION TECHNOLOGY Co Ltd filed Critical SICHUAN JIUCHENG INFORMATION TECHNOLOGY Co Ltd
Priority to CN201310301037.3A priority Critical patent/CN103345451B/en
Publication of CN103345451A publication Critical patent/CN103345451A/en
Application granted granted Critical
Publication of CN103345451B publication Critical patent/CN103345451B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides a data buffering method in a multi-core processor. The method comprises: receiving an instruction to execute multiple threads concurrently; assigning each of the threads separately to the cores of the processor, with each core assigned at most one thread; during thread execution, responding to a cache request from each core that has been assigned a thread by storing the data to be cached in the dedicated buffer memory coupled to that core; and, when the number of buffer memories storing the same cached data reaches or exceeds a threshold t, storing that same cached data in a general buffer memory. The method improves cache access and replacement speed and overcomes the false-sharing problem.

Description

A method of buffering data in a multi-core processor
Technical field
The present invention relates to the field of data storage, in particular to a method of buffering data in a multi-core processor, and further to a method of multi-level buffering of data in a multi-core processor.
Background technology
The speed gap between processor and main memory is a prominent contradiction for multi-core processors, so multi-level cache buffering must be used to alleviate it. Current multi-core processors may share a level-1 cache, share a level-2 cache, or share only main memory. Typically a multi-core processor adopts the shared level-2 cache structure: each core has a private level-1 cache, and all cores share the level-2 cache. The architectural design of the cache itself also directly affects overall system performance. Within a multi-core structure, whether shared or exclusive caches are better, whether multi-level caches need to be built on a single chip, and what kind of cache to build are all questions requiring careful research and development, since they greatly affect the size, power consumption, layout, performance and operating efficiency of the whole chip. On the other hand, multi-level caches raise the consistency problem, and whichever cache-consistency model and mechanism is adopted will materially affect overall multi-core performance. Cache-consistency models widely adopted in conventional multiprocessor architectures include the sequential consistency model, the weak consistency model and the release consistency model; the associated cache-coherence mechanisms are mainly bus snooping protocols and directory-based protocols. Most current multi-core processor systems adopt a bus-based snooping protocol.
Programs executing on the different cores of a multi-core processor sometimes need to share data and synchronize, so the hardware must support inter-core communication. An efficient communication mechanism is an important guarantee of high multi-core performance. Two on-chip communication mechanisms are currently mainstream: a bus-based shared cache structure and an on-chip interconnect structure. In the bus shared-cache structure, the cores share a level-2 or level-3 cache that holds commonly used data and communicate over a bus connecting the cores; its advantages are a simple structure and high communication speed, its drawback the poor scalability of bus-based structures. In the on-chip interconnect structure, each core has an independent processing unit and cache, the cores are linked by crossbar switches or a network-on-chip, and the cores communicate by messages; its advantages are good scalability and guaranteed data bandwidth, its drawbacks complex hardware and larger software changes. The two approaches may end up complementing rather than replacing each other, for example by using a network-on-chip globally and buses locally to balance performance and complexity.
In a conventional microprocessor, cache misses and memory-access events both degrade the processor's execution efficiency, and the working efficiency of the bus interface unit (BIU) determines the size of that effect. When several cores request memory access simultaneously, or the private caches of several cores miss simultaneously, the BIU's arbitration among the access requests and its external memory-access mechanism determine the overall performance of the multi-core system. Finding an efficient multi-port BIU structure that converts the cores' individual accesses to main memory into more efficient burst accesses, together with a quantitative model of the burst access width that optimizes overall multi-core efficiency and an efficient arbitration mechanism for multi-port BIU access, are therefore important topics of multi-core processor research.
In current multi-core processor systems, whether for the level-2 or level-3 cache, and whether the cache is shared or private, the cache read and replacement algorithms suffer from technical problems such as high algorithmic complexity and long hit latency.
In addition, in the existing shared level-2 cache scheme, the shared data in the level-2 cache usually also has backups in the private level-1 caches. When cached data in a level-1 cache is modified by different cores, the false-sharing problem of the cache appears, forcing frequent reloads, increasing access latency and degrading system performance.
Summary of the invention
To solve the technical problems of the cache read and replacement algorithms in existing multi-core processor systems, such as high algorithmic complexity, long hit latency and false sharing, the invention provides a method of buffering data in a multi-core processor, wherein the multi-core processor comprises multiple processor cores, multiple dedicated buffer memories in one-to-one coupling with the cores, and one general buffer memory coupled to all of the cores, the method comprising:
receiving an instruction to execute multiple threads concurrently;
assigning each of the multiple threads separately to the multiple processor cores, each core being assigned at most one thread;
for each core that has been assigned a thread, in response to a cache request during thread execution, storing the data to be cached in the dedicated buffer memory coupled to that core;
when the number of dedicated buffer memories that all store the same cached data is not less than a threshold t, storing that same cached data in the general buffer memory.
Preferably, after the same cached data has been stored in the general buffer memory, it is cleared from the dedicated buffer memories that stored it, and the storage space it occupied in those dedicated buffer memories is released.
Preferably, when any of the processor cores needs to read cached data, the data is read from the general buffer memory or a dedicated buffer memory by querying a cache mapping table.
Preferably, t = s, t = ⌈s/2⌉, or t = 2, where s is the total number of processor cores in the active state and ⌈·⌉ denotes rounding up.
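The three candidate thresholds can be written out as a short Python sketch. The middle option is a reconstruction: the patent's original formula was an inline image lost in extraction, and ⌈s/2⌉ is the reading consistent with the surrounding "rounds up" remark. The function name and `policy` parameter are hypothetical.

```python
import math

def promotion_threshold(s, policy="half"):
    """Candidate thresholds from the claim: t = s, t = ceil(s/2), or t = 2,
    where s is the number of processor cores in the active state."""
    return {"all": s, "half": math.ceil(s / 2), "pair": 2}[policy]
```

With four active cores the default ⌈s/2⌉ policy yields t = 2, so data cached by any two cores is promoted to the general buffer memory.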
The invention also discloses a method of multi-level buffering of data in a multi-core processor, the multi-core processor comprising 2^n processor cores and n+1 cache levels, where the m-th cache level comprises 2^(n+1-m) buffer memories; the i-th buffer memory of the 1st cache level is used only to store the cached data of the thread executed by the i-th processor core; the j-th buffer memory of the s-th cache level is used only to store the cached data common to the (2j-1)-th and (2j)-th buffer memories of the (s-1)-th cache level; where n is an integer greater than 1, 1 <= m <= n+1, 2 <= s <= n+1, 1 <= i <= 2^n and 1 <= j <= 2^(n+1-s); the method comprising:
in response to an instruction to execute multiple threads of the same process concurrently, assigning each thread to a different idle processor core and activating the cores that have been assigned threads, so that they change from the idle state to the busy state;
after the i-th processor core is activated, in response to a cache instruction, first checking whether the data to be cached is already stored in the ⌈i/k⌉-th buffer memory of the p-th cache level (equivalently ⌊(i-1)/k⌋+1, with ⌊·⌋ denoting the floor operation), and if so sending an acknowledgement message indicating that caching succeeded, where 1 <= i <= 2^n, 2 <= p <= n+1 and k = 2^(p-1);
if not, storing the data to be cached in the i-th buffer memory of the 1st cache level, and then judging level by level from the 1st to the n-th cache level: if the (2q-1)-th and (2q)-th buffer memories of the t-th cache level both store the data to be cached, where q = ⌈i/2^t⌉ and ⌈·⌉ denotes rounding up, transferring the data to the q-th buffer memory of the (t+1)-th cache level, clearing the data from the (2q-1)-th and (2q)-th buffer memories of the t-th level, and releasing the storage space it occupied there, where 1 <= t <= n;
sending an acknowledgement message indicating that caching succeeded.
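The geometry of this hierarchy can be sketched in Python. Both formulas are reconstructions of inline images lost in extraction, inferred from the surrounding constraints; the function names are hypothetical.

```python
import math

def buffers_at_level(n, m):
    """Number of buffer memories at cache level m of an (n+1)-level
    hierarchy over 2**n cores, per the claim: 2**(n+1-m)."""
    return 2 ** (n + 1 - m)

def level_index(i, m):
    """Index of the level-m buffer memory on core i's path up the
    binary tree of buffers: ceil(i / 2**(m-1))."""
    return math.ceil(i / 2 ** (m - 1))
```

For n = 2 the levels hold 4, 2 and 1 buffer memories, and core 3's path runs through buffer 3 of level 1, buffer 2 of level 2, and the single buffer of level 3.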
The invention uses a cache mapping table to read and replace, in real time, the data to be cached by executing threads, in particular replacing cached data between the dedicated buffer memories and the general buffer memory in real time, and uses multi-level caching for real-time reads and replacement. It thereby achieves lower algorithmic complexity, shortens hit latency, and improves the overall efficiency of a computer system with a multi-core processor.
Accompanying drawing explanation
The accompanying drawings are included to provide a further understanding of the invention; as part of the specification they explain, together with the specification, the principle of the invention. In the drawings:
Fig. 1 is a structural block diagram of the multi-core processor of the first embodiment of the invention;
Fig. 2 is a flowchart of the method of buffering data in a multi-core processor of the first embodiment of the invention;
Fig. 3 is a structural block diagram of the multi-core processor of the second embodiment of the invention;
Fig. 4 is a flowchart of the method of buffering data in a multi-core processor of the second embodiment of the invention.
Embodiment
Fig. 1 shows a structural block diagram of the multi-core processor involved in the invention. As shown in Fig. 1, the multi-core processor comprises multiple processor cores, multiple dedicated buffer memories in one-to-one coupling with the cores, and one general buffer memory coupled to all of the cores. Each dedicated buffer memory stores only cached data related to the thread executed by the core it is coupled to, while the general buffer memory stores cached data related to the threads executed by multiple cores. The multi-core processor further comprises a mapping buffer device that stores a cache mapping table, which records at least the storage relations between cached data and each buffer memory (including the dedicated buffer memories and the general buffer memory), that is, which buffer memory each item of cached data is stored in and which processor core and thread it is associated with. The multi-core processor further comprises a cache controller that controls the dedicated buffer memories, the general buffer memory and the mapping buffer device, implementing operations such as writing, reading, replacement and querying on them.
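The mapping table just described can be sketched as a small Python class. The class and method names are hypothetical; the patent only specifies what the table records, not a data structure.

```python
class CacheMappingTable:
    """Sketch of the cache mapping table: for each cached item it records
    which buffer memories hold it and which core/thread it is tied to."""

    def __init__(self):
        self.locations = {}     # data key -> set of buffer memory ids
        self.associations = {}  # data key -> set of (core, thread) pairs

    def record(self, key, buffer_id, core=None, thread=None):
        self.locations.setdefault(key, set()).add(buffer_id)
        if core is not None:
            self.associations.setdefault(key, set()).add((core, thread))

    def drop_location(self, key, buffer_id):
        self.locations.get(key, set()).discard(buffer_id)

    def lookup(self, key):
        return self.locations.get(key, set())
```

A promotion to the general buffer memory would then call `record(key, "general")` and `drop_location` for each dedicated buffer, keeping the table consistent with the actual storage relations.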
In the invention, the total number of threads of a same process is not greater than the total number of processor cores, and each of those threads is assigned to a different core so as to guarantee that the threads of the process execute concurrently.
Next, the method of buffering data in a multi-core processor proposed by the invention is described in detail with reference to Fig. 2. The method comprises: the multi-core processor receives an instruction to execute multiple threads of the same process concurrently; subsequent steps continue only when the following conditions are met: the number of idle processor cores is not less than the number of threads to be executed concurrently, and there exists an assignment that gives each thread a different idle core such that the total resources of the assigned idle cores are not less than the total processor resources required by the assigned threads (that is, the processing power of every assigned idle core can meet the needs of the thread assigned to it); otherwise a message indicating insufficient system resources is returned. In response to the instruction, each of the threads is assigned to a different idle core, with the result that the processing power of every core that has been assigned a thread fully meets the needs of its thread; at the same time the assigned idle cores are activated, changing from the idle state to the busy state.
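The admission condition above can be sketched as a feasibility check. The patent does not give an algorithm for finding the assignment; matching the largest demand to the largest capacity is one simple formulation, and all names here are hypothetical.

```python
def can_schedule(idle_core_capacity, thread_demand):
    """Feasibility sketch: each thread must get a distinct idle core whose
    processing power meets that thread's needs.  Sorting both lists and
    comparing pairwise decides whether such an assignment exists."""
    if len(idle_core_capacity) < len(thread_demand):
        return False  # fewer idle cores than threads to run
    caps = sorted(idle_core_capacity, reverse=True)
    needs = sorted(thread_demand, reverse=True)
    return all(cap >= need for cap, need in zip(caps, needs))
```

When the check fails, the method returns the insufficient-resources message instead of activating any core.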
After an idle core of the multi-core processor is activated, it enters the busy state of executing its thread and therefore needs to read and write data as the thread requires, and hence needs to buffer data; for this reason a core in the busy state frequently sends cache instructions to the cache controller. In response to a cache instruction, the cache controller first queries the cache mapping table to check whether the data to be cached is already stored in the general buffer memory. If it is, the controller updates the cache mapping table and then sends an acknowledgement message indicating that caching succeeded to the core that issued the instruction, where updating the table includes adding the correspondence between the data to be cached, the activated core and the thread executed by that core. Otherwise the controller stores the data to be cached in the dedicated buffer memory coupled to the core that issued the instruction, records the storage relation between the data and that dedicated buffer memory in the mapping table, and then performs a query analysis on the mapping table with the data to be cached as the query condition. If the analysis shows that no fewer than a threshold t of the dedicated buffer memories all store the data to be cached, where t = s, t = ⌈s/2⌉ or t = 2, s being the total number of cores currently executing threads and ⌈·⌉ denoting rounding up, then the controller stores the data in the general buffer memory, clears the data from those t dedicated buffer memories and releases the storage space it occupied in them, and finally updates the cache mapping table, adding the storage relation between the data and the general buffer memory and deleting the storage relations between the data and the t dedicated buffer memories. At the end of these steps the cache controller sends an acknowledgement of successful caching to the core that issued the cache instruction. The cache mapping table is stored in the mapping buffer device.
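The write flow just described can be sketched end to end. This is a minimal model, not the patent's implementation: buffers are plain sets, the mapping table is reduced to an association dict, and all names are hypothetical.

```python
def handle_cache_request(core, data, general, private, assoc, t):
    """First-embodiment write flow (sketch): a general-buffer hit only
    refreshes the association record; otherwise the data goes to the
    core's dedicated buffer, and once no fewer than t dedicated buffers
    hold the same data it is promoted to the general buffer memory and
    the dedicated copies are cleared."""
    assoc.setdefault(data, set()).add(core)
    if data in general:
        return "hit-general"
    private[core].add(data)
    holders = [c for c, buf in private.items() if data in buf]
    if len(holders) >= t:
        general.add(data)
        for c in holders:
            private[c].discard(data)  # release the dedicated-buffer space
        return "promoted"
    return "stored-private"
```

With t = 2, the second core to cache the same item triggers promotion, and a third core's request then hits the general buffer directly.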
In the invention, when a thread executing on a processor core ends, the core first sends a clear instruction to the cache controller. In response, the controller first queries the cache mapping table to check whether the general buffer memory holds cached data associated with the finished thread. If so, then for each such item the controller continues to query the table to check whether more than the threshold t of cores are executing threads associated with that item. If not, and provided the dedicated buffer memories coupled to all cores still associated with the item have the capacity to store it, the controller, following the records of the mapping table, dumps the item into the dedicated buffer memories coupled to those cores, deletes the item from the general buffer memory, releases the space it occupied there, and updates the mapping table so that it reflects the latest storage relations. Those skilled in the art will know that after a thread ends, the related data in main memory also needs to be cleared. After performing the above steps, the cache controller sends an acknowledgement of successful clearing to the core that issued the clear instruction, in response to which the core changes from the busy state to the idle state.
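The demotion-on-exit step can be sketched as follows. The model is deliberately minimal (sets for buffers, a dict for associations) and assumes the dedicated buffers always have room, as the text stipulates; all names are hypothetical.

```python
def on_thread_exit(core, general, private, assoc, t):
    """Thread-termination cleanup (sketch): the finishing core is removed
    from each item's association set; an item in the general buffer with
    no more than t remaining associated cores is demoted back into those
    cores' dedicated buffer memories and dropped from the general buffer."""
    for data in list(general):
        cores = assoc.get(data, set())
        cores.discard(core)
        if cores and len(cores) <= t:
            for c in cores:
                private[c].add(data)
            general.discard(data)  # release the general-buffer space
```

An item still used by more than t cores stays in the general buffer memory, which is exactly the condition under which it was promoted in the first place.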
In the invention, the general buffer memory may reach a saturated state through continual writes of cached data, in which case the caching policy needs to be adjusted. To effectively simplify the highly complex algorithms of the prior art, the invention proposes the following method: when cached data needs to be written into the general buffer memory, first judge whether the total free storage space of the general buffer memory is not less than the volume of data to be written. If it is, write the data directly into the general buffer memory; if the free space is less than the volume to be written, first dump the cached data in the general buffer memory into main memory, then release the storage space of the general buffer memory, and finally write the data into the general buffer memory.
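This saturation handling can be sketched in a few lines. Sizes are abstract units and the names are hypothetical; the point is only the order of operations: spill everything to main memory, release the whole buffer, then write.

```python
def ensure_capacity(general, main_memory, free_space, capacity, needed):
    """Saturation handling (sketch): if the general buffer memory lacks
    room for `needed` units, its contents are first dumped to main memory
    and its entire space released before the write proceeds."""
    if free_space < needed:
        main_memory.update(general)  # dump cached data to main memory
        general.clear()              # release the general buffer's space
        free_space = capacity
    return free_space - needed       # free space left after the write
```

Dumping the whole buffer rather than selecting victims is what the text offers as its simplification of prior-art replacement algorithms.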
Also in response to a clear instruction, the cache controller queries the cache mapping table to check whether the dedicated buffer memory coupled to the issuing core holds cached data associated with the finished thread; if so, the controller clears the data found, releases the storage space it occupied, and updates the cache mapping table so that it reflects the latest storage relations.
When cached data needs to be read, the cache mapping table is queried to determine whether the data is read from the general buffer memory or a dedicated buffer memory. For data that has been dumped into main memory, when the query of the cache mapping table misses, the data is further read from main memory. The concrete read method is not repeated here.
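The read path can be sketched directly from that description. The table and buffer structures are the minimal hypothetical ones used above, not the patent's implementation.

```python
def read_cached(key, table, buffers, main_memory):
    """Read path (sketch): the cache mapping table says which buffer
    memory holds the data; on a table miss the data is looked up in main
    memory, where it may have been dumped; otherwise it is a true miss."""
    for buf_id in table.get(key, ()):
        if key in buffers.get(buf_id, ()):
            return ("cache", buf_id)
    if key in main_memory:
        return ("main", None)
    return ("miss", None)
```

A true miss would then fall back to whatever fill path the surrounding system provides, which the patent does not elaborate.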
With reference to Figs. 3 and 4, the method of multi-level buffering of data in a multi-core processor of the second embodiment is described in detail. The multi-core processor comprises 2^n processor cores (where n = 2, 3, 4, 5, 6, 7, 8 or 9) and n+1 cache levels, the m-th cache level comprising 2^(n+1-m) buffer memories (1 <= m <= n+1). The i-th buffer memory (1 <= i <= 2^n) of the 1st cache level is used, and used only, to store the cached data required by the thread executed by the i-th processor core. The i-th buffer memory (1 <= i <= 2^(n-1)) of the 2nd cache level is used, and used only, to store the cached data that the (2i-1)-th and (2i)-th buffer memories of the 1st level both hold; further, when the (2i-1)-th and (2i)-th buffer memories of the 1st level both store the same cached data, the cache controller dumps that data into the i-th buffer memory of the 2nd level, clears it from the (2i-1)-th and (2i)-th buffer memories of the 1st level, and releases the storage space it occupied there. In general, the j-th buffer memory (1 <= j <= 2^(n+1-s)) of the s-th cache level (2 <= s <= n+1) is used, and used only, to store the cached data that the (2j-1)-th and (2j)-th buffer memories of the (s-1)-th level both hold. For example, when n = 2, the multi-core processor comprises 2^2 = 4 processor cores and 3 cache levels, the 1st level comprising 2^2 = 4 level-1 buffer memories, the 2nd level 2^1 = 2 level-2 buffer memories, and the 3rd level 2^0 = 1 level-3 buffer memory. The multi-core processor further comprises a mapping buffer device storing the cache mapping table, which records at least the storage relations between cached data and each buffer memory (including every buffer memory of every level), that is, which buffer memory each item of cached data is stored in and which core and thread it is associated with; and a cache controller controlling every buffer memory of every level and the mapping buffer device, implementing writing, reading, replacement and querying on them. The processor cores, the buffer memories of all levels, the mapping buffer device and the cache controller communicate over a bus.
After the multi-core processor receives an instruction to execute multiple threads of the same process concurrently, subsequent steps continue only when the following conditions are met: the number of idle processor cores is not less than the number of threads, and there exists an assignment that gives each thread a different idle core such that the total resources of the assigned idle cores are not less than the total processor resources required by the threads (that is, the processing power of every assigned idle core can meet the needs of its thread); otherwise a message indicating insufficient system resources is returned. In response to the instruction, each thread is assigned to a different idle core, with the result that the processing power of every assigned core fully meets the needs of its thread; at the same time the assigned cores are activated, changing from the idle state to the busy state.
When the i-th idle processor core (1 <= i <= 2^n) of a multi-core processor having 2^n processor cores (where n is an integer greater than 1, for example n = 2, 3, 4, 5, 6, 7, 8 or 9) is activated, the activated processor core enters the busy state of executing a thread and consequently needs to read and write data according to the thread being executed. Data caching is therefore required, and a processor core in the busy state frequently sends cache instructions to the cache controller. In response to a cache instruction, the cache controller first queries the cache mapping table to check whether the data to be cached is already stored in the (⌊(i-1)/k⌋+1)-th buffer memory of the m-th level cache (2 <= m <= n+1), where ⌊·⌋ denotes the floor operation and k = 2^(m-1). If it is, the cache controller updates the cache mapping table and then sends an acknowledgement message indicating that caching succeeded to the i-th processor core, where updating the cache mapping table includes adding the correspondence among the data to be cached, the activated processor core, and the thread executed by that core. Otherwise, the data to be cached is stored in the i-th buffer memory of the first-level cache, and the storage relation between the data and that buffer memory is recorded in the cache mapping table; the cache mapping table is then queried with the data to be cached as the query condition. If the query shows that the (⌈i/2⌉·2-1)-th and (⌈i/2⌉·2)-th buffer memories of the first-level cache (where ⌈·⌉ denotes rounding up) both store the same cached data, the cache controller dumps the data to the ⌈i/2⌉-th buffer memory of the second-level cache, clears the copies stored in the (⌈i/2⌉·2-1)-th and (⌈i/2⌉·2)-th buffer memories of the first-level cache, releases the storage space those copies occupied, and finally updates the cache mapping table by adding the storage relation between the data and the second-level buffer memory and deleting the storage relations between the data and the first-level buffer memories.
The same judgment and processing are applied level by level. For example, when n = 2 and the multi-core processor comprises 4 processor cores, after the above steps the cache mapping table is again queried with the data to be cached as the query condition; if the query shows that the 1st and 2nd buffer memories of the second-level cache both store the same cached data, the cache controller dumps the data to the 1st (and only) buffer memory of the third-level cache, clears the copies stored in the 1st and 2nd buffer memories of the second-level cache, releases the storage space they occupied, and finally updates the cache mapping table by adding the storage relation between the data and the first buffer memory of the third-level cache and deleting the storage relations between the data and the buffer memories of the second-level cache. In general, the judgment proceeds level by level from the first-level cache to the n-th-level cache: if the (⌈r/2⌉·2-1)-th and (⌈r/2⌉·2)-th buffer memories of the t-th level cache (where ⌈·⌉ denotes rounding up, 1 <= r <= 2^(n+1-t), and 1 <= t <= n) both store the data to be cached, the cache controller dumps the data to be cached to the ⌈r/2⌉-th buffer memory of the (t+1)-th level cache, clears the copies stored in the (⌈r/2⌉·2-1)-th and (⌈r/2⌉·2)-th buffer memories of the t-th level cache, releases the storage space those copies occupied, and finally updates the cache mapping table by adding the storage relation between the data and the ⌈r/2⌉-th buffer memory of the (t+1)-th level cache and deleting the storage relations between the data and the (⌈r/2⌉·2-1)-th and (⌈r/2⌉·2)-th buffer memories of the t-th level cache. When these steps are completed, the cache controller sends an acknowledgement message indicating that caching succeeded to the i-th processor core. The cache mapping table is stored in a mapping buffer. Those skilled in the art will appreciate that all parameters used in the present invention, such as m, n, p, q, r, s, k, t, i and j, are integers.
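The level-by-level promotion described above can be sketched in a few lines of code. The following is a minimal, hypothetical Python simulation (not part of the patent; all names are illustrative): level m holds 2^(n+1-m) buffer sets, data written by core i lands in level-1 buffer i, and whenever the sibling pair (⌈r/2⌉·2-1, ⌈r/2⌉·2) at level t both hold the data it is dumped to buffer ⌈r/2⌉ of level t+1 and cleared from level t.

```python
def buffer_index(i, m):
    """Level-m buffer index holding core i's data (1-based): floor((i-1)/k)+1, k = 2**(m-1)."""
    k = 2 ** (m - 1)
    return (i - 1) // k + 1

def cache_write(levels, n, i, data):
    """Store `data` for core i at level 1, then promote level by level while
    both sibling buffers at the current level hold the same data."""
    levels[1][i].add(data)
    r = i
    for t in range(1, n + 1):                       # judge from level 1 to level n
        half = -(-r // 2)                           # ceil(r/2)
        pair = (half * 2 - 1, half * 2)             # sibling buffers at level t
        if all(data in levels[t][b] for b in pair):
            levels[t + 1][half].add(data)           # dump to level t+1
            for b in pair:                          # clear copies, release space
                levels[t][b].discard(data)
            r = half
        else:
            break

n = 2                                               # 4 cores, 3 cache levels
levels = {m: {j: set() for j in range(1, 2 ** (n + 1 - m) + 1)}
          for m in range(1, n + 2)}
cache_write(levels, n, 1, "X")                      # core 1 caches X: stays at level 1
cache_write(levels, n, 2, "X")                      # core 2 caches X: promoted to level 2
print(levels[2][1])                                 # {'X'}
```

With n = 2, writing "X" from cores 1 and 2 fills level-1 buffers 1 and 2, so the data is dumped to buffer 1 of level 2 and the two level-1 copies are cleared, matching the worked example in the text.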
In a second embodiment, the processing performed after a thread executing on a processor core terminates is similar to the processing method of the first embodiment, and this embodiment is implemented under the assumption that the buffer memories at every level have enough cache space to store the data to be cached. The method of reading cached data in this embodiment is likewise similar to the method of reading cached data in the first embodiment. In addition, although figure 4 does not show in detail the loop that judges, from the first-level cache to the n-th-level cache, whether the (⌈r/2⌉·2-1)-th and (⌈r/2⌉·2)-th buffer memories both store the data to be cached, those skilled in the art will understand from the detailed description above that the judgment is performed level by level; the concrete loop processing is omitted from the figure in order to highlight the key points of the inventive design.
In the present invention, cached data stored in the general buffer memory and in the buffer memories of levels other than the first can only be read, not modified, by the processor cores. If a modification is needed, a new cache write must be performed; that is, when a processor core needs to modify cached data held in the general buffer memory or in a non-first-level buffer memory, the modified data is treated as new data to be cached by the method of buffering data in a multi-core processor of the present invention, so the original cached data and the modified cached data are written as two distinct items of data to be cached.
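The write-as-new-entry semantics can be illustrated with a minimal, hypothetical sketch (names and the key-suffix convention are illustrative, not from the patent): a modification never rewrites the shared entry in place; instead the modified data is cached as a separate entry, and the original survives until a replacement algorithm evicts it.

```python
# Shared buffer contents are read-only for the cores.
general_buffer = {"X": "old-value"}

def modify(buffer, key, new_value):
    """Modified data is treated as new data to be cached: written as a
    distinct entry, leaving the original cached data untouched."""
    buffer[key + "'"] = new_value       # new entry; original kept as-is
    return buffer

modify(general_buffer, "X", "new-value")
print(sorted(general_buffer))           # ['X', "X'"]
```

Both versions then coexist as different items of cached data, which is why no in-place invalidation traffic is needed between cores.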
The above description of the invention is merely exemplary and focuses on the essential features involved in the technical problem to be solved by the present invention. Other related details of the invention that are clear to, or readily conceived by, those skilled in the art are not repeated here; for example, when the free storage space of a dedicated buffer memory or of the general buffer memory is insufficient to store the data to be cached, previously stored cached data must be evicted by a replacement algorithm, which is not elaborated here.
It should be appreciated that the above embodiments are detailed descriptions of specific implementations, but the present invention is not limited to these embodiments; various improvements and modifications can be made to the present invention without departing from its spirit and scope. For example, when weight information indicates that cached data has a low, middle or high weight, the method of caching data in the buffer memories of the present invention can be further improved accordingly.

Claims (4)

1. the method for a buffered data in polycaryon processor, wherein said polycaryon processor comprises multiple processor core, is formed centrally multiple dedicated buffer memory of coupled relation one by one with described multiple processor core and is coupled in a universal buffering storer of described multiple processor core respectively, and described method comprises:
Receive the instruction of the multiple thread of concurrence performance;
Each in described multiple thread is separately distributed to described multiple processor core, and wherein multiple processor core in the heart each is assigned with at most a thread;
The processor core of thread being assigned with for each, in response to the cache request during execution thread, being data cachedly stored in be coupled dedicated buffer memory by treating;
When all store in the dedicated buffer memory of quantity being not less than a threshold value t same data cached time, be data cachedly stored in universal buffering storer by same;
Wherein t=s or or t=2, wherein s is the total amount of the processor core be under state of activation, wherein expression rounds up.
2. The method according to claim 1, wherein, after the same cached data is stored in the general buffer memory, the same cached data is cleared from the dedicated buffer memories that stored it, and the storage space the same cached data occupied in those dedicated buffer memories is released.
3. The method according to claim 1, wherein, when any of the multiple processor cores needs to read cached data, the cached data is read from the general buffer memory or from a dedicated buffer memory by querying a cache mapping table.
4. the method for multi-buffer data in polycaryon processor, described polycaryon processor comprises 2 nindividual processor core and n+1 level buffer memory, wherein m level buffer memory comprises 2 n+1-mindividual memory buffer; Wherein, i-th memory buffer of the 1st grade of buffer memory only for the thread stored performed by i-th processor core cushion data cached; A jth memory buffer of s level buffer memory is only for storing the 2nd of s-1 level buffer memory j-1 and the 2nd jwhat have in individual memory buffer is data cached, wherein, n be greater than 1 integer, 1<=m<=n+1,2<=s<=n+1,1<=i<=2 n, 1<=j<=2 n+1-m; Described method comprises:
In response to the instruction of multiple threads of the same process of concurrence performance, each thread is distributed to different idle processor cores, activate the idle processor core being assigned with thread simultaneously, make the processor core being assigned with thread become busy condition from idle condition;
After i-th processor core is activated, in response to cache instruction, first check and whether treat data cachedly to be stored in of p level buffer memory in individual memory buffer, if existed, send the successful acknowledge message of instruction buffer memory, wherein 1<=i<=2 n, 2<=p<=n+1, represent downward floor operation, k=2 p-1;
If there is no, then be data cachedly stored in i-th memory buffer of the 1st grade of buffer memory by treating; Then judge step by step from the 1st grade of buffer memory to the n-th grade buffer memory, if of t level buffer memory with treat data cached described in all storing in * 2 memory buffer, then described treating data cachedly is dumped to of t+1 level buffer memory in individual memory buffer, and by of t level buffer memory with what store in individual memory buffer treats data cached removing, discharges of t level buffer memory simultaneously with data cached shared storage space is treated, wherein described in individual memory buffer 1<=t<=n, expression rounds up;
Send the successful acknowledge message of instruction buffer memory.
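The threshold rule of claim 1 can be sketched as follows. This is a minimal, hypothetical Python illustration (not the claimed implementation; the `policy` names are invented): each core has one dedicated buffer, and once at least t dedicated buffers hold the same data it is stored in the general buffer and, per claim 2, cleared from the dedicated buffers.

```python
import math

def threshold(s, policy="half"):
    """Claim 1's threshold: t = s, t = ceil(s/2), or t = 2,
    where s is the number of processor cores in the activated state."""
    return {"all": s, "half": math.ceil(s / 2), "two": 2}[policy]

def cache(dedicated, general, core, data, s, policy="half"):
    """Store data in the core's dedicated buffer; promote to the general
    buffer once the number of holders is not less than threshold t."""
    dedicated[core].add(data)
    holders = [c for c, buf in dedicated.items() if data in buf]
    if len(holders) >= threshold(s, policy):
        general.add(data)                 # claim 1: store in general buffer
        for c in holders:                 # claim 2: clear copies, release space
            dedicated[c].discard(data)

s = 4                                     # four activated cores
dedicated = {c: set() for c in range(1, s + 1)}
general = set()
cache(dedicated, general, 1, "X", s)      # 1 holder < t = ceil(4/2) = 2: stays local
cache(dedicated, general, 2, "X", s)      # 2 holders >= t = 2: promoted
print("X" in general)                     # True
```

Under the `t = ⌈s/2⌉` policy with four activated cores, the second write of the same data triggers promotion; the `t = s` policy would instead require all four dedicated buffers to hold the data first.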
CN201310301037.3A 2013-07-18 2013-07-18 Data buffering method in multi-core processor Active CN103345451B (en)

Publications (2)

Publication Number Publication Date
CN103345451A CN103345451A (en) 2013-10-09
CN103345451B true CN103345451B (en) 2015-05-13

Family

ID=49280249

Country Status (1)

Country Link
CN (1) CN103345451B (en)





Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent for invention or patent application
CB03 Change of inventor or designer information

Inventor after: Mao Li

Inventor after: Rong Qiang

Inventor before: Mao Li

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: MAO LI TO: MAO LI RONG QIANG

C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20151223

Address after: 610000, No. 2, unit 1, 57 community street, prosperous town, Tianfu New District, Sichuan, Chengdu, 8

Patentee after: Sichuan thousands of lines you and I Technology Co., Ltd.

Address before: 610041 A, building, No. two, Science Park, high tech Zone, Sichuan, Chengdu, China 103B

Patentee before: Sichuan Jiucheng Information Technology Co., Ltd.

CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 610000, No. 2, unit 1, 57 community street, prosperous town, Tianfu New District, Sichuan, Chengdu, 8

Patentee after: Sichuan qianhang Technology Co., Ltd

Address before: 610000, No. 2, unit 1, 57 community street, prosperous town, Tianfu New District, Sichuan, Chengdu, 8

Patentee before: Sichuan thousands of lines you and I Technology Co., Ltd.