CN103345451B - Data buffering method in multi-core processor - Google Patents

Data buffering method in multi-core processor

Info

Publication number
CN103345451B
Authority
CN
China
Prior art keywords
buffer memory
memory
buffer
processor core
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310301037.3A
Other languages
Chinese (zh)
Other versions
CN103345451A (en)
Inventor
毛力
容强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan qianhang Technology Co., Ltd
Original Assignee
SICHUAN JIUCHENG INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SICHUAN JIUCHENG INFORMATION TECHNOLOGY Co Ltd filed Critical SICHUAN JIUCHENG INFORMATION TECHNOLOGY Co Ltd
Priority to CN201310301037.3A priority Critical patent/CN103345451B/en
Publication of CN103345451A publication Critical patent/CN103345451A/en
Application granted granted Critical
Publication of CN103345451B publication Critical patent/CN103345451B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides a data buffering method in a multi-core processor. The method comprises: receiving an instruction to execute multiple threads concurrently; assigning each of the threads separately to the cores of the processor, with each core assigned at most one thread; during thread execution, responding to a cache request from each core that has been assigned a thread by storing the data to be cached in the dedicated buffer memory coupled to that core; and, when the number of buffer memories storing the same cached data reaches or exceeds a threshold t, storing that same cached data in a general buffer memory. The method improves cache access and replacement speed and overcomes the false-sharing problem.

Description

A method of buffering data in a multi-core processor
Technical field
The present invention relates to the field of data storage, in particular to a method of buffering data in a multi-core processor, and further to a method of multi-level buffering of data in a multi-core processor.
Background technology
The speed gap between processor and main memory is a prominent contradiction for multi-core processors, so multi-level cache buffering must be used to alleviate it. Current multi-core processors may share a level-1 cache, share a level-2 cache, or share only main memory. Typically a multi-core processor adopts the shared level-2 cache structure: each core has a private level-1 cache, and all cores share the level-2 cache. The architectural design of the cache itself also directly affects overall system performance. Within a multi-core structure, whether shared or exclusive caches are better, whether multi-level caches need to be built on a single chip, and what kind of cache to build are all questions requiring careful research and development, since they greatly affect the size, power consumption, layout, performance and operating efficiency of the whole chip. On the other hand, multi-level caches raise the consistency problem, and whichever cache-consistency model and mechanism is adopted will materially affect overall multi-core performance. Cache-consistency models widely adopted in conventional multiprocessor architectures include the sequential consistency model, the weak consistency model and the release consistency model; the associated cache-coherence mechanisms are mainly bus snooping protocols and directory-based protocols. Most current multi-core processor systems adopt a bus-based snooping protocol.
Programs executing on the different cores of a multi-core processor sometimes need to share data and synchronize, so the hardware must support inter-core communication. An efficient communication mechanism is an important guarantee of high multi-core performance. Two on-chip communication mechanisms are currently mainstream: a bus-based shared cache structure and an on-chip interconnect structure. In the bus shared-cache structure, the cores share a level-2 or level-3 cache that holds commonly used data and communicate over a bus connecting the cores; its advantages are a simple structure and high communication speed, its drawback the poor scalability of bus-based structures. In the on-chip interconnect structure, each core has an independent processing unit and cache, the cores are linked by crossbar switches or a network-on-chip, and the cores communicate by messages; its advantages are good scalability and guaranteed data bandwidth, its drawbacks complex hardware and larger software changes. The two approaches may end up complementing rather than replacing each other, for example by using a network-on-chip globally and buses locally to balance performance and complexity.
In a conventional microprocessor, cache misses and memory-access events both degrade the processor's execution efficiency, and the working efficiency of the bus interface unit (BIU) determines the size of that effect. When several cores request memory access simultaneously, or the private caches of several cores miss simultaneously, the BIU's arbitration among the access requests and its external memory-access mechanism determine the overall performance of the multi-core system. Finding an efficient multi-port BIU structure that converts the cores' individual accesses to main memory into more efficient burst accesses, together with a quantitative model of the burst access width that optimizes overall multi-core efficiency and an efficient arbitration mechanism for multi-port BIU access, are therefore important topics of multi-core processor research.
In current multi-core processor systems, whether for the level-2 or level-3 cache, and whether the cache is shared or private, the cache read and replacement algorithms suffer from technical problems such as high algorithmic complexity and long hit latency.
In addition, in the existing shared level-2 cache scheme, the shared data in the level-2 cache usually also has backups in the private level-1 caches. When cached data in a level-1 cache is modified by different cores, the false-sharing problem of the cache appears, forcing frequent reloads, increasing access latency and degrading system performance.
Summary of the invention
To solve the technical problems of the cache read and replacement algorithms in existing multi-core processor systems, such as high algorithmic complexity, long hit latency and false sharing, the invention provides a method of buffering data in a multi-core processor, wherein the multi-core processor comprises multiple processor cores, multiple dedicated buffer memories in one-to-one coupling with the cores, and one general buffer memory coupled to all of the cores, the method comprising:
receiving an instruction to execute multiple threads concurrently;
assigning each of the multiple threads separately to the multiple processor cores, each core being assigned at most one thread;
for each core that has been assigned a thread, in response to a cache request during thread execution, storing the data to be cached in the dedicated buffer memory coupled to that core;
when the number of dedicated buffer memories that all store the same cached data is not less than a threshold t, storing that same cached data in the general buffer memory.
Preferably, after the same cached data has been stored in the general buffer memory, it is cleared from the dedicated buffer memories that stored it, and the storage space it occupied in those dedicated buffer memories is released.
Preferably, when any of the processor cores needs to read cached data, the data is read from the general buffer memory or a dedicated buffer memory by querying a cache mapping table.
Preferably, t = s, t = ⌈s/2⌉, or t = 2, where s is the total number of processor cores in the active state and ⌈·⌉ denotes rounding up.
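The three candidate thresholds can be written out as a short Python sketch. The middle option is a reconstruction: the patent's original formula was an inline image lost in extraction, and ⌈s/2⌉ is the reading consistent with the surrounding "rounds up" remark. The function name and `policy` parameter are hypothetical.

```python
import math

def promotion_threshold(s, policy="half"):
    """Candidate thresholds from the claim: t = s, t = ceil(s/2), or t = 2,
    where s is the number of processor cores in the active state."""
    return {"all": s, "half": math.ceil(s / 2), "pair": 2}[policy]
```

With four active cores the default ⌈s/2⌉ policy yields t = 2, so data cached by any two cores is promoted to the general buffer memory.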
The invention also discloses a method of multi-level buffering of data in a multi-core processor, the multi-core processor comprising 2^n processor cores and n+1 cache levels, where the m-th cache level comprises 2^(n+1-m) buffer memories; the i-th buffer memory of the 1st cache level is used only to store the cached data of the thread executed by the i-th processor core; the j-th buffer memory of the s-th cache level is used only to store the cached data common to the (2j-1)-th and (2j)-th buffer memories of the (s-1)-th cache level; where n is an integer greater than 1, 1 <= m <= n+1, 2 <= s <= n+1, 1 <= i <= 2^n and 1 <= j <= 2^(n+1-s); the method comprising:
in response to an instruction to execute multiple threads of the same process concurrently, assigning each thread to a different idle processor core and activating the cores that have been assigned threads, so that they change from the idle state to the busy state;
after the i-th processor core is activated, in response to a cache instruction, first checking whether the data to be cached is already stored in the ⌈i/k⌉-th buffer memory of the p-th cache level (equivalently ⌊(i-1)/k⌋+1, with ⌊·⌋ denoting the floor operation), and if so sending an acknowledgement message indicating that caching succeeded, where 1 <= i <= 2^n, 2 <= p <= n+1 and k = 2^(p-1);
if not, storing the data to be cached in the i-th buffer memory of the 1st cache level, and then judging level by level from the 1st to the n-th cache level: if the (2q-1)-th and (2q)-th buffer memories of the t-th cache level both store the data to be cached, where q = ⌈i/2^t⌉ and ⌈·⌉ denotes rounding up, transferring the data to the q-th buffer memory of the (t+1)-th cache level, clearing the data from the (2q-1)-th and (2q)-th buffer memories of the t-th level, and releasing the storage space it occupied there, where 1 <= t <= n;
sending an acknowledgement message indicating that caching succeeded.
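The geometry of this hierarchy can be sketched in Python. Both formulas are reconstructions of inline images lost in extraction, inferred from the surrounding constraints; the function names are hypothetical.

```python
import math

def buffers_at_level(n, m):
    """Number of buffer memories at cache level m of an (n+1)-level
    hierarchy over 2**n cores, per the claim: 2**(n+1-m)."""
    return 2 ** (n + 1 - m)

def level_index(i, m):
    """Index of the level-m buffer memory on core i's path up the
    binary tree of buffers: ceil(i / 2**(m-1))."""
    return math.ceil(i / 2 ** (m - 1))
```

For n = 2 the levels hold 4, 2 and 1 buffer memories, and core 3's path runs through buffer 3 of level 1, buffer 2 of level 2, and the single buffer of level 3.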
The invention uses a cache mapping table to read and replace, in real time, the data to be cached by executing threads, in particular replacing cached data between the dedicated buffer memories and the general buffer memory in real time, and uses multi-level caching for real-time reads and replacement. It thereby achieves lower algorithmic complexity, shortens hit latency, and improves the overall efficiency of a computer system with a multi-core processor.
Accompanying drawing explanation
The accompanying drawings are included to provide a further understanding of the invention; as part of the specification they explain, together with the specification, the principle of the invention. In the drawings:
Fig. 1 is a structural block diagram of the multi-core processor of the first embodiment of the invention;
Fig. 2 is a flowchart of the method of buffering data in a multi-core processor of the first embodiment of the invention;
Fig. 3 is a structural block diagram of the multi-core processor of the second embodiment of the invention;
Fig. 4 is a flowchart of the method of buffering data in a multi-core processor of the second embodiment of the invention.
Embodiment
Fig. 1 shows a structural block diagram of the multi-core processor involved in the invention. As shown in Fig. 1, the multi-core processor comprises multiple processor cores, multiple dedicated buffer memories in one-to-one coupling with the cores, and one general buffer memory coupled to all of the cores. Each dedicated buffer memory stores only cached data related to the thread executed by the core it is coupled to, while the general buffer memory stores cached data related to the threads executed by multiple cores. The multi-core processor further comprises a mapping buffer device that stores a cache mapping table, which records at least the storage relations between cached data and each buffer memory (including the dedicated buffer memories and the general buffer memory), that is, which buffer memory each item of cached data is stored in and which processor core and thread it is associated with. The multi-core processor further comprises a cache controller that controls the dedicated buffer memories, the general buffer memory and the mapping buffer device, implementing operations such as writing, reading, replacement and querying on them.
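The mapping table just described can be sketched as a small Python class. The class and method names are hypothetical; the patent only specifies what the table records, not a data structure.

```python
class CacheMappingTable:
    """Sketch of the cache mapping table: for each cached item it records
    which buffer memories hold it and which core/thread it is tied to."""

    def __init__(self):
        self.locations = {}     # data key -> set of buffer memory ids
        self.associations = {}  # data key -> set of (core, thread) pairs

    def record(self, key, buffer_id, core=None, thread=None):
        self.locations.setdefault(key, set()).add(buffer_id)
        if core is not None:
            self.associations.setdefault(key, set()).add((core, thread))

    def drop_location(self, key, buffer_id):
        self.locations.get(key, set()).discard(buffer_id)

    def lookup(self, key):
        return self.locations.get(key, set())
```

A promotion to the general buffer memory would then call `record(key, "general")` and `drop_location` for each dedicated buffer, keeping the table consistent with the actual storage relations.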
In the invention, the total number of threads of a same process is not greater than the total number of processor cores, and each of those threads is assigned to a different core so as to guarantee that the threads of the process execute concurrently.
Next, the method of buffering data in a multi-core processor proposed by the invention is described in detail with reference to Fig. 2. The method comprises: the multi-core processor receives an instruction to execute multiple threads of the same process concurrently; subsequent steps continue only when the following conditions are met: the number of idle processor cores is not less than the number of threads to be executed concurrently, and there exists an assignment that gives each thread a different idle core such that the total resources of the assigned idle cores are not less than the total processor resources required by the assigned threads (that is, the processing power of every assigned idle core can meet the needs of the thread assigned to it); otherwise a message indicating insufficient system resources is returned. In response to the instruction, each of the threads is assigned to a different idle core, with the result that the processing power of every core that has been assigned a thread fully meets the needs of its thread; at the same time the assigned idle cores are activated, changing from the idle state to the busy state.
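The admission condition above can be sketched as a feasibility check. The patent does not give an algorithm for finding the assignment; matching the largest demand to the largest capacity is one simple formulation, and all names here are hypothetical.

```python
def can_schedule(idle_core_capacity, thread_demand):
    """Feasibility sketch: each thread must get a distinct idle core whose
    processing power meets that thread's needs.  Sorting both lists and
    comparing pairwise decides whether such an assignment exists."""
    if len(idle_core_capacity) < len(thread_demand):
        return False  # fewer idle cores than threads to run
    caps = sorted(idle_core_capacity, reverse=True)
    needs = sorted(thread_demand, reverse=True)
    return all(cap >= need for cap, need in zip(caps, needs))
```

When the check fails, the method returns the insufficient-resources message instead of activating any core.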
After an idle core of the multi-core processor is activated, it enters the busy state of executing its thread and therefore needs to read and write data as the thread requires, and hence needs to buffer data; for this reason a core in the busy state frequently sends cache instructions to the cache controller. In response to a cache instruction, the cache controller first queries the cache mapping table to check whether the data to be cached is already stored in the general buffer memory. If it is, the controller updates the cache mapping table and then sends an acknowledgement message indicating that caching succeeded to the core that issued the instruction, where updating the table includes adding the correspondence between the data to be cached, the activated core and the thread executed by that core. Otherwise the controller stores the data to be cached in the dedicated buffer memory coupled to the core that issued the instruction, records the storage relation between the data and that dedicated buffer memory in the mapping table, and then performs a query analysis on the mapping table with the data to be cached as the query condition. If the analysis shows that no fewer than a threshold t of the dedicated buffer memories all store the data to be cached, where t = s, t = ⌈s/2⌉ or t = 2, s being the total number of cores currently executing threads and ⌈·⌉ denoting rounding up, then the controller stores the data in the general buffer memory, clears the data from those t dedicated buffer memories and releases the storage space it occupied in them, and finally updates the cache mapping table, adding the storage relation between the data and the general buffer memory and deleting the storage relations between the data and the t dedicated buffer memories. At the end of these steps the cache controller sends an acknowledgement of successful caching to the core that issued the cache instruction. The cache mapping table is stored in the mapping buffer device.
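The write flow just described can be sketched end to end. This is a minimal model, not the patent's implementation: buffers are plain sets, the mapping table is reduced to an association dict, and all names are hypothetical.

```python
def handle_cache_request(core, data, general, private, assoc, t):
    """First-embodiment write flow (sketch): a general-buffer hit only
    refreshes the association record; otherwise the data goes to the
    core's dedicated buffer, and once no fewer than t dedicated buffers
    hold the same data it is promoted to the general buffer memory and
    the dedicated copies are cleared."""
    assoc.setdefault(data, set()).add(core)
    if data in general:
        return "hit-general"
    private[core].add(data)
    holders = [c for c, buf in private.items() if data in buf]
    if len(holders) >= t:
        general.add(data)
        for c in holders:
            private[c].discard(data)  # release the dedicated-buffer space
        return "promoted"
    return "stored-private"
```

With t = 2, the second core to cache the same item triggers promotion, and a third core's request then hits the general buffer directly.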
In the invention, when a thread executing on a processor core ends, the core first sends a clear instruction to the cache controller. In response, the controller first queries the cache mapping table to check whether the general buffer memory holds cached data associated with the finished thread. If so, then for each such item the controller continues to query the table to check whether more than the threshold t of cores are executing threads associated with that item. If not, and provided the dedicated buffer memories coupled to all cores still associated with the item have the capacity to store it, the controller, following the records of the mapping table, dumps the item into the dedicated buffer memories coupled to those cores, deletes the item from the general buffer memory, releases the space it occupied there, and updates the mapping table so that it reflects the latest storage relations. Those skilled in the art will know that after a thread ends, the related data in main memory also needs to be cleared. After performing the above steps, the cache controller sends an acknowledgement of successful clearing to the core that issued the clear instruction, in response to which the core changes from the busy state to the idle state.
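The demotion-on-exit step can be sketched as follows. The model is deliberately minimal (sets for buffers, a dict for associations) and assumes the dedicated buffers always have room, as the text stipulates; all names are hypothetical.

```python
def on_thread_exit(core, general, private, assoc, t):
    """Thread-termination cleanup (sketch): the finishing core is removed
    from each item's association set; an item in the general buffer with
    no more than t remaining associated cores is demoted back into those
    cores' dedicated buffer memories and dropped from the general buffer."""
    for data in list(general):
        cores = assoc.get(data, set())
        cores.discard(core)
        if cores and len(cores) <= t:
            for c in cores:
                private[c].add(data)
            general.discard(data)  # release the general-buffer space
```

An item still used by more than t cores stays in the general buffer memory, which is exactly the condition under which it was promoted in the first place.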
In the invention, the general buffer memory may reach a saturated state through continual writes of cached data, in which case the caching policy needs to be adjusted. To effectively simplify the highly complex algorithms of the prior art, the invention proposes the following method: when cached data needs to be written into the general buffer memory, first judge whether the total free storage space of the general buffer memory is not less than the volume of data to be written. If it is, write the data directly into the general buffer memory; if the free space is less than the volume to be written, first dump the cached data in the general buffer memory into main memory, then release the storage space of the general buffer memory, and finally write the data into the general buffer memory.
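This saturation handling can be sketched in a few lines. Sizes are abstract units and the names are hypothetical; the point is only the order of operations: spill everything to main memory, release the whole buffer, then write.

```python
def ensure_capacity(general, main_memory, free_space, capacity, needed):
    """Saturation handling (sketch): if the general buffer memory lacks
    room for `needed` units, its contents are first dumped to main memory
    and its entire space released before the write proceeds."""
    if free_space < needed:
        main_memory.update(general)  # dump cached data to main memory
        general.clear()              # release the general buffer's space
        free_space = capacity
    return free_space - needed       # free space left after the write
```

Dumping the whole buffer rather than selecting victims is what the text offers as its simplification of prior-art replacement algorithms.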
Also in response to a clear instruction, the cache controller queries the cache mapping table to check whether the dedicated buffer memory coupled to the issuing core holds cached data associated with the finished thread; if so, the controller clears the data found, releases the storage space it occupied, and updates the cache mapping table so that it reflects the latest storage relations.
When cached data needs to be read, the cache mapping table is queried to determine whether the data is read from the general buffer memory or a dedicated buffer memory. For data that has been dumped into main memory, when the query of the cache mapping table misses, the data is further read from main memory. The concrete read method is not repeated here.
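The read path can be sketched directly from that description. The table and buffer structures are the minimal hypothetical ones used above, not the patent's implementation.

```python
def read_cached(key, table, buffers, main_memory):
    """Read path (sketch): the cache mapping table says which buffer
    memory holds the data; on a table miss the data is looked up in main
    memory, where it may have been dumped; otherwise it is a true miss."""
    for buf_id in table.get(key, ()):
        if key in buffers.get(buf_id, ()):
            return ("cache", buf_id)
    if key in main_memory:
        return ("main", None)
    return ("miss", None)
```

A true miss would then fall back to whatever fill path the surrounding system provides, which the patent does not elaborate.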
With reference to Figs. 3 and 4, the method of multi-level buffering of data in a multi-core processor of the second embodiment is described in detail. The multi-core processor comprises 2^n processor cores (where n = 2, 3, 4, 5, 6, 7, 8 or 9) and n+1 cache levels, the m-th cache level comprising 2^(n+1-m) buffer memories (1 <= m <= n+1). The i-th buffer memory (1 <= i <= 2^n) of the 1st cache level is used, and used only, to store the cached data required by the thread executed by the i-th processor core. The i-th buffer memory (1 <= i <= 2^(n-1)) of the 2nd cache level is used, and used only, to store the cached data that the (2i-1)-th and (2i)-th buffer memories of the 1st level both hold; further, when the (2i-1)-th and (2i)-th buffer memories of the 1st level both store the same cached data, the cache controller dumps that data into the i-th buffer memory of the 2nd level, clears it from the (2i-1)-th and (2i)-th buffer memories of the 1st level, and releases the storage space it occupied there. In general, the j-th buffer memory (1 <= j <= 2^(n+1-s)) of the s-th cache level (2 <= s <= n+1) is used, and used only, to store the cached data that the (2j-1)-th and (2j)-th buffer memories of the (s-1)-th level both hold. For example, when n = 2, the multi-core processor comprises 2^2 = 4 processor cores and 3 cache levels, the 1st level comprising 2^2 = 4 level-1 buffer memories, the 2nd level 2^1 = 2 level-2 buffer memories, and the 3rd level 2^0 = 1 level-3 buffer memory. The multi-core processor further comprises a mapping buffer device storing the cache mapping table, which records at least the storage relations between cached data and each buffer memory (including every buffer memory of every level), that is, which buffer memory each item of cached data is stored in and which core and thread it is associated with; and a cache controller controlling every buffer memory of every level and the mapping buffer device, implementing writing, reading, replacement and querying on them. The processor cores, the buffer memories of all levels, the mapping buffer device and the cache controller communicate over a bus.
After the multi-core processor receives an instruction to execute multiple threads of the same process concurrently, subsequent steps continue only when the following conditions are met: the number of idle processor cores is not less than the number of threads, and there exists an assignment that gives each thread a different idle core such that the total resources of the assigned idle cores are not less than the total processor resources required by the threads (that is, the processing power of every assigned idle core can meet the needs of its thread); otherwise a message indicating insufficient system resources is returned. In response to the instruction, each thread is assigned to a different idle core, with the result that the processing power of every assigned core fully meets the needs of its thread; at the same time the assigned cores are activated, changing from the idle state to the busy state.
When the i-th idle processor core (1 <= i <= 2^n) of a multi-core processor having 2^n processor cores (where n is an integer greater than 1, for example n = 2, 3, 4, 5, 6, 7, 8 or 9) is activated, the activated processor core enters the busy state of executing a thread and consequently needs to read and write data according to the thread being executed. Data caching is therefore required, and a processor core in the busy state frequently sends cache instructions to the cache controller. In response to a cache instruction, the cache controller first queries the cache mapping table to check whether the data to be cached is already stored in the (⌊(i-1)/k⌋+1)-th buffer memory of the m-th level cache (2 <= m <= n+1), where ⌊·⌋ denotes the floor operation and k = 2^(m-1). If it is, the cache controller updates the cache mapping table and then sends an acknowledgement message indicating that caching succeeded to the i-th processor core, where updating the cache mapping table includes adding the correspondence among the data to be cached, the activated processor core, and the thread executed by that core. Otherwise, the data to be cached is stored in the i-th buffer memory of the first-level cache, and the storage relation between the data and that buffer memory is recorded in the cache mapping table; the cache mapping table is then queried with the data to be cached as the query condition. If the query shows that the (⌈i/2⌉·2-1)-th and (⌈i/2⌉·2)-th buffer memories of the first-level cache (where ⌈·⌉ denotes rounding up) both store the same cached data, the cache controller dumps the data to the ⌈i/2⌉-th buffer memory of the second-level cache, clears the copies stored in the (⌈i/2⌉·2-1)-th and (⌈i/2⌉·2)-th buffer memories of the first-level cache, releases the storage space those copies occupied, and finally updates the cache mapping table by adding the storage relation between the data and the second-level buffer memory and deleting the storage relations between the data and the first-level buffer memories.
The same judgment and processing are applied level by level. For example, when n = 2 and the multi-core processor comprises 4 processor cores, after the above steps the cache mapping table is again queried with the data to be cached as the query condition; if the query shows that the 1st and 2nd buffer memories of the second-level cache both store the same cached data, the cache controller dumps the data to the 1st (and only) buffer memory of the third-level cache, clears the copies stored in the 1st and 2nd buffer memories of the second-level cache, releases the storage space they occupied, and finally updates the cache mapping table by adding the storage relation between the data and the first buffer memory of the third-level cache and deleting the storage relations between the data and the buffer memories of the second-level cache. In general, the judgment proceeds level by level from the first-level cache to the n-th-level cache: if the (⌈r/2⌉·2-1)-th and (⌈r/2⌉·2)-th buffer memories of the t-th level cache (where ⌈·⌉ denotes rounding up, 1 <= r <= 2^(n+1-t), and 1 <= t <= n) both store the data to be cached, the cache controller dumps the data to be cached to the ⌈r/2⌉-th buffer memory of the (t+1)-th level cache, clears the copies stored in the (⌈r/2⌉·2-1)-th and (⌈r/2⌉·2)-th buffer memories of the t-th level cache, releases the storage space those copies occupied, and finally updates the cache mapping table by adding the storage relation between the data and the ⌈r/2⌉-th buffer memory of the (t+1)-th level cache and deleting the storage relations between the data and the (⌈r/2⌉·2-1)-th and (⌈r/2⌉·2)-th buffer memories of the t-th level cache. When these steps are completed, the cache controller sends an acknowledgement message indicating that caching succeeded to the i-th processor core. The cache mapping table is stored in a mapping buffer. Those skilled in the art will appreciate that all parameters used in the present invention, such as m, n, p, q, r, s, k, t, i and j, are integers.
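The level-by-level promotion described above can be sketched in a few lines of code. The following is a minimal, hypothetical Python simulation (not part of the patent; all names are illustrative): level m holds 2^(n+1-m) buffer sets, data written by core i lands in level-1 buffer i, and whenever the sibling pair (⌈r/2⌉·2-1, ⌈r/2⌉·2) at level t both hold the data it is dumped to buffer ⌈r/2⌉ of level t+1 and cleared from level t.

```python
def buffer_index(i, m):
    """Level-m buffer index holding core i's data (1-based): floor((i-1)/k)+1, k = 2**(m-1)."""
    k = 2 ** (m - 1)
    return (i - 1) // k + 1

def cache_write(levels, n, i, data):
    """Store `data` for core i at level 1, then promote level by level while
    both sibling buffers at the current level hold the same data."""
    levels[1][i].add(data)
    r = i
    for t in range(1, n + 1):                       # judge from level 1 to level n
        half = -(-r // 2)                           # ceil(r/2)
        pair = (half * 2 - 1, half * 2)             # sibling buffers at level t
        if all(data in levels[t][b] for b in pair):
            levels[t + 1][half].add(data)           # dump to level t+1
            for b in pair:                          # clear copies, release space
                levels[t][b].discard(data)
            r = half
        else:
            break

n = 2                                               # 4 cores, 3 cache levels
levels = {m: {j: set() for j in range(1, 2 ** (n + 1 - m) + 1)}
          for m in range(1, n + 2)}
cache_write(levels, n, 1, "X")                      # core 1 caches X: stays at level 1
cache_write(levels, n, 2, "X")                      # core 2 caches X: promoted to level 2
print(levels[2][1])                                 # {'X'}
```

With n = 2, writing "X" from cores 1 and 2 fills level-1 buffers 1 and 2, so the data is dumped to buffer 1 of level 2 and the two level-1 copies are cleared, matching the worked example in the text.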
In a second embodiment, the processing performed after a thread executing on a processor core terminates is similar to the processing method of the first embodiment, and this embodiment is implemented under the assumption that the buffer memories at every level have enough cache space to store the data to be cached. The method of reading cached data in this embodiment is likewise similar to the method of reading cached data in the first embodiment. In addition, although figure 4 does not show in detail the loop that judges, from the first-level cache to the n-th-level cache, whether the (⌈r/2⌉·2-1)-th and (⌈r/2⌉·2)-th buffer memories both store the data to be cached, those skilled in the art will understand from the detailed description above that the judgment is performed level by level; the concrete loop processing is omitted from the figure in order to highlight the key points of the inventive design.
In the present invention, cached data stored in the general buffer memory and in the buffer memories of levels other than the first can only be read, not modified, by the processor cores. If a modification is needed, a new cache write must be performed; that is, when a processor core needs to modify cached data held in the general buffer memory or in a non-first-level buffer memory, the modified data is treated as new data to be cached by the method of buffering data in a multi-core processor of the present invention, so the original cached data and the modified cached data are written as two distinct items of data to be cached.
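The write-as-new-entry semantics can be illustrated with a minimal, hypothetical sketch (names and the key-suffix convention are illustrative, not from the patent): a modification never rewrites the shared entry in place; instead the modified data is cached as a separate entry, and the original survives until a replacement algorithm evicts it.

```python
# Shared buffer contents are read-only for the cores.
general_buffer = {"X": "old-value"}

def modify(buffer, key, new_value):
    """Modified data is treated as new data to be cached: written as a
    distinct entry, leaving the original cached data untouched."""
    buffer[key + "'"] = new_value       # new entry; original kept as-is
    return buffer

modify(general_buffer, "X", "new-value")
print(sorted(general_buffer))           # ['X', "X'"]
```

Both versions then coexist as different items of cached data, which is why no in-place invalidation traffic is needed between cores.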
The above description of the invention is merely exemplary and focuses on the essential features involved in the technical problem to be solved by the present invention. Other related details of the invention that are clear to, or readily conceived by, those skilled in the art are not repeated here; for example, when the free storage space of a dedicated buffer memory or of the general buffer memory is insufficient to store the data to be cached, previously stored cached data must be evicted by a replacement algorithm, which is not elaborated here.
It should be appreciated that the above embodiments are detailed descriptions of specific implementations, but the present invention is not limited to these embodiments; various improvements and modifications can be made to the present invention without departing from its spirit and scope. For example, when weight information indicates that cached data has a low, middle or high weight, the method of caching data in the buffer memories of the present invention can be further improved accordingly.

Claims (4)

1. the method for a buffered data in polycaryon processor, wherein said polycaryon processor comprises multiple processor core, is formed centrally multiple dedicated buffer memory of coupled relation one by one with described multiple processor core and is coupled in a universal buffering storer of described multiple processor core respectively, and described method comprises:
Receive the instruction of the multiple thread of concurrence performance;
Each in described multiple thread is separately distributed to described multiple processor core, and wherein multiple processor core in the heart each is assigned with at most a thread;
The processor core of thread being assigned with for each, in response to the cache request during execution thread, being data cachedly stored in be coupled dedicated buffer memory by treating;
When all store in the dedicated buffer memory of quantity being not less than a threshold value t same data cached time, be data cachedly stored in universal buffering storer by same;
Wherein t=s or or t=2, wherein s is the total amount of the processor core be under state of activation, wherein expression rounds up.
2. The method according to claim 1, wherein, after the same cached data is stored in the general buffer memory, the same cached data is cleared from the dedicated buffer memories that stored it, and the storage space the same cached data occupied in those dedicated buffer memories is released.
3. The method according to claim 1, wherein, when any of the multiple processor cores needs to read cached data, the cached data is read from the general buffer memory or from a dedicated buffer memory by querying a cache mapping table.
4. the method for multi-buffer data in polycaryon processor, described polycaryon processor comprises 2 nindividual processor core and n+1 level buffer memory, wherein m level buffer memory comprises 2 n+1-mindividual memory buffer; Wherein, i-th memory buffer of the 1st grade of buffer memory only for the thread stored performed by i-th processor core cushion data cached; A jth memory buffer of s level buffer memory is only for storing the 2nd of s-1 level buffer memory j-1 and the 2nd jwhat have in individual memory buffer is data cached, wherein, n be greater than 1 integer, 1<=m<=n+1,2<=s<=n+1,1<=i<=2 n, 1<=j<=2 n+1-m; Described method comprises:
In response to the instruction of multiple threads of the same process of concurrence performance, each thread is distributed to different idle processor cores, activate the idle processor core being assigned with thread simultaneously, make the processor core being assigned with thread become busy condition from idle condition;
After i-th processor core is activated, in response to cache instruction, first check and whether treat data cachedly to be stored in of p level buffer memory in individual memory buffer, if existed, send the successful acknowledge message of instruction buffer memory, wherein 1<=i<=2 n, 2<=p<=n+1, represent downward floor operation, k=2 p-1;
If there is no, then be data cachedly stored in i-th memory buffer of the 1st grade of buffer memory by treating; Then judge step by step from the 1st grade of buffer memory to the n-th grade buffer memory, if of t level buffer memory with treat data cached described in all storing in * 2 memory buffer, then described treating data cachedly is dumped to of t+1 level buffer memory in individual memory buffer, and by of t level buffer memory with what store in individual memory buffer treats data cached removing, discharges of t level buffer memory simultaneously with data cached shared storage space is treated, wherein described in individual memory buffer 1<=t<=n, expression rounds up;
Send the successful acknowledge message of instruction buffer memory.
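The threshold rule of claim 1 can be sketched as follows. This is a minimal, hypothetical Python illustration (not the claimed implementation; the `policy` names are invented): each core has one dedicated buffer, and once at least t dedicated buffers hold the same data it is stored in the general buffer and, per claim 2, cleared from the dedicated buffers.

```python
import math

def threshold(s, policy="half"):
    """Claim 1's threshold: t = s, t = ceil(s/2), or t = 2,
    where s is the number of processor cores in the activated state."""
    return {"all": s, "half": math.ceil(s / 2), "two": 2}[policy]

def cache(dedicated, general, core, data, s, policy="half"):
    """Store data in the core's dedicated buffer; promote to the general
    buffer once the number of holders is not less than threshold t."""
    dedicated[core].add(data)
    holders = [c for c, buf in dedicated.items() if data in buf]
    if len(holders) >= threshold(s, policy):
        general.add(data)                 # claim 1: store in general buffer
        for c in holders:                 # claim 2: clear copies, release space
            dedicated[c].discard(data)

s = 4                                     # four activated cores
dedicated = {c: set() for c in range(1, s + 1)}
general = set()
cache(dedicated, general, 1, "X", s)      # 1 holder < t = ceil(4/2) = 2: stays local
cache(dedicated, general, 2, "X", s)      # 2 holders >= t = 2: promoted
print("X" in general)                     # True
```

Under the `t = ⌈s/2⌉` policy with four activated cores, the second write of the same data triggers promotion; the `t = s` policy would instead require all four dedicated buffers to hold the data first.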
CN201310301037.3A 2013-07-18 2013-07-18 Data buffering method in multi-core processor Active CN103345451B (en)

Publications (2)

Publication Number Publication Date
CN103345451A CN103345451A (en) 2013-10-09
CN103345451B true CN103345451B (en) 2015-05-13

Family

ID=49280249

Country Status (1)

Country Link
CN (1) CN103345451B (en)





Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent for invention or patent application
CB03 Change of inventor or designer information

Inventor after: Mao Li

Inventor after: Rong Qiang

Inventor before: Mao Li

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: MAO LI TO: MAO LI RONG QIANG

C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20151223

Address after: 610000, No. 2, unit 1, 57 community street, prosperous town, Tianfu New District, Sichuan, Chengdu, 8

Patentee after: Sichuan thousands of lines you and I Technology Co., Ltd.

Address before: 610041 A, building, No. two, Science Park, high tech Zone, Sichuan, Chengdu, China 103B

Patentee before: Sichuan Jiucheng Information Technology Co., Ltd.

CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 610000, No. 2, unit 1, 57 community street, prosperous town, Tianfu New District, Sichuan, Chengdu, 8

Patentee after: Sichuan qianhang Technology Co., Ltd

Address before: 610000, No. 2, unit 1, 57 community street, prosperous town, Tianfu New District, Sichuan, Chengdu, 8

Patentee before: Sichuan thousands of lines you and I Technology Co., Ltd.