CN101561766A - Low-overhead block synchronization method supporting multi-core helper threads - Google Patents

Low-overhead block synchronization method supporting multi-core helper threads

Info

Publication number
CN101561766A
Authority
CN
China
Prior art keywords
thread
prefetching
computation thread
helper thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2009100856020A
Other languages
Chinese (zh)
Other versions
CN101561766B (en)
Inventor
古志民
郑宁汉
张轶
黄艳
唐洁
刘昌定
陈嘉
周伟峰
张博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN2009100856020A priority Critical patent/CN101561766B/en
Publication of CN101561766A publication Critical patent/CN101561766A/en
Application granted granted Critical
Publication of CN101561766B publication Critical patent/CN101561766B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Multi Processors (AREA)

Abstract

The invention relates to a low-overhead block synchronization method supporting multi-core helper threads, which belongs to the technical field of multi-core computers. On a multi-core architecture with a shared cache, and aiming at the problem of irregular data misses in multi-core applications, the method introduces a run-ahead mechanism, low-overhead block synchronization, and loop control for the prefetching helper thread. It reduces data misses during the execution of the computation thread, reduces pollution of the shared cache, improves the execution performance of the computation thread, and achieves cross-core cooperative pushing of irregular data. The method can be widely applied in future multi-core compiler optimization and database performance optimization.

Description

A low-overhead block synchronization method supporting multi-core helper threads
Technical field
The present invention relates to a low-overhead block synchronization method supporting multi-core helper threads, and belongs to the technical field of multi-core computers.
Background art
Chip multi-processor (CMP) technology integrates multiple computing cores into a single processor chip and uses multithreading to improve the parallel execution performance of applications. According to Amdahl's law, the performance of a parallelized program is bounded by the performance of its serial part, and the overhead of long-latency memory accesses in the serial part seriously degrades application performance.
Typically, a chip multi-processor architecture has a shared L2 cache (Level 2 Cache) or last-level cache (Last Level Cache). Conventional hardware prefetching can prefetch the regular data of an application (such as regular arrays) and deliver it into the shared cache in advance; when the computation thread later accesses this regular data, it usually finds the data already in the shared cache and no further memory access is needed. However, for irregular data whose addresses are not contiguous (such as linked lists or irregularly accessed arrays), the discontinuity of the access addresses prevents conventional hardware prefetching from obtaining accurate prefetch addresses, so it has no prefetching effect. For this case, the prefetching helper thread method has been proposed: a prefetching helper thread is extracted from the computation thread and run on an idle core, where it dynamically stays ahead of the data accesses of the computation thread, so that the data is pushed into the shared cache in time before the computation thread accesses it, improving the performance of serial code.
A compiler or programmer can generate a computation thread and a prefetching helper thread for an application; the helper thread runs on one core, the computation thread runs on another core, and the irregular data needed by the computation thread is pushed into the shared cache by the helper thread. The computation thread and the helper thread must cooperate: the helper thread must not run too fast, or data prefetched too early may be evicted by the cache replacement algorithm before it is needed; it must also not run too slow, or the computation thread will already have accessed the data and the prefetch becomes useless. The computation thread needs to know the progress of its helper thread, and likewise the helper thread needs to know the progress of its computation thread. To determine each other's position, synchronization operations must be added to both threads. In conventional synchronization methods, even ignoring thread-scheduling cost, the helper thread synchronizes with the computation thread once for every data access. This fine-grained synchronization brings serious synchronization overhead, which can cancel out the benefit the helper thread gains by reducing irregular data misses and long-latency memory accesses, so the execution performance of the computation thread is not improved.
Summary of the invention
The objective of the invention is to overcome the problems above by proposing a low-overhead block synchronization method supporting multi-core helper threads for prefetching irregular data. The basic idea is: on a multi-core architecture with a shared cache, and aiming at the problem of irregular data misses in multi-core applications, a run-ahead distance and a low-overhead block synchronization mechanism are introduced for the prefetching helper thread, so as to reduce data misses during the execution of the computation thread, reduce the synchronization overhead between the helper thread and the computation thread, and improve the execution performance of the computation thread. The invention can be widely applied in multi-core compiler optimization, database performance optimization, and so on.
To explain the terms used in the steps of the method, the following definitions are given first:
Definition 1: Current prefetch position
In the code of a computation thread, the address of the irregular data that currently needs to be prefetched is called the current prefetch position;
Definition 2: Computation amount of the current prefetch position
In the code of a computation thread, the execution time of the code between the current prefetch position and the next prefetch position is called the computation amount of the current prefetch position. If this time is 0, it is the case of no computation; if this time is very small, e.g. less than a few tens of clock cycles, it is the case of little computation;
Definition 3: Computation slice of a computation thread
In a computation thread, a code region containing a large number of irregular data misses is called a computation slice of the computation thread;
Definition 4: Shared-cache miss data stream
For a computation slice of a computation thread that causes a large number of consecutive data misses in the shared cache, let miss1, miss2, ..., missN denote the addresses of the missed data; the data access sequence corresponding to the address stream from miss1 to missN is called the shared-cache miss data stream.
Definition 5: Historical address information
In the computation thread or the prefetching helper thread, the addresses of certain jump cursors must be remembered so that the associated pointers can jump ahead efficiently; these retained addresses are called historical address information. Specifically, for a linked list, ceil(list length / k) pointers need to be saved (any fractional part rounded up), namely the head pointer, the (k+1)-th pointer, the (2k+1)-th pointer, and so on; an array subscript can be regarded as a special case of this. Here k is a positive integer obtained from step 1 of the implementation steps below.
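As an illustration of Definition 5, the following minimal C sketch (not taken from the patent; the node type and all names are assumptions) saves every k-th pointer of a linked list as historical address information:

#include <stddef.h>

typedef struct node {
    int i_data;
    struct node *next;
} node_t;

/* Save the head pointer, the (k+1)-th pointer, the (2k+1)-th pointer, ...
   into hist[]; returns the number of pointers saved (at most cap). */
size_t save_history(node_t *head, int k, node_t **hist, size_t cap)
{
    size_t n = 0;
    long i = 0;
    for (node_t *p = head; p != NULL && n < cap; p = p->next, i++) {
        if (i % k == 0)              /* list positions 0, k, 2k, ... */
            hist[n++] = p;
    }
    return n;
}

For a list of length 10000 and k = 20, this saves 500 pointers, matching the worked example in the embodiment below (hist[j] points at list position j*k).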
The overall framework design flow of the low-overhead block synchronization method supporting multi-core helper threads of the present invention is shown in Fig. 1; the implementation steps are as follows:
Step 1: Construct the run-ahead distance of the prefetching helper thread
On the basis of the term definitions above, the run-ahead distance of the prefetching helper thread is constructed. The basic idea is: when prefetching irregular data, the existing historical address information is used to dynamically keep the prefetch work pointer of the helper thread k positions ahead of the current work pointer of the computation thread. In this way, whether the computation thread and the helper thread are at the start or at a synchronization point, the helper thread dynamically stays ahead of the computation thread's data accesses, and the data can be pushed into the shared cache in time before the computation thread accesses it. The main construction steps are as follows (a code sketch follows the steps):
Step (1): In the computation thread, estimate the computation amount of the current prefetch position; in the case of no or little computation, go to step (2), otherwise go to step (4);
Step (2): To guarantee that the helper thread leads the computation thread by a time Δt, find, ahead of the current prefetch position of the computation thread, the position Δt of code computation away, and in the helper thread code adjust the current prefetch work pointer to be k positions ahead of the current work pointer of the computation thread (for an array subscript, the subscript value is increased by k; for a linked list, the prefetch work pointer is set to the k-th pointer after the current list pointer). Here Δt and k are related by:
Δt = f(k) = k * MissPenalty + c0
where k is a positive integer, which can be determined from the estimated or measured value of Δt;
MissPenalty is the cost of one long-latency memory access;
c0 is a preset constant;
Step (3): When the computation slice of this data push ends, go to step (4); at a synchronization point, go to step (2);
Step (4): End.
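Continuing the sketch above (an illustration only; advance_prefetch_ptr and its contract are assumptions, not the patent's code), step (2) for a linked list could adjust the prefetch work pointer like this:

/* Set the helper thread's prefetch work pointer k nodes ahead of the
   computation thread's position comp_pos, using the history table
   built by save_history (hist[j] points at list position j*k). */
node_t *advance_prefetch_ptr(node_t **hist, size_t nhist,
                             size_t comp_pos, int k)
{
    size_t target = comp_pos + k;          /* run ahead by k positions */
    size_t j = target / k;                 /* nearest saved pointer at or before target */
    if (j >= nhist)
        return NULL;                       /* would run past the end of the list */
    node_t *p = hist[j];
    for (size_t i = j * (size_t)k; i < target && p != NULL; i++)
        p = p->next;                       /* walk the remaining (fewer than k) links */
    return p;
}

The value of k itself follows from Δt = k * MissPenalty + c0; for instance, with Δt = 6000 cycles, MissPenalty = 300 cycles, and c0 = 0, k = 20, as computed in the embodiment below.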
Step 2: Select a low-overhead block synchronization mechanism
On the basis of the run-ahead distance constructed in step 1, step 2 selects a low-overhead block synchronization mechanism.
The low-overhead block synchronization mechanism comes in two variants, one of which can be selected according to test cases:
A. Double-counter block synchronization mechanism
The basic idea of the low-overhead block synchronization mechanism is: the data access stream that causes shared-cache misses in the computation thread is divided sequentially into blocks, and the size of each block is set according to the test results of the application example; synchronization operations occur only at block boundaries, lowering synchronization cost by relaxing synchronization precision. The run-ahead distance of the helper thread and its dynamic maintenance are provided by step 1. With pushsize denoting the block size, both the computation thread and the helper thread have a counter of their own; each time a thread accesses one item of the shared-cache miss data stream, it increments its counter by 1. When a counter reaches pushsize, the two threads must synchronize; if the other thread has not yet reached the same progress, this thread blocks and waits until the other thread synchronizes with it. The operation of the block synchronization mechanism consists of the computation-slice steps of the computation thread and the steps of the prefetching helper thread (a code sketch follows the two step lists):
1. The computation-slice steps of the computation thread are as follows:
Step (1): Begin;
Step (2): Reset the counter to 0; the computation thread begins to cooperate with the helper thread;
Step (3): Read data, increment the counter by 1, and compute;
Step (4): If the computation slice has ended, go to step (6);
Step (5): If the counter value is greater than pushsize, go to step (2); otherwise go to step (3);
Step (6): End.
2. The steps of the prefetching helper thread are as follows:
Step (1): Begin;
Step (2): Reset the counter to 0; the helper thread begins to cooperate with the computation thread;
Step (3): Right after a synchronization, push prefetch data using the current work pointer adjusted by the run-ahead distance; otherwise push prefetch data using the helper thread's next work pointer; increment the counter by 1;
Step (4): If the computation has ended, go to step (6);
Step (5): If the counter value is greater than pushsize, go to step (2); otherwise go to step (3);
Step (6): End.
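The following minimal C sketch of the double-counter mechanism continues the sketches above and uses a POSIX-semaphore rendezvous at block boundaries, as the embodiment below does with sem_post/sem_wait; PUSHSIZE, consume(), and the end-of-slice handshake details are assumptions:

#include <semaphore.h>
#include <stddef.h>

#define PUSHSIZE 600                      /* block size; tuned per application */
sem_t main_sem, push_sem;                 /* both initialized to 0 before the threads start */
extern void consume(int v);               /* stands for the per-item computation */

void compute_slice(node_t *head)          /* computation thread */
{
    long counter = 0;
    sem_post(&main_sem); sem_wait(&push_sem);        /* begin cooperation */
    for (node_t *p = head; p != NULL; p = p->next) {
        consume(p->i_data);                          /* read data, count, compute */
        if (++counter > PUSHSIZE) {                  /* block boundary: resynchronize */
            counter = 0;
            sem_post(&main_sem); sem_wait(&push_sem);
        }
    }
}

void push_slice(node_t **hist, size_t nhist, int k)  /* prefetching helper thread */
{
    long push_counter = 0;
    size_t pos = 0;                                  /* mirrors the computation thread's progress */
    sem_post(&push_sem); sem_wait(&main_sem);        /* begin cooperation */
    node_t *p = advance_prefetch_ptr(hist, nhist, pos, k);
    while (p != NULL) {
        __builtin_prefetch(p);                       /* push the node toward the shared cache */
        p = p->next; pos++;
        if (++push_counter > PUSHSIZE) {             /* block boundary: resynchronize */
            push_counter = 0;
            sem_post(&push_sem); sem_wait(&main_sem);
            p = advance_prefetch_ptr(hist, nhist, pos, k);  /* re-apply the run-ahead distance */
        }
    }
    /* a final handshake for slice termination is omitted for brevity */
}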
B. Single-counter block synchronization mechanism
The basic idea of the single-counter block synchronization mechanism is: the computation thread is never blocked by synchronization and no extra synchronization operations are added to it; only the prefetching helper thread has a counter. When the counter reaches pushsize, the helper thread sets its push pointer equal to the computation thread's pointer, and the run-ahead distance of the helper thread and its dynamic maintenance are provided by step 1. The operation of the single-counter block synchronization mechanism consists of the computation-slice steps of the computation thread and the steps of the prefetching helper thread (a code sketch follows the two step lists):
1. The computation-slice steps of the computation thread are as follows:
Step (1): Begin;
Step (2): The computation thread begins to cooperate with the helper thread;
Step (3): Read data and compute;
Step (4): If the computation slice has ended, go to step (5); otherwise go to step (3);
Step (5): End.
2. The steps of the prefetching helper thread are as follows:
Step (1): Begin;
Step (2): The helper thread begins to cooperate with the computation thread;
Step (3): Reset the counter to 0 and obtain the current work pointer of the computation thread;
Step (4): Right after a synchronization, push prefetch data using the current work pointer adjusted by the run-ahead distance; otherwise push prefetch data using the helper thread's next work pointer; increment the counter by 1;
Step (5): If the computation has ended, go to step (7);
Step (6): If the counter value is greater than pushsize, go to step (3); otherwise go to step (4);
Step (7): End.
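A corresponding minimal C sketch of the single-counter variant (again an assumption-laden illustration reusing the earlier names): the computation thread only publishes its current pointer and never blocks; the helper thread holds the sole counter and periodically resets itself from the published pointer:

node_t * volatile comp_cursor;            /* set to the list head before both threads start */

void compute_slice_single(node_t *head)   /* computation thread: no synchronization at all */
{
    for (node_t *p = head; p != NULL; p = p->next) {
        comp_cursor = p;                  /* publish progress; never block */
        consume(p->i_data);
    }
    comp_cursor = NULL;                   /* signal the end of the slice */
}

void push_slice_single(int k)             /* prefetching helper thread */
{
    for (;;) {
        node_t *p = comp_cursor;          /* obtain the computation thread's work pointer */
        if (p == NULL)
            break;                        /* slice finished */
        for (int i = 0; i < k && p != NULL; i++)
            p = p->next;                  /* re-apply the run-ahead distance of step 1 */
        long push_counter = 0;            /* the only counter in this variant */
        while (p != NULL && ++push_counter <= PUSHSIZE) {
            __builtin_prefetch(p);        /* push the node toward the shared cache */
            p = p->next;
        }
    }
}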
The single-counter block synchronization mechanism avoids the special case in which the double-counter mechanism blocks at a synchronization point, and it guarantees that the computation thread can keep running continuously. After testing in the concrete application environment and application program, the more suitable of the double-counter and single-counter mechanisms can be chosen.
Beneficial effect:
1. the present invention adopts selectable synchronization mechanism of low expense, and it comprises that the piece piece synchronous and single counter of double counters is synchronous, has effectively reduced the synchronization overhead that traditional precise synchronization mechanism is caused.All need synchronous traditional precise synchronization mode with looking ahead at every turn, incomparable low expense characteristics are arranged: select suitable Pushsize value, but the buffer memory that the excessive pushsize of active balance causes pollutes, but the also synchronization overhead brought of the too small pushsize of active balance has improved the execution performance of computational threads effectively.The piece synchronization mechanism of single counter can overcome the special circumstances of the piece synchronization mechanism of double counters at the synchronous points obstruction, and it has ensured the sustainable service ability of computational threads.These two kinds of mechanism can in concrete applied environment and application program after the test, be used according to qualifications.
2. introduced lead for the assisting thread of looking ahead, made in the work at present position of computational threads, no matter had or not enough evaluation works, can both allow the piece synchronization mechanism of low expense be carried out effectively.The lead building method can dynamically keep looking ahead the work pointer of assisting thread always in advance in the current pointer K of a computational threads amount of calculation, and it is the realization basis of the piece synchronization mechanism of the synchronous and single counter of the piece of double counters of low expense.Under the situation of less calculated amount, it has computational threads and the assisting thread of looking ahead still can carry out the characteristics that crossover calculates.
Description of drawings
Fig. 1 is the overall framework design flow chart of the present invention;
Embodiment
Following the technical scheme above, the present invention is described in detail below with an embodiment.
Take the following simple program as an example. An ADDSCALE variable is added in the header file ldsHeader.h; changing its value controls the computation amount for each node in the linked list. The computation over the linked-list nodes is:
while (iterator) {                    /* walk the linked list */
    temp = iterator->i_data;          /* irregular access: the current prefetch position */
    while (i++ < ADDSCALE) {          /* adjustable per-node computation */
        temp += 1;
    }
    res += temp;
    i = 0;
    iterator = iterator->next;        /* pointer chasing to the next node */
}
The computation amount is adjusted by repeatedly changing the value of ADDSCALE: starting from ADDSCALE = 0 and increasing by 5 each time, ADDSCALE takes the values 0, 5, 10, 15, 20, and so on.
In connection with the example above, the terms are defined as follows:
Definition 1: Current prefetch position
In the code of a computation thread, the address of the irregular data that currently needs to be prefetched is called the current prefetch position;
The current prefetch position is given by the following code:
temp = iterator->i_data;
Definition 2: Computation amount of the current prefetch position
In the code of a computation thread, the execution time of the code between the current prefetch position and the next prefetch position is called the computation amount of the current prefetch position. If this time is 0, it is the case of no computation; if this time is very small, e.g. less than a few tens of clock cycles, it is the case of little computation;
The computation amount of the current prefetch position is given by the following code:
while (i++ < ADDSCALE) {
    temp += 1;
}
res += temp;
i = 0;
Definition 3: Computation slice of a computation thread
In a computation thread, a code region containing a large number of irregular data misses is called a computation slice of the computation thread;
The computation slice of the computation thread at the current prefetch position is given by the following code:
while (iterator) {
    temp = iterator->i_data;
    while (i++ < ADDSCALE) {
        temp += 1;
    }
    res += temp;
    i = 0;
    iterator = iterator->next;
}
Definition 4: Shared-cache miss data stream
For a computation slice of a computation thread that causes a large number of consecutive data misses in the shared cache, let miss1, miss2, ..., missN denote the addresses of the missed data; the data access sequence corresponding to the address stream from miss1 to missN is called the shared-cache miss data stream.
In the program above, the data pointed to by iterator, iterator->next, and so on, form the shared-cache miss data stream.
Definition 5: Historical address information
In the computation thread or the prefetching helper thread, the addresses of certain jump cursors must be remembered so that the associated pointers can jump ahead efficiently; these retained addresses are called historical address information. For the iterator linked list in this example, with a list length of 10000 and k = 20, 500 pointers need to be saved, namely the head pointer, the 21st pointer, the 41st pointer, and so on.
Step 1: Construct the run-ahead distance of the prefetching helper thread
Step (1): In the computation thread, estimate the computation amount of the current prefetch position. If ADDSCALE is 0, 5, or 10, this is the case of no or little computation, go to step (2); otherwise, if ADDSCALE is 15 or 20, go to step (4);
Step (2): To guarantee that the helper thread leads the computation thread by a time Δt, find, ahead of the current prefetch position of the computation thread, the position Δt of code computation away, and in the helper thread code adjust the current prefetch work pointer to be k positions ahead of the current work pointer of the computation thread;
For example, with Δt = 6000, MissPenalty = 300 clock cycles, and c0 = 0:
k = Δt / MissPenalty = 6000 / 300 = 20
Step (3): When the computation slice of this data push ends, go to step (4); when synchronization is needed, go to step (2);
Step (4): End.
Step 2: The selectable low-overhead synchronization mechanism
A. Take a computation slice P1 of the main computation thread as an example, with push as its prefetching helper thread. The double-counter block synchronization mechanism is constructed as follows:
1. The steps of computation slice P1 of the main computation thread are as follows:
Step (1): Begin main;
Step (2): Reset the counter counter to 0; sem_post(&main); sem_wait(&push); the main computation thread and the push helper thread begin to cooperate;
Step (3): Read data, increment counter by 1, and compute;
Step (4): If computation slice P1 of main has ended, go to step (6);
Step (5): If the value of counter is greater than pushsize, go to step (2); otherwise go to step (3);
Step (6): End.
2. The steps of the push prefetching helper thread are as follows:
Step (1): Begin push;
Step (2): Reset the counter push_counter to 0; sem_post(&push); sem_wait(&main); the push thread begins to cooperate with the main thread;
Step (3): Right after a synchronization, push pushes prefetch data using its current work pointer adjusted by the run-ahead distance; otherwise push pushes prefetch data using its next work pointer; increment push_counter by 1;
Step (4): If the push computation has ended, go to step (6);
Step (5): If the value of push_counter is greater than pushsize, go to step (2); otherwise go to step (3);
Step (6): End.
B. Take a computation slice P2 of the main computation thread as an example, with push as its prefetching helper thread. The single-counter block synchronization mechanism is constructed as follows (a sketch of the sem_post/sem_wait rendezvous follows the steps):
1. The steps of computation slice P2 of the main computation thread are as follows:
Step (1): Begin main;
Step (2): sem_post(&main); sem_wait(&push); the main computation thread and the push helper thread begin to cooperate;
Step (3): Read data and compute;
Step (4): If computation slice P2 of main has ended, go to step (5); otherwise go to step (3);
Step (5): End.
2. The steps of the push prefetching helper thread are as follows:
Step (1): Begin push;
Step (2): sem_post(&push); sem_wait(&main); the push thread begins to cooperate with the main thread;
Step (3): Reset the counter push_counter to 0 and obtain the current work pointer of the main thread;
Step (4): Right after a synchronization, push pushes prefetch data using its current work pointer adjusted by the run-ahead distance; otherwise push pushes prefetch data using its next work pointer; increment push_counter by 1;
Step (5): If the push computation has ended, go to step (7);
Step (6): If the value of push_counter is greater than pushsize, go to step (3); otherwise go to step (4);
Step (7): End.
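For reference, the sem_post/sem_wait pairs above form a two-thread rendezvous; a minimal sketch of it in C (the initial values, and the _sem suffix added to avoid clashing with C's main, are assumptions):

#include <semaphore.h>

sem_t main_sem, push_sem;            /* the embodiment calls these main and push */

void init_rendezvous(void)
{
    sem_init(&main_sem, 0, 0);       /* both start at 0, so each thread must */
    sem_init(&push_sem, 0, 0);       /* wait for the other before proceeding */
}

/* main thread side:  sem_post(&main_sem); sem_wait(&push_sem);
   push thread side:  sem_post(&push_sem); sem_wait(&main_sem);
   Neither thread passes its pair until both threads have arrived. */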
The integrated test results for the example above are as follows:

ADDSCALE (scale variable)                0         5         10        15        20
k / pushsize                          20 / 600  20 / 600  20 / 600   0 / 600   0 / 600
Execution time without the
prefetching helper thread              120.175   121.627   135.234   153.057   171.058
Execution time with the prefetching
helper thread (the inventive method)    80.117    82.474    89.839   118.076   117.697

The test results show that, with the method of the invention, the program execution time is markedly shortened.

Claims (4)

1. A low-overhead block synchronization method supporting multi-core helper threads, characterized in that its basic idea is: on a multi-core architecture with a shared cache, and aiming at the problem of irregular data misses in multi-core applications, a run-ahead distance and a low-overhead block synchronization mechanism are introduced for the prefetching helper thread, so as to reduce data misses during the execution of the computation thread, reduce the synchronization overhead between the prefetching helper thread and the computation thread, and improve the execution performance of the computation thread; the implementation steps are as follows:
Step 1: Construct the run-ahead distance of the prefetching helper thread
When prefetching irregular data, historical address information is used to dynamically keep the prefetch work pointer of the helper thread k positions ahead of the current work pointer of the computation thread; in this way, whether the computation thread and the helper thread are at the start or at a synchronization point, the helper thread dynamically stays ahead of the computation thread's data accesses, and the data can be pushed into the shared cache in time before the computation thread accesses it;
Step 2: Select a low-overhead block synchronization mechanism
On the basis of the run-ahead distance constructed in step 1, step 2 selects a low-overhead block synchronization mechanism;
The low-overhead block synchronization mechanism comes in two variants, the more suitable of which can be selected according to test cases:
A. Double-counter block synchronization mechanism
The block synchronization mechanism divides the data access stream that causes shared-cache misses in the computation thread sequentially into blocks, and the size of each block is set according to the test results of the application example; synchronization operations occur only at block boundaries, lowering synchronization cost by relaxing synchronization precision; the run-ahead distance of the helper thread and its dynamic maintenance are provided by step 1; with pushsize denoting the block size, both the computation thread and the helper thread have a counter of their own; each time a thread accesses one item of the shared-cache miss data stream, it increments its counter by 1; when a counter reaches pushsize, the two threads must synchronize, and if the other thread has not yet reached the same progress, this thread blocks and waits until the other thread synchronizes with it;
B. Single-counter block synchronization mechanism
The computation thread is never blocked by synchronization and no extra synchronization operations are added to it; only the prefetching helper thread has a counter; when the counter reaches pushsize, the helper thread sets its push pointer equal to the computation thread's pointer, and the run-ahead distance of the helper thread and its dynamic maintenance are provided by step 1.
2. The low-overhead block synchronization method supporting multi-core helper threads according to claim 1, characterized in that the steps of constructing the run-ahead distance of the prefetching helper thread in step 1 are:
Step (1): In the computation thread, estimate the computation amount of the current prefetch position; in the case of no or little computation, go to step (2), otherwise go to step (4);
Step (2): To guarantee that the helper thread leads the computation thread by a time Δt, find, ahead of the current prefetch position of the computation thread, the position Δt of code computation away, and in the helper thread code adjust the current prefetch work pointer to be k positions ahead of the current work pointer of the computation thread (for an array subscript, the subscript value is increased by k; for a linked list, the prefetch work pointer is set to the k-th pointer after the current list pointer); here Δt and k are related by:
Δt = f(k) = k * MissPenalty + c0
where k is a positive integer, which can be determined from the estimated or measured value of Δt;
MissPenalty is the cost of one long-latency memory access;
c0 is a preset constant;
Step (3): When the computation slice of this data push ends, go to step (4); at a synchronization point, go to step (2);
Step (4): End.
3. The low-overhead block synchronization method supporting multi-core helper threads according to claim 1, characterized in that, in the low-overhead block synchronization mechanism selected in step 2, the double-counter block synchronization mechanism consists of the computation-slice steps of the computation thread and the steps of the prefetching helper thread:
1. The computation-slice steps of the computation thread are as follows:
Step (1): Begin;
Step (2): Reset the counter to 0; the computation thread begins to cooperate with the helper thread;
Step (3): Read data, increment the counter by 1, and compute;
Step (4): If the computation slice has ended, go to step (6);
Step (5): If the counter value is greater than pushsize, go to step (2); otherwise go to step (3);
Step (6): End;
2. The steps of the prefetching helper thread are as follows:
Step (1): Begin;
Step (2): Reset the counter to 0; the helper thread begins to cooperate with the computation thread;
Step (3): Right after a synchronization, push prefetch data using the current work pointer adjusted by the run-ahead distance; otherwise push prefetch data using the helper thread's next work pointer; increment the counter by 1;
Step (4): If the computation has ended, go to step (6);
Step (5): If the counter value is greater than pushsize, go to step (2); otherwise go to step (3);
Step (6): End.
4. The low-overhead block synchronization method supporting multi-core helper threads according to claim 1, characterized in that, in the low-overhead block synchronization mechanism selected in step 2, the single-counter block synchronization mechanism consists of the computation-slice steps of the computation thread and the steps of the prefetching helper thread:
1. The computation-slice steps of the computation thread are as follows:
Step (1): Begin;
Step (2): The computation thread begins to cooperate with the helper thread;
Step (3): Read data and compute;
Step (4): If the computation slice has ended, go to step (5); otherwise go to step (3);
Step (5): End;
2. The steps of the prefetching helper thread are as follows:
Step (1): Begin;
Step (2): The helper thread begins to cooperate with the computation thread;
Step (3): Reset the counter to 0 and obtain the current work pointer of the computation thread;
Step (4): Right after a synchronization, push prefetch data using the current work pointer adjusted by the run-ahead distance; otherwise push prefetch data using the helper thread's next work pointer; increment the counter by 1;
Step (5): If the computation has ended, go to step (7);
Step (6): If the counter value is greater than pushsize, go to step (3); otherwise go to step (4);
Step (7): End.
CN2009100856020A 2009-05-26 2009-05-26 Low-overhead block synchronization method supporting multi-core helper threads Expired - Fee Related CN101561766B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100856020A CN101561766B (en) 2009-05-26 2009-05-26 Low-overhead block synchronization method supporting multi-core helper threads

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100856020A CN101561766B (en) 2009-05-26 2009-05-26 Low-overhead block synchronization method supporting multi-core helper threads

Publications (2)

Publication Number Publication Date
CN101561766A true CN101561766A (en) 2009-10-21
CN101561766B CN101561766B (en) 2011-06-15

Family

ID=41220578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100856020A Expired - Fee Related CN101561766B (en) 2009-05-26 2009-05-26 Low-overhead block synchronization method supporting multi-core helper threads

Country Status (1)

Country Link
CN (1) CN101561766B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101807144A (en) * 2010-03-17 2010-08-18 上海大学 Prospective multi-threaded parallel execution optimization method
CN102334104A (en) * 2011-08-15 2012-01-25 华为技术有限公司 Synchronous processing method and device based on multicore system
CN104981787A (en) * 2013-03-05 2015-10-14 国际商业机器公司 Data prefetch for chip having parent core and scout core
CN105893319A (en) * 2014-12-12 2016-08-24 上海芯豪微电子有限公司 Multi-lane/multi-core system and method
CN106776047A (en) 2017-05-31 Group-wise thread prefetching method for irregular data-intensive applications
CN108874690A (en) * 2017-05-16 2018-11-23 龙芯中科技术有限公司 The implementation method and processor of data pre-fetching
CN114817087A (en) * 2022-05-12 2022-07-29 郑州轻工业大学 Prefetch distance self-adaptive adjusting method and device based on cache invalidation behavior

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101807144B (en) * 2010-03-17 2014-05-14 上海大学 Prospective multi-threaded parallel execution optimization method
CN101807144A (en) * 2010-03-17 2010-08-18 上海大学 Prospective multi-threaded parallel execution optimization method
CN102334104A (en) * 2011-08-15 2012-01-25 华为技术有限公司 Synchronous processing method and device based on multicore system
CN102334104B (en) * 2011-08-15 2013-09-11 华为技术有限公司 Synchronous processing method and device based on multicore system
US9424101B2 (en) 2011-08-15 2016-08-23 Huawei Technologies Co., Ltd. Method and apparatus for synchronous processing based on multi-core system
CN104981787B (en) 2017-11-17 Data prefetch for a chip having a parent core and a scout core
CN104981787A (en) * 2013-03-05 2015-10-14 国际商业机器公司 Data prefetch for chip having parent core and scout core
CN105893319A (en) * 2014-12-12 2016-08-24 上海芯豪微电子有限公司 Multi-lane/multi-core system and method
CN106776047A (en) 2017-05-31 Group-wise thread prefetching method for irregular data-intensive applications
CN106776047B (en) 2019-08-02 Group-wise thread prefetching method for irregular data-intensive applications
CN108874690A (en) * 2017-05-16 2018-11-23 龙芯中科技术有限公司 The implementation method and processor of data pre-fetching
CN114817087A (en) * 2022-05-12 2022-07-29 郑州轻工业大学 Prefetch distance self-adaptive adjusting method and device based on cache invalidation behavior
CN114817087B (en) * 2022-05-12 2022-11-11 郑州轻工业大学 Prefetch distance self-adaptive adjustment method and device based on cache invalidation behavior

Also Published As

Publication number Publication date
CN101561766B (en) 2011-06-15

Similar Documents

Publication Publication Date Title
CN101561766B (en) Low-overhead block synchronization method supporting multi-core helper threads
US9477533B2 (en) Progress meters in parallel computing
Ebrahimi et al. Parallel application memory scheduling
Lucas et al. How a single chip causes massive power bills GPUSimPow: A GPGPU power simulator
Johnson et al. Decoupling contention management from scheduling
US9619290B2 (en) Hardware and runtime coordinated load balancing for parallel applications
Wu et al. Using performance-power modeling to improve energy efficiency of hpc applications
CN105159654B (en) Integrity measurement hashing algorithm optimization method based on multi-threaded parallel
Chen et al. Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs
CN104657219A (en) Application program thread count dynamic regulating method used under isomerous many-core system
Lee et al. Prefetching with helper threads for loosely coupled multiprocessor systems
CN101593132A (en) Multi-core parallel simulated annealing method based on thread constructing module
CN102662638A (en) Threshold boundary selecting method for supporting helper thread pre-fetching distance parameters
Xie et al. Adaptive preshuffling in Hadoop clusters
Breß et al. Self-Tuning Distribution of DB-Operations on Hybrid CPU/GPU Platforms.
Wang et al. Three-level performance optimization for heterogeneous systems based on software prefetching under power constraints
Wang et al. Energy optimization by software prefetching for task granularity in GPU-based embedded systems
Halimi et al. Forest-mn: Runtime DVFS beyond communication slack
CN105930209B (en) Adaptive quality control method for helper-thread prefetching
CN106776047B (en) Group-wise thread prefetching method for irregular data-intensive applications
Uddin et al. Signature-based high-level simulation of microthreaded many-core architectures
Fu et al. A parallel CNC system architecture based on symmetric multi-processor
Chen et al. Weak execution ordering-exploiting iterative methods on many-core gpus
Moeng et al. Weighted-tuple synchronization for parallel architecture simulators
Wang et al. Design and Analysis of a Minimum Time Buckets Synchronization Algorithm for Parallel and Distributed Simulation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110615

Termination date: 20120526