CN104932947A - Barrier synchronization method and device - Google Patents


Publication number
CN104932947A
Authority
CN
China
Prior art keywords
fence
synchronous
processor core
threaded program
queue
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410098952.1A
Other languages
Chinese (zh)
Other versions
CN104932947B (en)
Inventor
徐卫志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Institute of Computing Technology of CAS
Original Assignee
Huawei Technologies Co Ltd
Institute of Computing Technology of CAS
Application filed by Huawei Technologies Co Ltd and Institute of Computing Technology of CAS
Priority to CN201410098952.1A
Publication of CN104932947A
Application granted
Publication of CN104932947B
Legal status: Active


Abstract

The invention discloses a barrier synchronization method and device, relates to the field of communications, and solves the problem that, as the number of threads grows, access bottlenecks reduce the processing performance of a chip having a multi-core or many-core processor. In the method, a first processor core determines that the thread program it is currently processing has executed to a predetermined barrier synchronization point, the first processor core being any one of the processor cores included in the chip; a target barrier synchronization device is determined according to the barrier identifier corresponding to the predetermined barrier synchronization point; and a barrier synchronization message, comprising the barrier identifier and the number of thread programs participating in the synchronization, is sent to the target barrier synchronization device. The method and device are used in the barrier synchronization process.

Description

Barrier synchronization method and device
Technical field
The present invention relates to the field of communications, and in particular to a barrier synchronization method and device.
Background
A conventional single-core processor usually raises its clock frequency by means of superscalar and pipelining techniques in order to improve performance. A higher clock frequency, however, increases the processor's power consumption and makes heat dissipation harder. Moreover, as semiconductor technology has developed, the number of transistors that can be integrated on a chip has grown steadily. To reduce power consumption and keep heat dissipation manageable while still improving performance, architecture designers have proposed multi-core and many-core processors that exploit coarse-grained, thread-level parallelism.
Because a multi-core or many-core processor processes data with multiple threads, barrier synchronization is needed to guarantee that data propagates correctly among the thread programs and that the thread programs execute with correct semantics. Barrier synchronization is therefore essential for multi-core and many-core processors. In the prior art, barrier synchronization is implemented by placing a single synchronization management device on the chip. The process is as follows: in a chip having a multi-core or many-core processor, when the thread program processed by a processor core has executed to a predetermined synchronization point, that processor core sends the synchronization management device a notification message indicating that its thread program has reached the predetermined synchronization point. The synchronization management device tracks whether all thread programs participating in the synchronization have reached the predetermined synchronization point. Once they all have, it sends a continue-execution instruction to the processor core corresponding to each participating thread program, so that all the processor cores continue processing their thread programs.
The prior art has at least the following problem. Because only one synchronization management device is provided on the chip, every processor core in a chip having a multi-core or many-core processor must send its notification message to that same synchronization management device whenever its thread program reaches a predetermined synchronization point. As the number of threads grows, this creates a serious access bottleneck that slows the cooperative execution of the thread programs and thereby degrades the processing performance of the chip.
Summary of the invention
The present invention provides a barrier synchronization method and device that solve the problem of access bottlenecks degrading the processing performance of a chip having a multi-core or many-core processor as the number of threads grows.
To achieve the above object, the present invention adopts the following technical solutions:
A first aspect of the present invention provides a barrier synchronization method, applied to a chip having a multi-core or many-core processor, the chip being provided with at least two barrier synchronization devices, the method comprising:
a first processor core determining that the thread program it is currently processing has executed to a predetermined barrier synchronization point, the first processor core being any one of the processor cores included in the chip;
determining a target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point;
sending a barrier synchronization message to the target barrier synchronization device, the barrier synchronization message comprising the barrier identifier and the number of thread programs participating in the synchronization.
With reference to the first aspect, in a possible implementation, determining the target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point comprises:
determining the target barrier synchronization device from the barrier identifier corresponding to the predetermined barrier synchronization point according to a preset rule, the preset rule comprising a mapping between barrier identifiers and barrier synchronization devices.
With reference to the first aspect and the foregoing possible implementation, in another possible implementation, after sending the barrier synchronization message to the target barrier synchronization device, the method further comprises:
suspending processing of the currently processed thread program and entering a waiting state.
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, after suspending processing of the currently processed thread program and entering the waiting state, the method further comprises:
receiving an acknowledgment message sent by the target barrier synchronization device, the acknowledgment message notifying the first processor core to continue processing the currently processed thread program;
continuing to process the currently processed thread program.
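The core-side steps of the first aspect can be illustrated with a minimal sketch. The text only requires that the preset rule contain a mapping between barrier identifiers and barrier synchronization devices; the modulo mapping, the device count of four, the `core_id` field, and all names below are assumptions made for illustration, not the patented implementation. Suspending the thread program and waiting for the acknowledgment are not modeled.

```python
from dataclasses import dataclass
from typing import Tuple

NUM_BARRIER_DEVICES = 4  # assumption: the method only requires at least two


@dataclass(frozen=True)
class BarrierSyncMessage:
    barrier_id: int        # barrier identifier of the synchronization point
    num_participants: int  # number of thread programs taking part
    core_id: int           # assumption: sender identity, so the device can reply


def select_target_device(barrier_id: int,
                         num_devices: int = NUM_BARRIER_DEVICES) -> int:
    """Hypothetical preset rule: map a barrier identifier to a device index."""
    if num_devices < 2:
        raise ValueError("the method assumes at least two devices")
    return barrier_id % num_devices


def on_barrier_point(core_id: int, barrier_id: int,
                     num_participants: int) -> Tuple[int, BarrierSyncMessage]:
    """Return the target device index and the message the core would send."""
    target = select_target_device(barrier_id)
    message = BarrierSyncMessage(barrier_id, num_participants, core_id)
    return target, message
```

With this rule, barrier identifiers 5 and 6 land on devices 1 and 2 respectively, so their messages do not contend for the same device.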
A second aspect of the present invention provides a barrier synchronization method, applied to a chip having a multi-core or many-core processor, the chip being provided with at least two barrier synchronization devices, the method comprising:
a target barrier synchronization device receiving a barrier synchronization message sent by a first processor core. The barrier synchronization message is sent by the first processor core upon determining that the thread program it is currently processing has executed to a predetermined barrier synchronization point; the first processor core is any one of the processor cores included in the chip; and the barrier synchronization message comprises the barrier identifier corresponding to the predetermined barrier synchronization point and the number of thread programs participating in the synchronization. The target barrier synchronization device is the barrier synchronization device that handles the barrier synchronization messages sent by processor cores for the predetermined barrier synchronization point;
incrementing by 1, according to the barrier identifier corresponding to the predetermined barrier synchronization point, the count value of the count field included in a first queue. The first queue is the queue that corresponds to the barrier identifier and records the state of all thread programs participating in the synchronization; it comprises the barrier identifier, a queue status, and the count field.
With reference to the second aspect, in a possible implementation, before incrementing by 1 the count value of the count field included in the first queue according to the barrier identifier, the method further comprises:
determining whether the first queue exists;
when the first queue does not exist, creating the first queue and updating the queue status to an in-use state.
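A minimal sketch of the device-side bookkeeping described above: look up the first queue by barrier identifier, create it and mark it in use if it does not yet exist, then increment its count field. The class names, the dictionary lookup, and the `"in_use"` status string are assumptions for illustration; the text does not say how a hardware device stores its queues.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BarrierQueue:
    barrier_id: int
    status: str = "in_use"  # queue status (hypothetical encoding)
    count: int = 0          # count field: arrivals recorded so far


class BarrierSyncDevice:
    def __init__(self) -> None:
        # One queue per barrier identifier handled by this device.
        self.queues: Dict[int, BarrierQueue] = {}

    def on_sync_message(self, barrier_id: int) -> int:
        """Record one arrival for barrier_id and return the new count."""
        queue = self.queues.get(barrier_id)
        if queue is None:
            # The first queue does not exist yet: create it in the in-use state.
            queue = BarrierQueue(barrier_id=barrier_id, status="in_use")
            self.queues[barrier_id] = queue
        queue.count += 1
        return queue.count
```

The participant count carried in the message is not needed for this step; it only matters when the device later checks whether every participant has arrived.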
With reference to the second aspect and the foregoing possible implementation, in another possible implementation, the first queue further comprises identification information of the processor cores whose thread programs have executed to the predetermined barrier synchronization point;
after receiving the barrier synchronization message sent by the first processor core, the method further comprises:
adding the identification information of the first processor core to the first queue.
With reference to the second aspect and the foregoing possible implementations, in another possible implementation, before adding the identification information of the first processor core to the first queue, the method further comprises:
determining whether the number of pieces of identification information of processor cores whose thread programs have executed to the predetermined barrier synchronization point is less than a predetermined threshold, the predetermined threshold being less than or equal to the maximum number of threads supported by the chip;
when the number of pieces of identification information is determined to be less than the predetermined threshold, adding the identification information of the first processor core to the first queue.
With reference to the second aspect and the foregoing possible implementations, in another possible implementation, the method further comprises:
when the number of pieces of identification information is determined to be not less than the predetermined threshold, saving the identification information of the first processor core in memory.
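The threshold check can be sketched as follows, assuming the queue and the memory overflow area are both modeled as plain lists of core identifiers. The function name and the returned labels are hypothetical.

```python
from typing import List


def record_core_id(queue_ids: List[int], memory_ids: List[int],
                   core_id: int, threshold: int) -> str:
    """Store core_id in the queue if it has room, otherwise in memory.

    Returns "queue" or "memory" to indicate where the identifier was stored.
    """
    if len(queue_ids) < threshold:
        queue_ids.append(core_id)
        return "queue"
    # The queue already holds `threshold` identifiers: spill to memory.
    memory_ids.append(core_id)
    return "memory"
```

For example, with a threshold of 2, the first two arriving cores are recorded in the queue and a third is saved to memory.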
With reference to the second aspect and the foregoing possible implementations, in another possible implementation, the first queue further comprises a bit sequence indicating, for each thread program participating in the synchronization, whether it has executed to the predetermined barrier synchronization point, each bit in the bit sequence being mapped to the identification information of a processor core;
after receiving the barrier synchronization message sent by the first processor core, the method further comprises:
updating the bit corresponding to the identification information of the first processor core from a first flag to a second flag, the first flag indicating that the thread program processed by the processor core has not executed to the predetermined barrier synchronization point, and the second flag indicating that the thread program processed by the processor core has executed to the predetermined barrier synchronization point.
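A minimal sketch of the bit-sequence variant, under two assumptions that the text leaves open: bit i corresponds to the processor core with identifier i, and the first and second flags are encoded as 0 and 1. The bit sequence is held in an ordinary integer.

```python
from typing import List


def mark_arrived(bit_sequence: int, core_id: int) -> int:
    """Flip the bit for core_id from the first flag (0) to the second flag (1)."""
    return bit_sequence | (1 << core_id)


def arrived_cores(bit_sequence: int) -> List[int]:
    """List the identifiers of all cores whose bit carries the second flag."""
    return [i for i in range(bit_sequence.bit_length())
            if (bit_sequence >> i) & 1]
```

A bit sequence takes one bit per core, which is more compact than storing a list of core identifiers when many cores participate.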
With reference to the second aspect and the foregoing possible implementations, in another possible implementation, after incrementing by 1 the count value of the count field included in the first queue according to the barrier identifier, the method further comprises:
determining whether the count value of the count field equals the number of thread programs participating in the synchronization;
when the count value of the count field equals the number of thread programs participating in the synchronization, obtaining the identification information of the processor core corresponding to each thread program participating in the synchronization;
sending, according to the identification information so obtained, an acknowledgment message to the processor core corresponding to each thread program participating in the synchronization, the acknowledgment message notifying that processor core to continue processing its thread program.
With reference to the second aspect and the foregoing possible implementations, in another possible implementation, obtaining the identification information of the processor core corresponding to each thread program participating in the synchronization comprises:
obtaining, from the first queue, the identification information of the processor core corresponding to each thread program participating in the synchronization.
With reference to the second aspect and the foregoing possible implementations, in another possible implementation, obtaining the identification information of the processor core corresponding to each thread program participating in the synchronization comprises:
obtaining, from the first queue and the memory, the identification information of the processor core corresponding to each thread program participating in the synchronization.
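The release step can be sketched as below. The `send_ack` callback is a hypothetical stand-in for whatever on-chip mechanism delivers the acknowledgment message, and the two lists model the core identifiers held in the first queue and in memory.

```python
from typing import Callable, List


def release_if_complete(count: int, num_participants: int,
                        queue_ids: List[int], memory_ids: List[int],
                        send_ack: Callable[[int], None]) -> bool:
    """Acknowledge every participating core once all have arrived.

    Returns True if the barrier was released, False if cores are still missing.
    """
    if count != num_participants:
        return False
    # Gather identifiers from the queue and from the memory overflow area.
    for core_id in queue_ids + memory_ids:
        send_ack(core_id)
    return True
```

When no identifiers were spilled to memory, `memory_ids` is simply empty and the function covers the queue-only implementation as well.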
A third aspect of the present invention provides a first processor core, applied to a chip having a multi-core or many-core processor, the chip being provided with at least two barrier synchronization devices, the first processor core comprising:
a first determining unit, configured to determine that the currently processed thread program has executed to a predetermined barrier synchronization point, the first processor core being any one of the processor cores included in the chip;
a second determining unit, configured to determine a target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point;
a sending unit, configured to send a barrier synchronization message to the target barrier synchronization device determined by the second determining unit, the barrier synchronization message comprising the barrier identifier and the number of thread programs participating in the synchronization.
With reference to the third aspect, in a possible implementation, the second determining unit is specifically configured to:
determine the target barrier synchronization device from the barrier identifier corresponding to the predetermined barrier synchronization point according to a preset rule, the preset rule comprising a mapping between barrier identifiers and barrier synchronization devices.
With reference to the third aspect and the foregoing possible implementation, in another possible implementation, the first processor core further comprises:
a first processing unit, configured to suspend processing of the currently processed thread program and enter a waiting state after the sending unit sends the barrier synchronization message to the target barrier synchronization device.
With reference to the third aspect and the foregoing possible implementations, in another possible implementation, the first processor core further comprises:
a receiving unit, configured to receive an acknowledgment message sent by the target barrier synchronization device after the first processing unit suspends processing of the currently processed thread program and enters the waiting state, the acknowledgment message notifying the first processor core to continue processing the currently processed thread program;
a second processing unit, configured to continue processing the currently processed thread program.
A fourth aspect of the present invention provides a target barrier synchronization device, applied to a chip having a multi-core or many-core processor, the chip being provided with at least two barrier synchronization devices, the target barrier synchronization device comprising:
a receiving unit, configured to receive a barrier synchronization message sent by a first processor core. The barrier synchronization message is sent by the first processor core upon determining that the thread program it is currently processing has executed to a predetermined barrier synchronization point; the first processor core is any one of the processor cores included in the chip; and the barrier synchronization message comprises the barrier identifier corresponding to the predetermined barrier synchronization point and the number of thread programs participating in the synchronization. The target barrier synchronization device is the barrier synchronization device that handles the barrier synchronization messages sent by processor cores for the predetermined barrier synchronization point;
a processing unit, configured to increment by 1, according to the barrier identifier contained in the barrier synchronization message received by the receiving unit, the count value of the count field included in a first queue. The first queue is the queue that corresponds to the barrier identifier and records the state of all thread programs participating in the synchronization; it comprises the barrier identifier, a queue status, and the count field.
With reference to the fourth aspect, in a possible implementation, the target barrier synchronization device further comprises:
a judging unit, configured to determine whether the first queue exists before the processing unit increments by 1 the count value of the count field included in the first queue according to the barrier identifier;
a creating and updating unit, configured to create the first queue and update the queue status to an in-use state when the judging unit determines that the first queue does not exist.
With reference to the fourth aspect and the foregoing possible implementation, in another possible implementation, the first queue further comprises identification information of the processor cores whose thread programs have executed to the predetermined barrier synchronization point;
the target barrier synchronization device further comprises:
an adding unit, configured to add the identification information of the first processor core to the first queue after the receiving unit receives the barrier synchronization message sent by the first processor core.
With reference to the fourth aspect and the foregoing possible implementations, in another possible implementation,
the judging unit is further configured to determine, before the adding unit adds the identification information of the first processor core to the first queue, whether the number of pieces of identification information of processor cores whose thread programs have executed to the predetermined barrier synchronization point is less than a predetermined threshold, the predetermined threshold being less than or equal to the maximum number of threads supported by the chip;
the adding unit is specifically configured to add the identification information of the first processor core to the first queue when the judging unit determines that the number of pieces of identification information is less than the predetermined threshold.
With reference to the fourth aspect and the foregoing possible implementations, in another possible implementation, the target barrier synchronization device further comprises:
a storage unit, configured to save the identification information of the first processor core in memory when the judging unit determines that the number of pieces of identification information of processor cores whose thread programs have executed to the predetermined barrier synchronization point is not less than the predetermined threshold.
With reference to the fourth aspect and the foregoing possible implementations, in another possible implementation, the first queue further comprises a bit sequence indicating, for each thread program participating in the synchronization, whether it has executed to the predetermined barrier synchronization point, each bit in the bit sequence being mapped to the identification information of a processor core;
the target barrier synchronization device further comprises:
an updating unit, configured to update the bit corresponding to the identification information of the first processor core from a first flag to a second flag after the receiving unit receives the barrier synchronization message sent by the first processor core, the first flag indicating that the thread program processed by the processor core has not executed to the predetermined barrier synchronization point, and the second flag indicating that the thread program processed by the processor core has executed to the predetermined barrier synchronization point.
With reference to the fourth aspect and the foregoing possible implementations, in another possible implementation,
the judging unit is further configured to determine, after the processing unit increments by 1 the count value of the count field included in the first queue according to the barrier identifier, whether the count value of the count field equals the number of thread programs participating in the synchronization;
the target barrier synchronization device further comprises:
an obtaining unit, configured to obtain the identification information of the processor core corresponding to each thread program participating in the synchronization when the judging unit determines that the count value of the count field equals the number of thread programs participating in the synchronization;
a sending unit, configured to send, according to the identification information obtained by the obtaining unit, an acknowledgment message to the processor core corresponding to each thread program participating in the synchronization, the acknowledgment message notifying that processor core to continue processing its thread program.
With reference to the fourth aspect and the foregoing possible implementations, in another possible implementation, the obtaining unit is specifically configured to:
obtain, from the first queue, the identification information of the processor core corresponding to each thread program participating in the synchronization.
With reference to the fourth aspect and the foregoing possible implementations, in another possible implementation, the obtaining unit is specifically configured to:
obtain, from the first queue and the memory, the identification information of the processor core corresponding to each thread program participating in the synchronization.
In the barrier synchronization method and device provided by the present invention, when the first processor core determines that the currently processed thread program has executed to a predetermined barrier synchronization point, it determines a target barrier synchronization device according to the barrier identifier corresponding to that point and then sends a barrier synchronization message to that device. Because the device that handles a core's barrier synchronization message is chosen according to the barrier synchronization point, different barrier synchronization points can be mapped to different barrier synchronization devices. This avoids an access bottleneck as the number of threads grows and improves the processing performance of a chip having a multi-core or many-core processor.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a barrier synchronization method provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a barrier synchronization method provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic structural diagram, provided by Embodiment 3 of the present invention, of a chip having a multi-core or many-core processor and provided with four barrier synchronization devices;
Fig. 4 is a flowchart of a barrier synchronization method provided by Embodiment 3 of the present invention;
Fig. 5 is a schematic composition diagram of a first processor core provided by Embodiment 4 of the present invention;
Fig. 6 is a schematic composition diagram of another first processor core provided by Embodiment 4 of the present invention;
Fig. 7 is a schematic composition diagram of a target barrier synchronization device provided by Embodiment 5 of the present invention;
Fig. 8 is a schematic composition diagram of another target barrier synchronization device provided by Embodiment 5 of the present invention;
Fig. 9 is a schematic composition diagram of a barrier synchronizer provided by Embodiment 6 of the present invention;
Fig. 10 is a schematic composition diagram of a barrier synchronizer provided by Embodiment 7 of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In addition, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
Embodiment 1
Embodiment 1 of the present invention provides a barrier synchronization method applied to a chip having a multi-core or many-core processor, where the chip is provided with at least two barrier synchronization devices. As shown in Figure 1, the method may include:
101. A first processor core determines that the thread program it is currently processing has executed to a predetermined barrier synchronization point. The first processor core is any one of the processor cores included in the chip.
102. The first processor core determines a target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point.
The barrier synchronization method of this embodiment is applied to a chip that has a multi-core or many-core processor and is provided with at least two barrier synchronization devices. When any processor core of the chip, that is, the first processor core, determines that the thread program it is currently processing has executed to the predetermined barrier synchronization point, it first determines the target barrier synchronization device from the at least two barrier synchronization devices on the chip, according to the barrier identifier corresponding to that barrier synchronization point.
103. The first processor core sends a barrier synchronization message to the target barrier synchronization device.
The barrier synchronization message includes the barrier identifier and the number of thread programs participating in the synchronization. The first processor core sends this message to the target barrier synchronization device determined in step 102.
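Steps 101 to 103 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the hardware design: the names `BarrierSyncMessage` and `on_barrier_reached`, the lookup table `device_for_barrier`, and the per-device inbox lists are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BarrierSyncMessage:
    barrier_id: int         # identifier of the barrier synchronization point
    participant_count: int  # number of thread programs taking part in the synchronization


def on_barrier_reached(barrier_id, participant_count, device_for_barrier, inboxes):
    """Called once a core's thread program has reached a barrier point (step 101)."""
    # Step 102: pick the target device from the barrier identifier.
    # The lookup table is a hypothetical stand-in for whatever rule the chip uses.
    target = device_for_barrier[barrier_id]
    # Step 103: send the message; a Python list stands in for the on-chip link.
    inboxes[target].append(BarrierSyncMessage(barrier_id, participant_count))
    return target


inboxes = [[], []]                 # a chip with two barrier synchronization devices
device_for_barrier = {0: 0, 1: 1}  # assumed mapping: barrier 0 -> device 0, barrier 1 -> device 1
chosen = on_barrier_reached(1, 8, device_for_barrier, inboxes)
```

Because the target is derived from the barrier identifier alone, every core that reaches the same barrier point sends its message to the same device without any coordination between cores.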
In the barrier synchronization method provided by this embodiment of the present invention, when the first processor core determines that the thread program it is currently processing has executed to a predetermined barrier synchronization point, it determines the target barrier synchronization device according to the barrier identifier corresponding to that point and then sends a barrier synchronization message to the target device. Because the device that processes a core's barrier synchronization message is chosen according to the barrier synchronization point, different barrier synchronization points can be mapped to different barrier synchronization devices. This avoids an access bottleneck as the number of threads grows and improves the processing performance of a chip having a multi-core or many-core processor.
Further, barrier synchronization is implemented in hardware, which offers a higher processing speed than a software approach and thus further improves the processing performance of a chip having a multi-core or many-core processor.
Embodiment 2
Embodiment 2 of the present invention provides a barrier synchronization method applied to a chip having a multi-core or many-core processor, where the chip is provided with at least two barrier synchronization devices. As shown in Figure 2, the method may include:
201. A target barrier synchronization device receives a barrier synchronization message sent by a first processor core.
The barrier synchronization message is sent by the first processor core when it determines that the thread program it is currently processing has executed to a predetermined barrier synchronization point. The first processor core is any one of the processor cores included in the chip. The message includes the barrier identifier corresponding to the barrier synchronization point and the number of thread programs participating in the synchronization. The target barrier synchronization device is the barrier synchronization device that corresponds to the predetermined barrier synchronization point and processes the barrier synchronization messages sent by processor cores.
202. The target barrier synchronization device increments by 1, according to the barrier identifier, the count value of the count field included in a first queue.
The first queue is the queue that corresponds to the barrier identifier and records the state of all thread programs participating in the synchronization. The first queue includes the barrier identifier, a queue state, and the count field.
Specifically, when the first processor core determines that the thread program it is currently processing has executed to the predetermined barrier synchronization point, it first determines the target barrier synchronization device according to the corresponding barrier identifier and then sends the barrier synchronization message to that device. The target device thus receives the message, which notifies it that the thread program processed by the first processor core has executed to the predetermined barrier synchronization point. According to the barrier identifier in the received message, the target device increments by 1 the count value of the count field in the first queue corresponding to that identifier, so as to record how many of the participating thread programs have executed to the predetermined barrier synchronization point.
In the barrier synchronization method provided by this embodiment of the present invention, the target barrier synchronization device receives the barrier synchronization message sent by the first processor core and, according to the barrier identifier in the message, increments by 1 the count value of the count field in the first queue corresponding to that identifier, so as to record how many of the participating thread programs have executed to the predetermined barrier synchronization point. Because the first processor core chooses the device that processes its barrier synchronization message according to the barrier synchronization point, different barrier synchronization points can be mapped to different barrier synchronization devices. This avoids an access bottleneck as the number of threads grows and improves the processing performance of a chip having a multi-core or many-core processor.
Further, barrier synchronization is implemented in hardware, which offers a higher processing speed than a software approach and thus further improves the processing performance of a chip having a multi-core or many-core processor.
Embodiment 3
Embodiment 3 of the present invention provides a barrier synchronization method applied to a chip having a multi-core or many-core processor, where the chip is provided with at least two barrier synchronization devices and these devices are distributed at different positions of the network-on-chip. As an example, Figure 3 is a schematic structural diagram of a chip that has a multi-core or many-core processor and is provided with 4 barrier synchronization devices. As shown in Figure 4, the barrier synchronization method provided by this embodiment may include:
301. A first processor core determines that the thread program it is currently processing has executed to a predetermined barrier synchronization point.
The first processor core is any one of the processor cores included in the chip.
302. The first processor core determines a target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point.
After the first processor core determines that the thread program it is currently processing has executed to the predetermined barrier synchronization point, it uses the barrier identifier corresponding to that point to determine, from the at least two barrier synchronization devices on the chip, the device that will process its barrier synchronization message, that is, the target barrier synchronization device.
In a possible implementation of this embodiment, the first processor core may determine the target barrier synchronization device as follows: according to the barrier identifier corresponding to the predetermined barrier synchronization point, the first processor core determines the target device from the at least two barrier synchronization devices on the chip according to a preset rule, where the preset rule may include a mapping between barrier identifiers and barrier synchronization devices. Take a round-robin scheme as an example. Suppose the number of barrier synchronization points exceeds the number of barrier synchronization devices and the chip includes 4 barrier synchronization devices in total. A processor core can then use the barrier identifier of its own barrier synchronization point to determine, in round-robin fashion, the device that processes its barrier synchronization message. For instance, if the barrier identifier of a barrier synchronization point is 7 and the chip includes 4 barrier synchronization devices, the processor core takes the remainder of the barrier identifier divided by 4 and obtains 3 as the number of the device that processes its message; the target barrier synchronization device is thereby determined.
It should be noted that round-robin selection of the target barrier synchronization device is only one possible implementation provided by this embodiment. How the target device is determined from the barrier identifier corresponding to a barrier synchronization point can be decided according to the needs of the actual application scenario, and this embodiment does not specifically limit it.
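The round-robin rule described above amounts to a modulo operation. The following Python sketch shows it under the assumption that devices are numbered from 0; the function name `select_device` is hypothetical.

```python
def select_device(barrier_id: int, device_count: int) -> int:
    """Map a barrier identifier to a device number by taking the remainder."""
    if device_count < 2:
        # The method assumes a chip with at least two barrier synchronization devices.
        raise ValueError("at least two barrier synchronization devices are required")
    return barrier_id % device_count


# The example from the text: barrier identifier 7 on a chip with 4 devices.
target = select_device(7, 4)

# Consecutive barrier identifiers cycle through the devices in turn.
spread = [select_device(b, 4) for b in range(8)]
```

With this rule, barrier points with consecutive identifiers land on different devices, which is what spreads the message traffic across the chip.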
It should be noted that, in possible implementations of this embodiment, the preset rule may also be that each processor core corresponds to one barrier synchronization device, or that each bank of the shared cache corresponds to one barrier synchronization device. The preset rule may be agreed on in advance, and this embodiment does not specifically limit it.
303. The first processor core sends a barrier synchronization message to the target barrier synchronization device.
The barrier synchronization message includes the barrier identifier and the number of thread programs participating in the synchronization. After determining the target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point, the first processor core sends the target device a barrier synchronization message notifying it that the thread program processed by the first processor core has executed to the predetermined barrier synchronization point.
It should be noted that the processor core may deliver the barrier synchronization message to the target barrier synchronization device by means of an added synchronization instruction; this embodiment does not specifically limit this. The synchronization instruction may be non-preemptive or preemptive.
304. The first processor core suspends processing of the currently processed thread program and enters a waiting state.
After sending the barrier synchronization message to the target barrier synchronization device, the first processor core may suspend processing of the currently processed thread program and enter the waiting state, so as to achieve barrier synchronization.
305. The target barrier synchronization device receives the barrier synchronization message sent by the first processor core.
The barrier synchronization message is sent by the first processor core when it determines that the thread program it is currently processing has executed to the predetermined barrier synchronization point, and it includes the barrier identifier corresponding to that point and the number of thread programs participating in the synchronization. The target barrier synchronization device is the barrier synchronization device that corresponds to the predetermined barrier synchronization point and processes the barrier synchronization messages sent by processor cores.
When the first processor core has determined that the thread program it is processing has executed to the predetermined barrier synchronization point and has determined the target barrier synchronization device, it sends the barrier synchronization message to the target device. The target device then receives this message, which includes the barrier identifier corresponding to the predetermined barrier synchronization point and the number of thread programs participating in the synchronization.
306. The target barrier synchronization device determines whether a first queue exists.
After receiving the barrier synchronization message sent by the first processor core, the target barrier synchronization device determines, according to the barrier identifier in the message that corresponds to the predetermined barrier synchronization point, whether a first queue corresponding to that barrier identifier exists. The first queue is the queue that corresponds to the barrier identifier and records the state of all thread programs participating in the synchronization. It includes the barrier identifier, a queue state, and a count field. The barrier identifier uniquely identifies the barrier synchronization point. The queue state identifies the state of the queue, which may be in use or idle. The count field records the number of thread programs that have executed to the predetermined barrier synchronization point.
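The three fields of the first queue can be pictured as a small record. The Python sketch below is an illustration only; the names `QueueState` and `BarrierQueue` and the choice of default values are assumptions, and a hardware device would hold these fields in registers or on-chip storage rather than in objects.

```python
from dataclasses import dataclass
from enum import Enum


class QueueState(Enum):
    IDLE = 0    # the queue is not assigned to any barrier synchronization point
    IN_USE = 1  # the queue is tracking an ongoing synchronization


@dataclass
class BarrierQueue:
    barrier_id: int                      # uniquely identifies the barrier synchronization point
    state: QueueState = QueueState.IDLE  # queue state: in use or idle
    count: int = 0                       # thread programs that have reached the barrier point


queue = BarrierQueue(barrier_id=7)
```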
It should be noted that, in this embodiment, one barrier synchronization device may maintain at least one queue, each of which records the state of all thread programs participating in the same synchronization. This gives the barrier synchronization method good scalability: it is well suited to chips with multi-core processors and even better suited to chips with many-core processors.
307. When the first queue does not exist, the target barrier synchronization device creates the first queue and updates the queue state to the in-use state.
When the target barrier synchronization device determines that no first queue corresponding to the barrier identifier exists, it may create a first queue that includes this barrier identifier and update the queue state of the first queue from idle to in use.
It should be noted that, if the target barrier synchronization device determines that a first queue corresponding to the barrier identifier already exists, it performs step 308 directly.
308. The target barrier synchronization device increments by 1, according to the barrier identifier corresponding to the predetermined barrier synchronization point, the count value of the count field included in the first queue.
It should be noted that, in a possible implementation of this embodiment in which each processor core corresponds to one barrier synchronization device, the target barrier synchronization device may perform step 308 directly upon receiving the barrier synchronization message sent by the first processor core, without determining whether a first queue corresponding to the barrier identifier exists.
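Steps 306 to 308 form a look-up-or-create followed by an increment. The Python sketch below models the queues of one device as a dictionary keyed by barrier identifier; the function name `handle_message`, the dictionary layout, and the string `"in_use"` are assumptions made for the illustration.

```python
def handle_message(queues: dict, barrier_id: int) -> int:
    """Process one barrier synchronization message and return the updated count."""
    queue = queues.get(barrier_id)  # step 306: does a first queue exist for this identifier?
    if queue is None:
        # Step 307: create the queue and mark it as in use.
        queue = {"barrier_id": barrier_id, "state": "in_use", "count": 0}
        queues[barrier_id] = queue
    queue["count"] += 1             # step 308: one more thread program has arrived
    return queue["count"]


queues = {}
first = handle_message(queues, 7)   # creates the queue for barrier 7
second = handle_message(queues, 7)  # reuses the existing queue
other = handle_message(queues, 3)   # a different barrier point gets its own queue
```

Keeping one queue per barrier identifier is what lets a single device track several barrier points at the same time.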
In a possible implementation of this embodiment, the first queue further includes the identification information of the processor cores corresponding to the thread programs that have executed to the predetermined barrier synchronization point. In that case, after the target barrier synchronization device increments by 1 the count value of the count field in the first queue according to the barrier identifier corresponding to the predetermined barrier synchronization point, the following steps 309 to 311 may be performed.
309. The target barrier synchronization device determines whether the number of pieces of identification information of processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point is less than a preset threshold.
The preset threshold is less than or equal to the maximum number of threads supported by the chip. After incrementing the count value of the count field in the first queue according to the barrier identifier in the received barrier synchronization message, the target barrier synchronization device goes on to determine whether the number of pieces of identification information of processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point is less than the preset threshold. If the number is less than the preset threshold, step 310 is performed; if the number is not less than the preset threshold, step 311 is performed.
310. The target barrier synchronization device adds the identification information of the first processor core to the first queue.
311. The target barrier synchronization device saves the identification information of the first processor core to memory.
It should be noted that, in this embodiment, where the preset threshold is less than the maximum number of threads supported by the chip and the number of pieces of identification information of processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point is not less than the preset threshold, saving the identification information of the first processor core to memory reduces the space the target barrier synchronization device needs for storing the identification information of processor cores.
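Steps 309 to 311 can be read as a bounded on-device list with an overflow area in memory. The following Python sketch illustrates that reading; the function name `record_core` and the use of two plain lists for the queue storage and the memory are assumptions.

```python
def record_core(queue_core_ids: list, memory_core_ids: list, core_id: int, threshold: int) -> str:
    """Store a core's identification information and report where it was stored."""
    if len(queue_core_ids) < threshold:  # step 309: is there still room in the queue?
        queue_core_ids.append(core_id)   # step 310: keep it in the first queue
        return "queue"
    memory_core_ids.append(core_id)      # step 311: spill it to memory
    return "memory"


in_queue, in_memory = [], []
# Assumed threshold of 2, smaller than the number of arriving cores.
placements = [record_core(in_queue, in_memory, core, threshold=2) for core in (0, 1, 2)]
```

The threshold bounds the storage each queue needs on the device, at the cost of a memory access for the cores that arrive after the queue is full.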
In another possible implementation of this embodiment, the first queue further includes a bit sequence that indicates, for each of the thread programs participating in the synchronization, whether it has executed to the predetermined barrier synchronization point, and each bit in the bit sequence is mapped to the identification information of a processor core. In that case, after the target barrier synchronization device increments by 1 the count value of the count field in the first queue according to the barrier identifier corresponding to the predetermined barrier synchronization point, the following step 312 may be performed.
312. The target barrier synchronization device updates the bit corresponding to the identification information of the first processor core from a first flag to a second flag.
The first flag indicates that the thread program processed by a processor core has not executed to the predetermined barrier synchronization point, and the second flag indicates that it has. For example, the first flag is 0 and the second flag is 1. When the first queue is created, every bit in the bit sequence that is mapped to the identification information of a processor core is set to 0. When the target barrier synchronization device receives the barrier synchronization message from the first processor core, it updates the bit corresponding to the identification information of the first processor core from 0 to 1, thereby marking that the thread program processed by the first processor core has executed to the predetermined barrier synchronization point.
It should be noted that, in this possible implementation, using a bit sequence to indicate whether the participating thread programs have executed to the predetermined barrier synchronization point reduces the space the target barrier synchronization device needs for storing the identification information of processor cores.
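The bit-sequence variant of step 312 can be sketched with an integer used as a bit mask. The Python example below assumes the simplest mapping, in which bit i corresponds to the processor core numbered i; the function names `mark_arrived` and `arrived_cores` are hypothetical.

```python
def mark_arrived(bits: int, core_id: int) -> int:
    """Step 312: flip the core's bit from the first flag (0) to the second flag (1)."""
    return bits | (1 << core_id)


def arrived_cores(bits: int) -> list:
    """Recover core identifiers from the bit sequence through the bit-to-core mapping."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


bits = 0                      # on queue creation every mapped bit is 0
bits = mark_arrived(bits, 0)  # core 0 reaches the barrier point
bits = mark_arrived(bits, 3)  # core 3 reaches the barrier point
cores = arrived_cores(bits)
```

One bit per core replaces a stored core identifier, which is where the space saving mentioned above comes from.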
313. The target barrier synchronization device determines whether the count value of the count field equals the number of thread programs participating in the synchronization.
After the target barrier synchronization device has incremented the count value of the count field in the first queue and has either saved the identification information of the first processor core or updated the bit corresponding to it, the device determines whether all thread programs participating in the synchronization have executed to the predetermined barrier synchronization point, that is, whether the count value of the count field in the first queue equals the number of thread programs participating in the synchronization.
314. When the count value of the count field equals the number of thread programs participating in the synchronization, the target barrier synchronization device obtains the identification information of the processor core corresponding to each participating thread program.
In one possible implementation, the target barrier synchronization device uses the bit sequence to indicate which participating thread programs have executed to the predetermined barrier synchronization point. Because each bit in the bit sequence is mapped to the identification information of a processor core, the target device in this case obtains the identification information of the processor core corresponding to each participating thread program from the first queue.
In another possible implementation, the target barrier synchronization device records the identification information of processor cores to indicate which participating thread programs have executed to the predetermined barrier synchronization point. In this case the target device obtains the identification information of the processor core corresponding to each participating thread program either from the first queue alone, or from the first queue together with memory.
It should be noted that, when the count value of the count field does not equal the number of thread programs participating in the synchronization, some of the participating thread programs have not yet executed to the predetermined barrier synchronization point. The target barrier synchronization device then continues to receive barrier synchronization messages sent by the processor cores corresponding to the other participating thread programs, and performs step 314 once the count value of the count field equals the number of thread programs participating in the synchronization.
315. The target barrier synchronization device sends, according to the identification information of the processor core corresponding to each participating thread program, an acknowledgement message to each of those processor cores.
After obtaining the identification information of the processor core corresponding to each thread program participating in the synchronization, the target barrier synchronization device uses this information to send each of those processor cores an acknowledgement message, which notifies the processor core that it may continue processing the thread program it is responsible for.
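Steps 313 to 315 can be sketched as a completion check followed by a broadcast of acknowledgements. In the Python example below, the function name `release_if_complete`, the two lists of core identifiers (those kept in the queue and those saved to memory), and the `send_ack` callback are all assumptions made for the illustration.

```python
def release_if_complete(count, participant_count, queue_core_ids, memory_core_ids, send_ack):
    """Acknowledge every participating core once all thread programs have arrived."""
    if count != participant_count:  # step 313: not everyone has arrived yet
        return False
    # Step 314: gather core identifiers from the queue and, if used, from memory.
    for core_id in queue_core_ids + memory_core_ids:
        send_ack(core_id)           # step 315: tell the core it may continue
    return True


acks = []
early = release_if_complete(2, 3, [0, 1], [], acks.append)
acks_after_early = list(acks)  # snapshot: nothing should have been sent yet
done = release_if_complete(3, 3, [0, 1], [2], acks.append)
```

Until the count matches, the device sends nothing and simply keeps accepting messages, so no core is released before the last participant arrives.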
316. The first processor core receives the acknowledgement message sent by the target barrier synchronization device.
317. The first processor core continues processing the currently processed thread program.
After receiving the acknowledgement message sent by the target barrier synchronization device, the first processor core knows that all thread programs participating in the synchronization have executed to the predetermined barrier synchronization point, and it can then continue processing the currently processed thread program.
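The interaction between cores and a device across steps 303, 304, 316 and 317 can be simulated in software. The sketch below uses Python threads in place of processor cores and a `threading.Event` in place of the acknowledgement message; the class `BarrierDevice` and the functions `core` and `run` are hypothetical, and the simulation says nothing about how the hardware is built.

```python
import threading


class BarrierDevice:
    """Hypothetical software stand-in for one barrier synchronization device."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # barrier_id -> acknowledgement events of arrived cores

    def receive(self, barrier_id, participant_count, ack):
        with self._lock:
            acks = self._pending.setdefault(barrier_id, [])
            acks.append(ack)
            if len(acks) == participant_count:
                # Every participant has arrived: acknowledge all of them.
                del self._pending[barrier_id]
                for event in acks:
                    event.set()


def core(core_id, device, barrier_id, participant_count, log, log_lock):
    with log_lock:
        log.append(("arrive", core_id))
    ack = threading.Event()
    device.receive(barrier_id, participant_count, ack)  # step 303: send the message
    ack.wait()                                          # steps 304 and 316: wait for the acknowledgement
    with log_lock:
        log.append(("resume", core_id))                 # step 317: continue processing


def run(core_count=4, barrier_id=7):
    device, log, log_lock = BarrierDevice(), [], threading.Lock()
    threads = [
        threading.Thread(target=core, args=(i, device, barrier_id, core_count, log, log_lock))
        for i in range(core_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run()
```

In the resulting log every "arrive" entry precedes every "resume" entry, which is the defining property of a barrier: no core continues until all participants have reached the synchronization point.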
In the barrier synchronization method provided by this embodiment of the present invention, the target barrier synchronization device receives the barrier synchronization message sent by the first processor core and, according to the barrier identifier in the message, increments by 1 the count value of the count field in the first queue corresponding to that identifier, so as to record how many of the participating thread programs have executed to the predetermined barrier synchronization point. Because the first processor core chooses the device that processes its barrier synchronization message according to the barrier synchronization point, different barrier synchronization points can be mapped to different barrier synchronization devices. This avoids an access bottleneck as the number of threads grows and improves the processing performance of a chip having a multi-core or many-core processor.
Moreover, barrier synchronization is implemented in hardware, which offers a higher processing speed than a software approach and thus further improves the processing performance of a chip having a multi-core or many-core processor. In addition, one barrier synchronization device may maintain at least one queue, each recording the state of all thread programs participating in the same synchronization, which gives the method good scalability.
Embodiment 4
Embodiment 4 of the present invention provides a first processor core applied to a chip having a multi-core or many-core processor, where the chip is provided with at least two barrier synchronization devices. As shown in Figure 5, the first processor core may include a first determining unit 41, a second determining unit 42, and a sending unit 43.
The first determining unit 41 is configured to determine that the currently processed thread program has executed to a predetermined barrier synchronization point. The first processor core is any one of the processor cores included in the chip.
The second determining unit 42 is configured to determine a target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point.
The sending unit 43 is configured to send a barrier synchronization message to the target barrier synchronization device determined by the second determining unit 42. The barrier synchronization message includes the barrier identifier and the number of thread programs participating in the synchronization.
In this embodiment of the present invention, further optionally, the second determining unit 42 is specifically configured to determine the target barrier synchronization device according to a preset rule and the barrier identifier corresponding to the predetermined barrier synchronization point. The preset rule includes a mapping between barrier identifiers and barrier synchronization devices.
In this embodiment of the present invention, further optionally, as shown in Figure 6, the first processor core may also include a first processing unit 44.
The first processing unit 44 is configured to suspend processing of the currently processed thread program and enter a waiting state after the sending unit 43 sends the barrier synchronization message to the target barrier synchronization device.
In this embodiment of the present invention, further optionally, the first processor core may also include a receiving unit 45 and a second processing unit 46.
The receiving unit 45 is configured to receive the acknowledgement message sent by the target barrier synchronization device after the first processing unit 44 suspends processing of the currently processed thread program and enters the waiting state. The acknowledgement message notifies the first processor core to continue processing the currently processed thread program.
The second processing unit 46 is configured to continue processing the currently processed thread program.
It should be noted that, for a detailed description of the functional modules in the first processor core provided by this embodiment, reference may be made to the corresponding content in the method embodiments; details are not repeated here.
With the first processor core provided by this embodiment of the present invention, when the first processor core determines that the thread program it is currently processing has executed to a predetermined barrier synchronization point, it determines the target barrier synchronization device according to the barrier identifier corresponding to that point and then sends a barrier synchronization message to the target device. Because the device that processes a core's barrier synchronization message is chosen according to the barrier synchronization point, different barrier synchronization points can be mapped to different barrier synchronization devices. This avoids an access bottleneck as the number of threads grows and improves the processing performance of a chip having a multi-core or many-core processor.
Moreover, barrier synchronization is implemented in hardware, which offers a higher processing speed than a software approach and thus further improves the processing performance of a chip having a multi-core or many-core processor. In addition, one barrier synchronization device may maintain at least one queue, each recording the state of all thread programs participating in the same synchronization, which gives it good scalability.
Embodiment 5
Embodiment 5 of the present invention provides a target barrier synchronization device applied to a chip having a multi-core or many-core processor, where the chip is provided with at least two barrier synchronization devices. As shown in Figure 7, the target barrier synchronization device may include a receiving unit 51 and a processing unit 52.
The receiving unit 51 is configured to receive a barrier synchronization message sent by a first processor core. The barrier synchronization message is sent by the first processor core when it determines that the thread program it is currently processing has executed to a predetermined barrier synchronization point. The first processor core is any one of the processor cores included in the chip. The message includes the barrier identifier corresponding to the predetermined barrier synchronization point and the number of thread programs participating in the synchronization. The target barrier synchronization device is the barrier synchronization device that corresponds to the predetermined barrier synchronization point and processes the barrier synchronization messages sent by processor cores.
The processing unit 52 is configured to increment by 1 the count value of the count field included in a first queue, according to the barrier identifier that corresponds to the predetermined barrier synchronization point and is included in the barrier synchronization message received by the receiving unit 51. The first queue is the queue that corresponds to the barrier identifier and records the state of all thread programs participating in the synchronization. The first queue includes the barrier identifier, a queue state, and the count field.
In this embodiment of the present invention, further optionally, as shown in Figure 8, the target barrier synchronization device may also include a judging unit 53 and a creating and updating unit 54.
The judging unit 53 is configured to determine whether the first queue exists before the processing unit 52 increments by 1 the count value of the count field in the first queue according to the barrier identifier.
The creating and updating unit 54 is configured to create the first queue and update the queue state to the in-use state when the judging unit 53 determines that the first queue does not exist.
In this embodiment of the present invention, further optionally, the first queue further comprises the identification information of the processor cores corresponding to the multi-threaded programs that have executed to the predetermined fence synchronization point.
The target fence synchronization device may further comprise an adding unit 55.
The adding unit 55 is configured to add the identification information of the first processor core to the first queue after the receiving unit 51 receives the fence synchronization message sent by the first processor core.
In this embodiment of the present invention, further optionally, the judging unit 53 is further configured to judge, before the adding unit 55 adds the identification information of the first processor core to the first queue, whether the number of pieces of identification information of processor cores corresponding to the multi-threaded programs that have executed to the predetermined fence synchronization point is less than a predetermined threshold. The predetermined threshold is less than or equal to the maximum number of threads supported by the chip.
The adding unit 55 is specifically configured to add the identification information of the first processor core to the first queue when the judging unit 53 determines that this number is less than the predetermined threshold.
In this embodiment of the present invention, further optionally, the target fence synchronization device may further comprise a storage unit 56.
The storage unit 56 is configured to save the identification information of the first processor core to memory when the judging unit 53 determines that the number of pieces of identification information of processor cores corresponding to the multi-threaded programs that have executed to the predetermined fence synchronization point is not less than the predetermined threshold.
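The threshold check can be illustrated with a short sketch. It assumes, purely for illustration, that the identifiers held in the queue and those spilled to memory are kept in two Python lists; the function name `record_core` and its return labels are hypothetical.

```python
def record_core(queue_core_ids, memory_core_ids, core_id, threshold):
    """Store core_id in the queue while it has room, otherwise in memory."""
    if len(queue_core_ids) < threshold:
        # Fewer identifiers than the predetermined threshold: keep it in the queue.
        queue_core_ids.append(core_id)
        return "queue"
    # Not less than the threshold: save the identifier to memory instead.
    memory_core_ids.append(core_id)
    return "memory"


queue_ids, memory_ids = [], []
placements = [record_core(queue_ids, memory_ids, c, threshold=2) for c in (5, 9, 12)]
print(placements, queue_ids, memory_ids)
```

With a threshold of 2, the first two core identifiers stay in the queue and the third is saved to memory, which keeps the queue entry bounded in size.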
In this embodiment of the present invention, further optionally, the first queue further comprises a bit sequence that identifies whether each of the multi-threaded programs participating in the synchronization has executed to the predetermined fence synchronization point, and each bit in the bit sequence has a mapping relationship with the identification information of a processor core.
The target fence synchronization device may further comprise an update unit 57.
The update unit 57 is configured to update the bit corresponding to the identification information of the first processor core from a first mark to a second mark after the receiving unit 51 receives the fence synchronization message sent by the first processor core. The first mark identifies that the multi-threaded program processed by the processor core has not executed to the predetermined fence synchronization point, and the second mark identifies that the multi-threaded program processed by the processor core has executed to the predetermined fence synchronization point.
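A bit sequence of this kind might be modelled as an integer bitmask. The sketch below assumes that the first mark is encoded as 0, the second mark as 1, and that the mapping relationship is simply "bit position equals core identifier"; none of these encodings is specified in the text, and the function names are hypothetical.

```python
NOT_ARRIVED, ARRIVED = 0, 1  # assumed encodings of the first and second mark


def mark_arrived(bits, core_id):
    """Update the bit mapped to core_id from the first mark to the second mark."""
    return bits | (ARRIVED << core_id)


def has_arrived(bits, core_id):
    """Report whether the bit mapped to core_id carries the second mark."""
    return (bits >> core_id) & 1 == ARRIVED


bits = 0                      # every core starts with the first mark
bits = mark_arrived(bits, 2)  # core 2 reaches the fence synchronization point
bits = mark_arrived(bits, 0)  # core 0 reaches the fence synchronization point
print(bin(bits), has_arrived(bits, 2), has_arrived(bits, 1))
```

One bit per core keeps the record compact and lets the device recover the identifiers of all arrived cores by scanning the set bits.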
In this embodiment of the present invention, further optionally, the judging unit 53 is further configured to judge, after the processing unit 52 adds 1, according to the fence identifier, to the count value of the count field comprised in the first queue, whether the count value of the count field equals the number of multi-threaded programs participating in the synchronization.
The target fence synchronization device may further comprise an acquiring unit 58 and a sending unit 59.
The acquiring unit 58 is configured to, when the judging unit 53 determines that the count value of the count field equals the number of multi-threaded programs participating in the synchronization, acquire the identification information of the processor core corresponding to each of the multi-threaded programs participating in the synchronization.
The sending unit 59 is configured to send, according to the identification information acquired by the acquiring unit 58, an acknowledgment message to the processor core corresponding to each of the multi-threaded programs participating in the synchronization. The acknowledgment message notifies the processor core to continue processing the multi-threaded program that it needs to process.
In this embodiment of the present invention, further optionally, the acquiring unit 58 is specifically configured to acquire, from the first queue, the identification information of the processor core corresponding to each of the multi-threaded programs participating in the synchronization.
In this embodiment of the present invention, further optionally, the acquiring unit 58 is specifically configured to acquire, from the first queue and the memory, the identification information of the processor core corresponding to each of the multi-threaded programs participating in the synchronization.
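The release step can be sketched as follows, as a minimal model rather than the device's actual logic. The function names `cores_to_release` and `send_acks`, the `"ack"` payload, and the `send` callback are hypothetical; the optional `memory_core_ids` argument covers the variant in which some identifiers were saved to memory.

```python
def cores_to_release(count, participants, queue_core_ids, memory_core_ids=()):
    """Return the cores to acknowledge, or an empty list if the fence is not complete."""
    if count != participants:
        return []  # some participants have not reached the synchronization point
    # Acquire identifiers from the first queue and, if any were spilled, from memory.
    return list(queue_core_ids) + list(memory_core_ids)


def send_acks(core_ids, send):
    """Send one acknowledgment message to each core via the supplied callback."""
    for core_id in core_ids:
        send(core_id, "ack")


sent = []
send_acks(cores_to_release(3, 3, [0, 1], [7]), lambda core, msg: sent.append((core, msg)))
print(sent)
```

Nothing is sent while the count field is below the number of participants, so the waiting cores stay blocked until the last arrival.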
It should be noted that, for specific descriptions of the functional modules of the target fence synchronization device provided in this embodiment of the present invention, reference may be made to the descriptions of the corresponding content in the method embodiments; details are not repeated here.
The target fence synchronization device provided in this embodiment of the present invention receives the fence synchronization message sent by the first processor core and, according to the fence identifier comprised in the message, adds 1 to the count value of the count field comprised in the first queue corresponding to that fence identifier, thereby recording how many of the multi-threaded programs participating in the synchronization have executed to the predetermined fence synchronization point. Because the first processor core determines, according to the predetermined fence synchronization point, the target fence synchronization device that processes its fence synchronization message, different fence synchronization points can be mapped to different fence synchronization devices. This avoids an access bottleneck when the number of threads increases and improves the processing performance of a chip having a multi-core or many-core processor.
Moreover, fence synchronization is implemented in hardware, which offers a higher processing speed than a software approach and further improves the processing performance of a chip having a multi-core or many-core processor. In addition, one fence synchronization device can maintain at least one queue for identifying the state of all multi-threaded programs participating in the same synchronization, which gives it good scalability.
Embodiment 6
Embodiment 6 of the present invention provides a fence synchronizer, applied in a chip having a multi-core or many-core processor, the chip being provided with at least two fence synchronization devices. As shown in Figure 9, the fence synchronizer may comprise at least one processor core, a memory 62, a communication interface 63, and a bus 64; the at least one processor core, the memory 62, and the communication interface 63 are connected through the bus 64 and communicate with one another, wherein:
The bus 64 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 64 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in Figure 9, but this does not mean that there is only one bus or only one type of bus.
The memory 62 is configured to store executable program code, the program code comprising computer operation instructions. The memory 62 may comprise high-speed RAM and may also comprise non-volatile memory, for example at least one disk memory.
The processor core may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The communication interface 63 is mainly used to implement communication between the devices of this embodiment.
Any one of the at least one processor core (that is, the first processor core 61 described in this embodiment of the present invention) is specifically configured to perform the following functions:
The first processor core 61 is configured to: determine that the multi-threaded program currently being processed has executed to a predetermined fence synchronization point, the first processor core 61 being any one of the processor cores comprised in the chip; determine a target fence synchronization device according to the fence identifier corresponding to the predetermined fence synchronization point; and send a fence synchronization message to the target fence synchronization device, the fence synchronization message comprising the fence identifier and the number of multi-threaded programs participating in the synchronization.
In this embodiment of the present invention, further optionally, the first processor core 61 is specifically configured to determine the target fence synchronization device according to the fence identifier corresponding to the predetermined fence synchronization point and a preset rule. The preset rule comprises the mapping relationship between fence identifiers and fence synchronization devices.
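The text does not say what the preset rule is, only that it maps fence identifiers to fence synchronization devices. As one possible illustration, the sketch below assumes a modulo mapping over numbered devices; the function names `target_device` and `make_sync_message` and the message field names are hypothetical.

```python
def target_device(fence_id, num_devices):
    """Map a fence identifier to a device index (assumed rule: modulo)."""
    if num_devices < 2:
        # The chip is described as having at least two fence synchronization devices.
        raise ValueError("expected at least two fence synchronization devices")
    return fence_id % num_devices


def make_sync_message(fence_id, participants):
    """Build the message contents named in the text: identifier and participant count."""
    return {"fence_id": fence_id, "participants": participants}


message = make_sync_message(fence_id=5, participants=8)
print(target_device(message["fence_id"], num_devices=4), message)
```

Under this assumed rule, consecutive fence identifiers go to different devices, which is one way to spread fence synchronization points so that no single device becomes an access bottleneck.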
In this embodiment of the present invention, further optionally, the first processor core 61 is further configured to, after sending the fence synchronization message to the target fence synchronization device, suspend processing of the multi-threaded program currently being processed and enter a waiting state.
In this embodiment of the present invention, further optionally, the first processor core 61 is further configured to, after suspending processing of the multi-threaded program currently being processed and entering the waiting state, receive an acknowledgment message sent by the target fence synchronization device, the acknowledgment message notifying the first processor core 61 to continue processing the multi-threaded program currently being processed; and then continue processing that multi-threaded program.
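The send, wait, and resume sequence on the core side can be simulated with threads. This is a software analogy only: `threading.Event` stands in for the acknowledgment message, and the classes `ToyDevice` and `Core` and their methods are hypothetical stand-ins, not part of the described hardware.

```python
import threading


class ToyDevice:
    """Stand-in target device: acknowledges every core once all have arrived."""

    def __init__(self):
        self.lock = threading.Lock()
        self.waiting = []

    def receive(self, core, participants):
        with self.lock:
            self.waiting.append(core)
            if len(self.waiting) == participants:
                for c in self.waiting:
                    c.ack.set()  # acknowledgment message
                self.waiting = []


class Core:
    def __init__(self, core_id):
        self.core_id = core_id
        self.ack = threading.Event()
        self.resumed = False

    def run_to_fence(self, device, participants):
        device.receive(self, participants)       # send the fence synchronization message
        self.resumed = self.ack.wait(timeout=5)  # waiting state until acknowledged


device = ToyDevice()
cores = [Core(i) for i in range(3)]
threads = [threading.Thread(target=c.run_to_fence, args=(device, 3)) for c in cores]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(all(c.resumed for c in cores))
```

Each simulated core blocks after sending its message and resumes only after the last participant has arrived, mirroring the waiting state and acknowledgment described above.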
It should be noted that, for specific descriptions of the functional modules of the first processor core provided in this embodiment of the present invention, reference may be made to the descriptions of the corresponding content in the method embodiments; details are not repeated here.
With the first processor core provided in this embodiment of the present invention, when the first processor core determines that the multi-threaded program currently being processed has executed to a predetermined fence synchronization point, it determines a target fence synchronization device according to the fence identifier corresponding to the predetermined fence synchronization point and then sends a fence synchronization message to that target fence synchronization device. Because the device that processes the core's fence synchronization message is determined according to the predetermined fence synchronization point, different fence synchronization points can be mapped to different fence synchronization devices. This avoids an access bottleneck when the number of threads increases and improves the processing performance of a chip having a multi-core or many-core processor.
Moreover, fence synchronization is implemented in hardware, which offers a higher processing speed than a software approach and further improves the processing performance of a chip having a multi-core or many-core processor. In addition, one fence synchronization device can maintain at least one queue for identifying the state of all multi-threaded programs participating in the same synchronization, which gives it good scalability.
Embodiment 7
Embodiment 7 of the present invention provides a fence synchronizer, applied in a chip having a multi-core or many-core processor, the chip being provided with at least two fence synchronization devices. As shown in Figure 10, the fence synchronizer may comprise at least one processor 71, a memory 72, a communication interface 73, and a bus 74; the at least one processor 71, the memory 72, and the communication interface 73 are connected through the bus 74 and communicate with one another, wherein:
The bus 74 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus 74 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in Figure 10, but this does not mean that there is only one bus or only one type of bus.
The memory 72 is configured to store executable program code, the program code comprising computer operation instructions. The memory 72 may comprise high-speed RAM and may also comprise non-volatile memory, for example at least one disk memory.
The processor 71 may be a CPU, an ASIC, or one or more integrated circuits configured to implement the embodiments of the present invention.
The communication interface 73 is mainly used to implement communication between the devices of this embodiment.
The processor 71 is configured to execute the executable program code in the memory 72, and is specifically configured to perform the following functions:
The processor 71 is configured to receive a fence synchronization message sent by a first processor core. The fence synchronization message is sent by the first processor core when it determines that the multi-threaded program it is currently processing has executed to a predetermined fence synchronization point; the first processor core is any one of the processor cores comprised in the chip; and the fence synchronization message comprises the fence identifier corresponding to the predetermined fence synchronization point and the number of multi-threaded programs participating in the synchronization. The processor 71 is further configured to add 1, according to the fence identifier corresponding to the predetermined fence synchronization point, to the count value of the count field comprised in a first queue. The first queue is the queue that corresponds to the fence identifier and identifies the state of all multi-threaded programs participating in the synchronization; the first queue comprises the fence identifier, a queue state, and the count field.
In this embodiment of the present invention, further optionally, the processor 71 is further configured to judge whether the first queue exists before adding 1, according to the fence identifier, to the count value of the count field comprised in the first queue, and, when the first queue does not exist, to create the first queue and update the queue state to the in-use state.
In this embodiment of the present invention, further optionally, the first queue further comprises the identification information of the processor cores corresponding to the multi-threaded programs that have executed to the predetermined fence synchronization point.
The processor 71 is further configured to add the identification information of the first processor core to the first queue after receiving the fence synchronization message sent by the first processor core.
In this embodiment of the present invention, further optionally, the processor 71 is further configured to judge, before adding the identification information of the first processor core to the first queue, whether the number of pieces of identification information of processor cores corresponding to the multi-threaded programs that have executed to the predetermined fence synchronization point is less than a predetermined threshold, the predetermined threshold being less than or equal to the maximum number of threads supported by the chip; and, when this number is determined to be less than the predetermined threshold, to add the identification information of the first processor core to the first queue.
In this embodiment of the present invention, further optionally, the processor 71 is further configured to save the identification information of the first processor core to memory when this number is determined to be not less than the predetermined threshold.
In this embodiment of the present invention, further optionally, the first queue further comprises a bit sequence that identifies whether each of the multi-threaded programs participating in the synchronization has executed to the predetermined fence synchronization point, and each bit in the bit sequence has a mapping relationship with the identification information of a processor core.
The processor 71 is further configured to update the bit corresponding to the identification information of the first processor core from a first mark to a second mark after receiving the fence synchronization message sent by the first processor core. The first mark identifies that the multi-threaded program processed by the processor core has not executed to the predetermined fence synchronization point, and the second mark identifies that the multi-threaded program processed by the processor core has executed to the predetermined fence synchronization point.
In this embodiment of the present invention, further optionally, the processor 71 is further configured to: after adding 1, according to the fence identifier, to the count value of the count field comprised in the first queue, judge whether the count value of the count field equals the number of multi-threaded programs participating in the synchronization; when the count value equals that number, acquire the identification information of the processor core corresponding to each of the multi-threaded programs participating in the synchronization; and, according to the acquired identification information, send an acknowledgment message to the processor core corresponding to each of the multi-threaded programs participating in the synchronization. The acknowledgment message notifies the processor core to continue processing the multi-threaded program that it needs to process.
In this embodiment of the present invention, further optionally, the processor 71 is further configured to acquire, from the first queue, the identification information of the processor core corresponding to each of the multi-threaded programs participating in the synchronization.
In this embodiment of the present invention, further optionally, the processor 71 is further configured to acquire, from the first queue and the memory, the identification information of the processor core corresponding to each of the multi-threaded programs participating in the synchronization.
It should be noted that, for specific descriptions of the functional modules of the target fence synchronization device provided in this embodiment of the present invention, reference may be made to the descriptions of the corresponding content in the method embodiments; details are not repeated here.
The target fence synchronization device provided in this embodiment of the present invention receives the fence synchronization message sent by the first processor core and, according to the fence identifier comprised in the message, adds 1 to the count value of the count field comprised in the first queue corresponding to that fence identifier, thereby recording how many of the multi-threaded programs participating in the synchronization have executed to the predetermined fence synchronization point. Because the first processor core determines, according to the predetermined fence synchronization point, the target fence synchronization device that processes its fence synchronization message, different fence synchronization points can be mapped to different fence synchronization devices. This avoids an access bottleneck when the number of threads increases and improves the processing performance of a chip having a multi-core or many-core processor.
Moreover, fence synchronization is implemented in hardware, which offers a higher processing speed than a software approach and further improves the processing performance of a chip having a multi-core or many-core processor. In addition, one fence synchronization device can maintain at least one queue for identifying the state of all multi-threaded programs participating in the same synchronization, which gives it good scalability.
Through the above description of the embodiments, those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the functional modules above is used only as an example. In practical applications, the functions above may be assigned to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed across multiple places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (26)

1. A fence synchronization method, characterized in that it is applied in a chip having a multi-core or many-core processor, the chip being provided with at least two fence synchronization devices, the method comprising:
determining, by a first processor core, that the multi-threaded program currently being processed has executed to a predetermined fence synchronization point, the first processor core being any one of the processor cores comprised in the chip;
determining a target fence synchronization device according to the fence identifier corresponding to the predetermined fence synchronization point; and
sending a fence synchronization message to the target fence synchronization device, the fence synchronization message comprising the fence identifier and the number of multi-threaded programs participating in the synchronization.
2. The method according to claim 1, characterized in that determining the target fence synchronization device according to the fence identifier corresponding to the predetermined fence synchronization point comprises:
determining the target fence synchronization device according to the fence identifier corresponding to the predetermined fence synchronization point and a preset rule, the preset rule comprising the mapping relationship between fence identifiers and fence synchronization devices.
3. The method according to claim 1, characterized in that, after sending the fence synchronization message to the target fence synchronization device, the method further comprises:
suspending processing of the multi-threaded program currently being processed and entering a waiting state.
4. The method according to claim 3, characterized in that, after suspending processing of the multi-threaded program currently being processed and entering the waiting state, the method further comprises:
receiving an acknowledgment message sent by the target fence synchronization device, the acknowledgment message notifying the first processor core to continue processing the multi-threaded program currently being processed; and
continuing to process the multi-threaded program currently being processed.
5. A fence synchronization method, characterized in that it is applied in a chip having a multi-core or many-core processor, the chip being provided with at least two fence synchronization devices, the method comprising:
receiving, by a target fence synchronization device, a fence synchronization message sent by a first processor core, wherein the fence synchronization message is sent by the first processor core when it determines that the multi-threaded program it is currently processing has executed to a predetermined fence synchronization point, the first processor core is any one of the processor cores comprised in the chip, the fence synchronization message comprises the fence identifier corresponding to the predetermined fence synchronization point and the number of multi-threaded programs participating in the synchronization, and the target fence synchronization device is the fence synchronization device that corresponds to the predetermined fence synchronization point and processes the fence synchronization messages sent by processor cores; and
adding 1, according to the fence identifier corresponding to the predetermined fence synchronization point, to the count value of the count field comprised in a first queue, wherein the first queue is the queue that corresponds to the fence identifier and identifies the state of all multi-threaded programs participating in the synchronization, and the first queue comprises the fence identifier, a queue state, and the count field.
6. The method according to claim 5, characterized in that, before adding 1, according to the fence identifier, to the count value of the count field comprised in the first queue, the method further comprises:
judging whether the first queue exists; and
when the first queue does not exist, creating the first queue and updating the queue state to the in-use state.
7. The method according to claim 5, characterized in that the first queue further comprises the identification information of the processor cores corresponding to the multi-threaded programs that have executed to the predetermined fence synchronization point;
after receiving the fence synchronization message sent by the first processor core, the method further comprises:
adding the identification information of the first processor core to the first queue.
8. The method according to claim 7, characterized in that, before adding the identification information of the first processor core to the first queue, the method further comprises:
judging whether the number of pieces of identification information of processor cores corresponding to the multi-threaded programs that have executed to the predetermined fence synchronization point is less than a predetermined threshold, the predetermined threshold being less than or equal to the maximum number of threads supported by the chip; and
when this number is determined to be less than the predetermined threshold, adding the identification information of the first processor core to the first queue.
9. The method according to claim 8, characterized in that it further comprises:
when the number of pieces of identification information of processor cores corresponding to the multi-threaded programs that have executed to the predetermined fence synchronization point is determined to be not less than the predetermined threshold, saving the identification information of the first processor core to memory.
10. The method according to claim 5, characterized in that the first queue further comprises a bit sequence that identifies whether each of the multi-threaded programs participating in the synchronization has executed to the predetermined fence synchronization point, and each bit in the bit sequence has a mapping relationship with the identification information of a processor core;
after receiving the fence synchronization message sent by the first processor core, the method further comprises:
updating the bit corresponding to the identification information of the first processor core from a first mark to a second mark, wherein the first mark identifies that the multi-threaded program processed by the processor core has not executed to the predetermined fence synchronization point, and the second mark identifies that the multi-threaded program processed by the processor core has executed to the predetermined fence synchronization point.
11. The method according to claim 5, wherein after the adding of 1 to the count value of the count field in the first queue according to the barrier identification, the method further comprises:
Determining whether the count value of the count field equals the number of thread programs participating in synchronization;
When the count value of the count field equals the number of thread programs participating in synchronization, obtaining the identification information of the processor core corresponding to each of the thread programs participating in synchronization;
According to the obtained identification information, sending an acknowledgment message to the processor core corresponding to each of the thread programs participating in synchronization; the acknowledgment message notifies the processor core to continue processing the thread program that it is responsible for.
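The count-and-release step of claim 11 can be sketched as below. The class `BarrierDevice`, the dictionary layout, and the `sent` list that stands in for the on-chip message channel are all hypothetical. What happens to the queue after release is not covered by this claim, so the sketch leaves the queue in place.

```python
class BarrierDevice:
    """Hypothetical model of one barrier synchronization device."""

    def __init__(self):
        self.queues = {}  # barrier identification -> {"count": int, "cores": list}
        self.sent = []    # (core_id, "ACK") pairs; stands in for the message channel

    def on_message(self, core_id: int, barrier_id: int, participants: int) -> bool:
        q = self.queues.setdefault(barrier_id, {"count": 0, "cores": []})
        q["count"] += 1              # add 1 to the count field
        q["cores"].append(core_id)   # remember who arrived
        if q["count"] == participants:
            # Everyone has arrived: acknowledge every participating core.
            for c in q["cores"]:
                self.sent.append((c, "ACK"))
            return True
        return False
```

No acknowledgment is sent until the count reaches the participant number carried in the message, at which point every recorded core is acknowledged.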
12. The method according to claim 7 or 10, wherein the obtaining of the identification information of the processor core corresponding to each of the thread programs participating in synchronization comprises:
Obtaining, from the first queue, the identification information of the processor core corresponding to each of the thread programs participating in synchronization.
13. The method according to claim 8 or 9, wherein the obtaining of the identification information of the processor core corresponding to each of the thread programs participating in synchronization comprises:
Obtaining, from the first queue and the memory, the identification information of the processor core corresponding to each of the thread programs participating in synchronization.
14. A first processor core, applied in a chip having a multi-core or many-core processor, wherein at least two barrier synchronization devices are provided on the chip, the first processor core comprising:
First determining unit, configured to determine that the currently processed thread program has executed to a predetermined barrier synchronization point; the first processor core is any one of the processor cores comprised in the chip;
Second determining unit, configured to determine a target barrier synchronization device according to the barrier identification corresponding to the predetermined barrier synchronization point;
Sending unit, configured to send a barrier synchronization message to the target barrier synchronization device determined by the second determining unit; the barrier synchronization message contains the barrier identification and the number of thread programs participating in synchronization.
15. The first processor core according to claim 14, wherein the second determining unit is specifically configured to:
Determine the target barrier synchronization device according to a preset rule and the barrier identification corresponding to the predetermined barrier synchronization point; the preset rule comprises a mapping relationship between barrier identifications and barrier synchronization devices.
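One possible preset rule is a modulo mapping from barrier identification to device index, sketched below. The modulo rule and the function name `target_device` are assumptions; the claim only requires that a mapping between barrier identifications and barrier synchronization devices exists.

```python
def target_device(barrier_id: int, num_devices: int) -> int:
    """Map a barrier identification to a device index (assumed rule: modulo)."""
    if num_devices < 2:
        # Claim 14 states that the chip carries at least two devices.
        raise ValueError("at least two barrier synchronization devices are expected")
    return barrier_id % num_devices
```

Under such a rule, different barriers are spread across the devices, so cores waiting on different barriers do not all contact the same device.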
16. The first processor core according to claim 14, further comprising:
First processing unit, configured to, after the sending unit sends the barrier synchronization message to the target barrier synchronization device, suspend processing of the currently processed thread program and enter a waiting state.
17. The first processor core according to claim 16, further comprising:
Receiving unit, configured to, after the first processing unit suspends processing of the currently processed thread program and enters the waiting state, receive an acknowledgment message sent by the target barrier synchronization device; the acknowledgment message notifies the first processor core to continue processing the currently processed thread program;
Second processing unit, configured to continue processing the currently processed thread program.
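The core-side behaviour of claims 14, 16 and 17 (send, suspend, resume on acknowledgment) can be imitated in software with threads. This is only an analogy under stated assumptions: `queue.Queue` stands in for the on-chip message path, `threading.Event` stands in for the acknowledgment, and the function names `core`, `device` and `run` are hypothetical.

```python
import queue
import threading


def core(core_id, barrier_id, participants, inbox, ack, log):
    log.append(("arrive", core_id))
    # Claim 14: the message carries the barrier identification and participant count.
    inbox.put((core_id, barrier_id, participants))
    # Claim 16: suspend the current thread program and wait.
    ack.wait()
    # Claim 17: the acknowledgment arrived, so processing continues.
    log.append(("resume", core_id))


def device(inbox, acks):
    count = 0
    while True:
        _core_id, _barrier_id, participants = inbox.get()
        count += 1
        if count == participants:
            for event in acks.values():
                event.set()  # acknowledge every participating core
            return


def run(n):
    inbox = queue.Queue()
    acks = {i: threading.Event() for i in range(n)}
    log = []
    dev = threading.Thread(target=device, args=(inbox, acks))
    cores = [threading.Thread(target=core, args=(i, 0, n, inbox, acks[i], log))
             for i in range(n)]
    dev.start()
    for t in cores:
        t.start()
    for t in cores:
        t.join()
    dev.join()
    return log
```

In the returned log, every "arrive" entry precedes every "resume" entry, which is the ordering guarantee a barrier is meant to provide.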
18. A target barrier synchronization device, applied in a chip having a multi-core or many-core processor, wherein at least two barrier synchronization devices are provided on the chip, the target barrier synchronization device comprising:
Receiving unit, configured to receive a barrier synchronization message sent by a first processor core; the barrier synchronization message is sent by the first processor core upon determining that its currently processed thread program has executed to a predetermined barrier synchronization point; the first processor core is any one of the processor cores comprised in the chip; the barrier synchronization message contains the barrier identification corresponding to the predetermined barrier synchronization point and the number of thread programs participating in synchronization; the target barrier synchronization device is the barrier synchronization device that handles the barrier synchronization messages sent by processor cores for the predetermined barrier synchronization point;
Processing unit, configured to add 1 to the count value of the count field in a first queue according to the barrier identification, corresponding to the predetermined barrier synchronization point, contained in the barrier synchronization message received by the receiving unit; the first queue is the queue that corresponds to the barrier identification and indicates the state of all thread programs participating in synchronization; the first queue comprises the barrier identification, a queue state, and the count field.
19. The target barrier synchronization device according to claim 18, further comprising:
Judging unit, configured to determine whether the first queue exists before the processing unit adds 1 to the count value of the count field in the first queue according to the barrier identification;
Creating and updating unit, configured to, when the judging unit determines that the first queue does not exist, create the first queue and update the queue state to an in-use state.
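The lookup, create-if-absent, and increment sequence of claims 18 and 19 can be sketched as follows. The dictionary keys and the state value `"in_use"` are illustrative names, not terms defined by the patent.

```python
def on_barrier_message(queues: dict, barrier_id: int) -> dict:
    """Find the queue for barrier_id, creating it on first use, then count the arrival."""
    q = queues.get(barrier_id)
    if q is None:
        # Claim 19: the queue does not exist yet, so create it and mark it in use.
        q = {"barrier_id": barrier_id, "state": "in_use", "count": 0}
        queues[barrier_id] = q
    # Claim 18: add 1 to the count field of the queue for this barrier.
    q["count"] += 1
    return q
```

The first message for a given barrier identification creates the queue; later messages for the same barrier reuse it and only advance the count.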
20. The target barrier synchronization device according to claim 18, wherein the first queue further comprises the identification information of processor cores whose thread programs have executed to the predetermined barrier synchronization point;
The target barrier synchronization device further comprises:
Adding unit, configured to add the identification information of the first processor core to the first queue after the receiving unit receives the barrier synchronization message sent by the first processor core.
21. The target barrier synchronization device according to claim 20, wherein:
The judging unit is further configured to, before the adding unit adds the identification information of the first processor core to the first queue, determine whether the number of pieces of identification information of processor cores whose thread programs have executed to the predetermined barrier synchronization point is less than a preset threshold; the preset threshold is less than or equal to the maximum number of threads supported by the chip;
The adding unit is specifically configured to add the identification information of the first processor core to the first queue when the judging unit determines that this number is less than the preset threshold.
22. The target barrier synchronization device according to claim 21, further comprising:
Storage unit, configured to save the identification information of the first processor core in memory when the judging unit determines that the number of pieces of identification information of processor cores whose thread programs have executed to the predetermined barrier synchronization point is not less than the preset threshold.
23. The target barrier synchronization device according to claim 18, wherein the first queue further comprises a bit sequence indicating whether each of the thread programs participating in synchronization has executed to the predetermined barrier synchronization point, and each bit in the bit sequence has a mapping relationship with the identification information of a processor core;
The target barrier synchronization device further comprises:
Updating unit, configured to, after the receiving unit receives the barrier synchronization message sent by the first processor core, update the bit corresponding to the identification information of the first processor core from a first flag to a second flag; the first flag indicates that the thread program processed by the processor core has not executed to the predetermined barrier synchronization point, and the second flag indicates that the thread program processed by the processor core has executed to the predetermined barrier synchronization point.
24. The target barrier synchronization device according to claim 18, wherein:
The judging unit is further configured to, after the processing unit adds 1 to the count value of the count field in the first queue according to the barrier identification, determine whether the count value of the count field equals the number of thread programs participating in synchronization;
The target barrier synchronization device further comprises:
Acquiring unit, configured to, when the judging unit determines that the count value of the count field equals the number of thread programs participating in synchronization, obtain the identification information of the processor core corresponding to each of the thread programs participating in synchronization;
Sending unit, configured to, according to the identification information obtained by the acquiring unit, send an acknowledgment message to the processor core corresponding to each of the thread programs participating in synchronization; the acknowledgment message notifies the processor core to continue processing the thread program that it is responsible for.
25. The target barrier synchronization device according to claim 20 or 23, wherein the acquiring unit is specifically configured to:
Obtain, from the first queue, the identification information of the processor core corresponding to each of the thread programs participating in synchronization.
26. The target barrier synchronization device according to claim 21 or 22, wherein the acquiring unit is specifically configured to:
Obtain, from the first queue and the memory, the identification information of the processor core corresponding to each of the thread programs participating in synchronization.
CN201410098952.1A 2014-03-17 2014-03-17 Barrier synchronization method and device Active CN104932947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410098952.1A CN104932947B (en) 2014-03-17 2014-03-17 Barrier synchronization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410098952.1A CN104932947B (en) 2014-03-17 2014-03-17 Barrier synchronization method and device

Publications (2)

Publication Number Publication Date
CN104932947A true CN104932947A (en) 2015-09-23
CN104932947B CN104932947B (en) 2018-06-05

Family

ID=54120120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410098952.1A Active CN104932947B (en) 2014-03-17 2014-03-17 Barrier synchronization method and device

Country Status (1)

Country Link
CN (1) CN104932947B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050050374A1 (en) * 2003-08-25 2005-03-03 Tomohiro Nakamura Method for synchronizing processors in a multiprocessor system
CN101925881A (en) * 2008-01-25 2010-12-22 学校法人早稻田大学 Multiprocessor system and multiprocessor system synchronization method
US20120179896A1 (en) * 2011-01-10 2012-07-12 International Business Machines Corporation Method and apparatus for a hierarchical synchronization barrier in a multi-node system
CN102591722A (en) * 2011-12-31 2012-07-18 龙芯中科技术有限公司 NoC (Network-on-Chip) multi-core processor multi-thread resource allocation processing method and system
CN103116527A (en) * 2013-03-05 2013-05-22 中国人民解放军国防科学技术大学 Super-large-scale barrier synchronization method based on network controller
CN103336571A (en) * 2013-06-13 2013-10-02 中国科学院计算技术研究所 Method and system for reducing power consumption of multi-thread program

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110716812A (en) * 2019-09-12 2020-01-21 无锡江南计算技术研究所 Distributed synchronous management method and device supporting high concurrency
CN112783663A (en) * 2021-01-15 2021-05-11 中国人民解放军国防科技大学 Extensible fence synchronization method and device
CN112783663B (en) * 2021-01-15 2023-06-13 中国人民解放军国防科技大学 Extensible fence synchronization method and equipment

Also Published As

Publication number Publication date
CN104932947B (en) 2018-06-05

Similar Documents

Publication Publication Date Title
US9971635B2 (en) Method and apparatus for a hierarchical synchronization barrier in a multi-node system
US10411953B2 (en) Virtual machine fault tolerance method, apparatus, and system
US9639409B2 (en) Device and method for communicating between cores
CN107436809B (en) data processor
US8725873B1 (en) Multi-server round robin arbiter
US11868780B2 (en) Central processor-coprocessor synchronization
EP3230861A1 (en) Technologies for fast synchronization barriers for many-core processing
CN105210046A (en) Memory latency management
CN104102549A (en) Method, device and chip for realizing mutual exclusion operation of multiple threads
US10310909B2 (en) Managing execution of computer operations with non-competing computer resource requirements
US20180165008A1 (en) Memory transaction prioritization
US9697127B2 (en) Semiconductor device for controlling prefetch operation
US10452134B2 (en) Automated peripheral device handoff based on eye tracking
CN104932947A (en) Barrier synchronization method and device
CN104461957A (en) Method and device for heterogeneous multi-core CPU share on-chip caching
US20130238871A1 (en) Data processing method and apparatus, pci-e bus system, and server
CN104426624B (en) A kind of image synchronous display method and device
CN109002286A (en) Data asynchronous processing method and device based on synchronous programming
CN115934625B (en) Doorbell knocking method, equipment and medium for remote direct memory access
US9864604B2 (en) Distributed mechanism for clock and reset control in a microprocessor
CN103955397A (en) Virtual machine scheduling multi-strategy selection method based on micro-architecture perception
EP2750045A1 (en) Method and apparatus for allocating interrupts in a multi-core system
WO2016050059A1 (en) Shared storage concurrent access processing method and device, and storage medium
US20220147097A1 (en) Synchronization signal generating circuit, chip and synchronization method and device, based on multi-core architecture
US20130247065A1 (en) Apparatus and method for executing multi-operating systems

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant