TWI489393B - Applied Assignment Method for Multi-core System
- Publication number
- TWI489393B (application TW102141764A)
- Authority
- TW
- Taiwan
- Prior art keywords
- work
- processor
- group
- execution time
- processors
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
- Multi Processors (AREA)
Description
The present invention relates to multi-core systems, and in particular to a task assignment method applied to a multi-core system.
Constrained by semiconductor process technology, physical characteristics, and heat dissipation, the computing speed of processors is unlikely to see a significant breakthrough in the near term. The information industry's common response is to integrate multiple processors (usually called "cores" in the market) into a single device, a concept now widely applied in electronic products ranging from smartphones to high-end servers. A chip multi-processor (CMP) product, for example, contains multiple processors and can therefore multitask. Most systems equipped with multiple processors also use a last-level cache (LLC), in which several processors share one cache, to avoid idle resources and improve system performance. However, each task has different cache access requirements during execution (a "task" here, following the conventional definition in the information field, is a program comprising a set of related actions that accomplishes a specific function). If the processors sharing one cache access it too intensively while executing their respective tasks, a large number of cache misses is likely to occur: a processor cannot find the data it needs in the cache and must spend extra time moving data from memory, which lowers instructions per cycle (IPC) and hurts system performance. How to keep multiple processors in a multi-core system from contending for the cache they share has therefore become an important issue.
The most fundamental solution to this problem is to find a better method of assigning tasks to processors, with the goal that when multiple processors sharing one cache execute their respective tasks, each task's cache access demand is satisfied as far as possible. One conventional task assignment method uses the working set size (WSS) of the tasks as the basis for assignment, so that the total working set size of all tasks executed by the processors sharing one cache does not exceed the capacity of that cache, which reduces the likelihood of cache misses. Another method distributes tasks with high misses per instruction (MPI) among processors that use different caches, so as to spread the adverse effect of these tasks on system performance evenly. Neither method, however, truly reflects how much each task needs to access the cache, so neither works well.
Moreover, any task executed by a processor necessarily needs to access the cache used by that processor, so it is disturbed to some degree by other tasks that also need the same cache, and its execution time lengthens. Different tasks do not resist such interference equally, so the "ability to resist interference from other tasks" can be regarded as a property of a task, namely its anti-interference ability. Simply put, the stronger a task's anti-interference ability, the less it is affected by other tasks, and the less its execution time is lengthened. Some conventional task assignment methods are based on anti-interference ability, for example assigning tasks with strong anti-interference ability and tasks with weak anti-interference ability to different processors sharing the same cache. Intuitively this should reduce resource contention. In fact, however, a task's anti-interference ability is unrelated to its ability to interfere with other tasks (called its interference ability): a task with strong anti-interference ability does not necessarily interfere more with other tasks, and this method does not take that into account. Methods that consider both anti-interference ability and interference ability do exist, but they require special binary testing tools and cannot be applied immediately to all kinds of systems.
Of course, many researchers have proposed other methods, such as assigning tasks with high cache miss rates and tasks with low cache miss rates to the processors sharing one cache, or avoiding assigning tasks that all have low anti-interference ability to processors sharing one cache. The former works well only when the number of tasks equals the number of processors, and the latter only when the system schedules tasks first in, first out (FIFO). Real multi-core systems do not always operate this way, so both methods remain preliminary laboratory research and cannot be applied in practice.
In view of this, the object of the present invention is to provide a task assignment method applied to a multi-core system that effectively reduces contention among multiple processors for the cache they share, outperforms the conventional methods, is not restricted by the system's scheduling policy, and can handle cases where the number of tasks far exceeds the number of processors.
To achieve the above object, the present invention provides a task assignment method applied to a multi-core system. The multi-core system includes a plurality of caches and a plurality of processor groups; each processor group includes at least two processors, and each cache is shared by the processors of one processor group. The method assigns a plurality of tasks to the processors in the processor groups for execution, and each processor may be assigned more than one task. Each task is defined as having two properties, anti-interference ability and interference ability. The anti-interference ability of a task is its ability, when executed by a processor in a processor group, to resist the effect on its execution time of other tasks executed by the other processors in the same processor group. The interference ability of a task is its ability, when executed by a processor in a processor group, to affect the execution time of other tasks executed by the other processors in the same processor group. The method includes the following steps: A. evaluate the anti-interference ability and the interference ability of each task and, based on both, divide the tasks into a plurality of task groups whose number equals the total number of processors in the processor groups; B. assign the task groups one by one to the processors in the processor groups until every processor in every processor group has been assigned one task group.
The present invention further provides another task assignment method applied to a multi-core system. The multi-core system includes a plurality of caches and a plurality of processor groups; each processor group includes at least two processors, and each cache is shared by the processors of one processor group. The method assigns a plurality of tasks to the processors in the processor groups for execution, and each processor may be assigned more than one task. Each task is defined as having two properties, anti-interference ability and interference ability. The anti-interference ability of a task is its ability, when executed by a processor in a processor group, to resist the effect on its execution time of other tasks executed by the other processors in the same processor group. The interference ability of a task is its ability, when executed by a processor in a processor group, to affect the execution time of other tasks executed by the other processors in the same processor group. The method includes the following steps: A. evaluate the anti-interference ability of each task: each task and an attack task are executed by two processors of one processor group, and a first execution time required for the task to finish is measured; a second execution time required for the task to finish when executed alone by one processor of a processor group is also measured; for each task, the first execution time and the second execution time differ by a difference, and the ratio of that difference to the second execution time is the task's anti-interference ability, the attack task being a task that aggressively contends for cache resources when executed; B. evaluate the interference ability of each task: each task and a defense task are executed by two processors of one processor group, and a third execution time required for each of the tasks to finish is measured; a fourth execution time required for the defense task to finish when executed alone by one processor of a processor group is also measured; for each task, the third execution time and the fourth execution time differ by a difference, and the ratio of that difference to the fourth execution time is the task's interference ability, the defense task being a task that uses cache resources regularly and steadily when executed; C. add the anti-interference ability and the interference ability of each task to obtain that task's slowdown factor, sort the tasks in descending order of slowdown factor, and divide them in that order into a plurality of task groups whose number equals the total number of processors in the processor groups; D. assign the task groups one by one to the processors in the processor groups until every processor in every processor group has been assigned one task group.
The present invention thereby enables the processors of a multi-core system to contend less for shared cache resources when executing tasks, improving system performance.
FIG. 1 is a flowchart of a preferred embodiment of the present invention; FIG. 2 is a schematic example of a practical application of the present invention; FIG. 3 is another schematic example of a practical application of the present invention.
To describe the present invention more clearly, a preferred embodiment is described in detail below with reference to the drawings. It is noted first that the multi-core system to which the task assignment method applies includes a plurality of caches and a plurality of processor groups; each cache corresponds to one processor group and each processor group includes at least two processors, that is, each cache is shared by the processors of one processor group.
The present invention assigns a plurality of tasks to the processors in the processor groups for execution, and each processor may be assigned more than one task. Referring to FIG. 1, the invention uses two properties of each task, its anti-interference ability and its interference ability, as the basis for assignment. The anti-interference ability of a task is defined here as its ability, when executed by a processor in a processor group, to resist the effect on its execution time of other tasks executed by the other processors in the same processor group. The interference ability of a task is its ability, when executed by a processor in a processor group, to affect the execution time of other tasks executed by the other processors in the same processor group. In short, the stronger a task's anti-interference ability, the less its execution time is affected by other running tasks; the stronger a task's interference ability, the more it affects the execution time of other tasks when it runs.
In this preferred embodiment, the anti-interference ability of each task is evaluated as follows. The task and an attack task are executed by two processors of one processor group, and a first execution time required for the task to finish is measured. Separately, a second execution time required for the task to finish when executed alone by one processor of a processor group is measured. In other words, the second execution time is the time one processor needs to finish the task without interference from other tasks or from the attack task.
The attack task is a task that aggressively contends for cache resources. The pseudocode shown in Table 1 is one example of the attack task, but the attack task is not limited to it. As lines 2 to 7 of the pseudocode show, the attack task executes a loop many times, and each iteration performs two actions. It first generates three irregular random values x1, x2, and x3 (lines 3 to 5). Using these three values as indices, it then reads the values at coordinates [x2][x3] and [x1][x3] of a two-dimensional array B, adds them, and stores the sum at coordinate [x1][x2] of B (line 6). Because x1, x2, and x3 are all random, the values generated in successive iterations may differ greatly. As line 1 of the pseudocode in Table 1 shows, the two-dimensional array B used by the attack task is as large as 400 MB, far exceeding common cache capacities today, so the processor executing the attack task must continually move data from memory into the cache it uses, which drags down the execution time of the task executed by the other processor of the same processor group. For each task, therefore, the first execution time is necessarily longer than the second execution time; the two differ by a difference, and the ratio of that difference to the second execution time is defined as the task's anti-interference ability.
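The access pattern described for the attack task can be sketched in Python as follows. This is an illustration based only on the prose description above, not a copy of Table 1: the function name, the square `n` by `n` shape, the initial fill value, and the `seed` parameter are all assumptions, and a real cache-stress program would more likely be written in a compiled language with an array sized near the 400 MB mentioned in the text.

```python
import random


def attack_task(n, iterations, seed=None):
    """Hypothetical sketch of the attack task's random access pattern.

    n          -- side length of the square array B (an assumption; the text
                  only says B is two-dimensional and about 400 MB)
    iterations -- number of loop iterations
    seed       -- optional seed so a run can be repeated
    """
    rng = random.Random(seed)
    # B starts filled with ones (assumed; the text does not give initial values).
    B = [[1] * n for _ in range(n)]
    for _ in range(iterations):
        # Three unrelated random indices, as in the description.
        x1 = rng.randrange(n)
        x2 = rng.randrange(n)
        x3 = rng.randrange(n)
        # Read two scattered cells and write a third scattered cell.
        B[x1][x2] = B[x2][x3] + B[x1][x3]
    return B
```

Because every index is random, consecutive iterations rarely touch neighboring cells, which is what makes such a task compete for cache space once the array is much larger than the cache.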
In short, in this preferred embodiment the anti-interference ability of a task is the proportion by which its execution time on one processor of a processor group is lengthened by the attack task running on another processor of the same processor group.
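As a worked form of this ratio, a minimal sketch follows. The function and parameter names are illustrative assumptions; only the formula, (first execution time minus second execution time) divided by the second execution time, comes from the text.

```python
def anti_interference_ability(first_time, second_time):
    """Relative slowdown of a task when co-run with the attack task.

    first_time  -- execution time of the task when run alongside the attack task
    second_time -- execution time of the task when run alone
    """
    if second_time <= 0:
        raise ValueError("second_time must be positive")
    return (first_time - second_time) / second_time


# A task that takes 10 s alone and 15 s next to the attack task
# has a value of 0.5, i.e. it was lengthened by 50%.
print(anti_interference_ability(15.0, 10.0))  # 0.5
```

Note that under this definition a larger value means the task was slowed more by the attack task.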
In this preferred embodiment, the interference ability of each task is evaluated as follows. The task and a defense task are executed by two processors of one processor group, and a third execution time required for each task to finish is measured. Separately, a fourth execution time required for the defense task to finish when executed alone by one processor is measured. In other words, the fourth execution time is the time one processor needs to finish the defense task without interference from other tasks.
The defense task is a task that uses cache resources regularly and steadily when executed. The pseudocode shown in Table 2 is one example of the defense task, but the defense task is not limited to it.
As lines 2 to 9 of the pseudocode in Table 2 show, the defense task executes a loop many times and, in each iteration, modifies the element values of a two-dimensional array B according to a simple rule. Line 1 of the pseudocode in Table 2 initializes the array B to a size of 2M, which in this preferred embodiment exactly equals the capacity of one cache. When a processor executes the defense task and the other processors of the same processor group execute nothing else, no data needs to be moved from memory into the cache used by that processor, so the fourth execution time is the theoretical minimum time needed to execute the defense task. For each task, therefore, the third execution time is necessarily longer than the fourth execution time; the two differ by a difference, and the ratio of that difference to the fourth execution time is defined as the task's interference ability.
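A regular, cache-sized sweep of the kind described for the defense task might look like the sketch below. It is not a copy of Table 2: the text only says the elements of B are modified by "a simple rule", so the increment-by-one rule, the function name, and the square shape are assumptions chosen for illustration.

```python
def defense_task(n, passes):
    """Hypothetical sketch of the defense task's regular access pattern.

    n      -- side length of the square array B; in the described embodiment
              B is sized to match the capacity of one cache
    passes -- number of full sweeps over B
    """
    B = [[0] * n for _ in range(n)]
    for _ in range(passes):
        # Visit every element in the same row-major order on every pass,
        # applying a simple rule (assumed here: add one).
        for i in range(n):
            for j in range(n):
                B[i][j] += 1
    return B
```

Since the same cells are revisited in the same order and the array fits in the cache, a solo run should see few misses after the first pass; any co-running task that evicts these cells shows up directly as extra execution time.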
In short, in this preferred embodiment the interference ability of a task is the proportion by which the task, when executed by one processor of a processor group, lengthens the execution time of the defense task executed by another processor of the same processor group.
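The corresponding ratio can be written as a small helper. The names are illustrative assumptions, and the sketch takes the third execution time to be the defense task's time when co-run with the task under test, which is the reading suggested by the summary above.

```python
def interference_ability(third_time, fourth_time):
    """Relative slowdown a task inflicts on the defense task.

    third_time  -- execution time measured when the task and the defense task
                   run together (assumed here to be the defense task's time)
    fourth_time -- execution time of the defense task when run alone
    """
    if fourth_time <= 0:
        raise ValueError("fourth_time must be positive")
    return (third_time - fourth_time) / fourth_time


# If the defense task takes 10 s alone and 12.5 s next to the task under test,
# the task's interference ability is 0.25.
print(interference_ability(12.5, 10.0))  # 0.25
```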
After evaluating the anti-interference ability and interference ability of each task, the invention divides the tasks into a plurality of task groups on the basis of those two properties, and the number of task groups equals the number of processors in the multi-core system. For balance, the numbers of tasks in any two task groups differ by at most one. How many tasks each task group contains depends on the total number of tasks and the intended number of task groups; this is simple arithmetic and is not elaborated here.
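The arithmetic left unstated here can be sketched with integer division. The function name is an assumption, and so is the choice to put the larger groups first; that choice is inferred from the groupings given for FIG. 2 and FIG. 3 rather than stated as a rule in the text.

```python
def task_group_sizes(num_tasks, num_processors):
    """Sizes of the task groups, differing from each other by at most one.

    One group is formed per processor. The first `extra` groups receive one
    additional task (assumed ordering, inferred from the figures).
    """
    if num_processors <= 0:
        raise ValueError("num_processors must be positive")
    base, extra = divmod(num_tasks, num_processors)
    return [base + 1] * extra + [base] * (num_processors - extra)


print(task_group_sizes(7, 4))   # [2, 2, 2, 1]
print(task_group_sizes(14, 8))  # [2, 2, 2, 2, 2, 2, 1, 1]
```

The two calls correspond to the seven-task, four-processor case of FIG. 2 and the fourteen-task, eight-processor case of FIG. 3.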
The number of task groups must equal the number of processors in the multi-core system because the method ultimately assigns the task groups to the processors one to one. In other words, once assignment is complete, each processor is responsible for executing the tasks of one task group. The detailed assignment steps are described later.
The classification basis used in this preferred embodiment is the sum of each task's anti-interference ability and interference ability, defined as the task's slowdown factor. FIG. 2 shows an example application of this preferred embodiment. In FIG. 2(a), seven tasks A to G await assignment. Suppose the anti-interference ability and interference ability of each task have been evaluated by the foregoing steps and summed to obtain each task's slowdown factor; sorting the tasks in descending order of slowdown factor then gives the order E, G, A, D, B, F, C shown in FIG. 2(b), with the slowdown factor decreasing from left to right. As shown in FIG. 2(d), the multi-core system in this example has two caches, each shared by two processors (the two processors sharing one cache can be regarded as a processor group), so the system has four processors in total. According to the number of processors and the slowdown-factor ordering, the tasks are divided into the four task groups (E, G), (A, D), (B, F), (C) shown in FIG. 2(c). As the figure shows, the further left a task group is, the larger the slowdown factors of its tasks.
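The ranking and grouping in the FIG. 2 example can be sketched as below. The function names are assumptions, and the numeric measurements are invented solely so that the ordering matches FIG. 2(b); the patent text gives no actual values.

```python
def rank_by_slowdown(measurements):
    """Return task names sorted by descending slowdown factor.

    measurements -- dict mapping task name to a pair
                    (anti_interference_ability, interference_ability);
                    the slowdown factor is the sum of the pair.
    """
    return sorted(measurements, key=lambda name: sum(measurements[name]),
                  reverse=True)


def split_in_order(ordered_tasks, sizes):
    """Cut the ordered task list into consecutive groups of the given sizes."""
    groups, start = [], 0
    for size in sizes:
        groups.append(tuple(ordered_tasks[start:start + size]))
        start += size
    return groups


# Hypothetical measurements, chosen only to reproduce the order in FIG. 2(b).
example = {
    "A": (0.6, 0.7), "B": (0.4, 0.5), "C": (0.1, 0.2), "D": (0.5, 0.6),
    "E": (0.9, 0.8), "F": (0.3, 0.4), "G": (0.7, 0.8),
}
order = rank_by_slowdown(example)
print(order)                                # ['E', 'G', 'A', 'D', 'B', 'F', 'C']
print(split_in_order(order, [2, 2, 2, 1]))  # [('E', 'G'), ('A', 'D'), ('B', 'F'), ('C',)]
```

The group sizes are passed in explicitly, here two, two, two, and one for seven tasks on four processors.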
The method assigns the task groups one by one to a processor of each processor group, giving priority to task groups containing tasks with larger slowdown factors. In the example of FIG. 2, the first task group to be assigned is the leftmost one in FIG. 2(c), (E, G), followed by (A, D), then (B, F), and finally (C).
Assigning the task groups to the processors in the processor groups takes roughly two steps. To make the most effective use of every cache in the multi-core system, the task groups are first assigned in turn to one processor in each processor group, until every processor group has one processor with a task group assigned. Taking FIG. 2 again as an example, as shown in FIG. 2(d), task group (E, G) is first assigned to a processor in the left processor group and task group (A, D) is then assigned to a processor in the right processor group, so that every processor group in the system has one processor with a task group assigned. Note that although this example assigns to the processor groups from left to right, the processor groups in fact have identical performance, so this direction is not a limitation of the invention. Other embodiments may of course proceed from right to left, and a multi-core system with more processor groups allows still more variations, provided that, as the task groups are assigned in sequence, every processor group ends up with one processor assigned a task group.
The processor groups therefore necessarily have an order in which they receive assignments (left to right in the example above), defined here as the allocation order. The task groups not yet assigned go first to the processor group at the end of the allocation order. Once every processor in that processor group has a task group, the remaining task groups are assigned in the same way to the processors of the other processor groups, moving backward along the allocation order, until every processor in the multi-core system has been assigned a task group. Because the number of task groups equals the total number of processors in the multi-core system, each task group has by then been assigned to one processor; that is, all tasks in every task group have been assigned and handed to a processor for execution.
Taking the example of FIG. 2 again, the allocation order is left to right as described above. After every processor group has one processor with a task group assigned, the remaining task groups (B, F) and (C) are assigned in the reverse of the allocation order. Task group (B, F) goes first to the other processors in the right processor group (in this example only one processor remains in that group), and task group (C) finally goes to the last remaining processor in the left processor group. All tasks are then assigned, and FIG. 2(d) shows the final result.
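The two-pass assignment walked through for FIG. 2 can be sketched as follows. The function name and the list-based representation of processor groups are assumptions made for illustration; the order of the two passes follows the description above.

```python
from collections import deque


def assign_task_groups(task_groups, processors_per_group):
    """Assign one task group to each processor, processor group by processor group.

    task_groups          -- task groups in descending slowdown-factor order
    processors_per_group -- number of processors in each processor group,
                            listed in allocation order
    Returns one list per processor group holding the task groups given to
    its processors.
    """
    if len(task_groups) != sum(processors_per_group):
        raise ValueError("need exactly one task group per processor")
    pending = deque(task_groups)
    assignment = [[] for _ in processors_per_group]
    # Pass 1: one task group to one processor of each processor group,
    # following the allocation order.
    for pg in range(len(processors_per_group)):
        assignment[pg].append(pending.popleft())
    # Pass 2: fill the remaining processors, starting from the last
    # processor group and moving backward along the allocation order.
    for pg in reversed(range(len(processors_per_group))):
        while len(assignment[pg]) < processors_per_group[pg]:
            assignment[pg].append(pending.popleft())
    return assignment


fig2_groups = [("E", "G"), ("A", "D"), ("B", "F"), ("C",)]
print(assign_task_groups(fig2_groups, [2, 2]))
# [[('E', 'G'), ('C',)], [('A', 'D'), ('B', 'F')]]
```

With two processor groups of two processors each, the left group ends up with (E, G) and (C) and the right group with (A, D) and (B, F), matching the result described for FIG. 2(d).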
FIG. 3 shows another example application of this preferred embodiment. In FIG. 3(a), fourteen tasks A to N await assignment. Suppose the anti-interference ability and interference ability of each task have been evaluated and summed to obtain each task's slowdown factor; sorting the tasks by slowdown factor, decreasing from left to right, gives the order E, I, G, A, B, F, C, J, D, H, N, L, M, K shown in FIG. 3(b). As shown in FIG. 3(d), the multi-core system in this example has two caches, each shared by four processors (the processors sharing one cache can be regarded as a processor group), so the system has eight processors in total. According to the slowdown-factor ordering, the tasks are therefore divided into the eight task groups (E, I), (G, A), (B, F), (C, J), (D, H), (N, L), (M), (K) shown in FIG. 3(c). The further left a task group is in the figure, the larger the slowdown factors of its tasks.
Following the assignment steps above, assignment starts from the leftmost task group in FIG. 3(c). As in the example of FIG. 2, a left-to-right allocation order is used, and task groups (E, I) and (G, A) are each assigned to one processor of a processor group; again, this allocation order is not a limitation of the invention. At this point every processor group in the multi-core system has one processor with a task group assigned. Assignment then proceeds in the reverse of the allocation order: task groups (B, F), (C, J), and (D, H) are assigned to the not-yet-assigned processors of the right processor group, and finally task groups (N, L), (M), and (K) are assigned to the remaining processors of the left processor group. All tasks have then been assigned to the processors in the processor groups and handed to those processors for execution.
It should be noted that in the examples of FIG. 2 and FIG. 3, the number of tasks and their ordering (as well as the grouping that results from that ordering) serve purely as illustrations to aid explanation and understanding, and carry no special significance.
In summary, the task assignment method of the present invention for multi-core systems reduces contention among multiple processors for cache resources and thereby helps improve system performance. In actual tests on a multi-core system having four processors and two caches, the present invention reduced total execution time by 43% and increased instructions per cycle (IPC) by 51% compared with the default scheduling method of the Linux operating system.
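As a rough reading of the reported execution-time figure, and assuming that a 43% saving means the new total execution time is 57% of the Linux baseline (the symbols $T_{\text{Linux}}$ and $T_{\text{new}}$ are introduced here only for illustration), the corresponding speedup would be:

```latex
\frac{T_{\text{Linux}}}{T_{\text{new}}} = \frac{1}{1 - 0.43} \approx 1.75
```

This is arithmetic on the stated percentage only. The text does not describe the workload or measurement procedure, so no further relationship to the reported IPC gain is implied.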
The foregoing describes only preferred feasible embodiments of the present invention. All equivalent method variations made by applying the specification and claims of the present invention shall fall within the patent scope of the present invention.
Claims (12)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW102141764A TWI489393B (en) | 2013-11-15 | 2013-11-15 | Applied Assignment Method for Multi - core System |
Publications (2)
Publication Number | Publication Date |
---|---|
TW201519102A (en) | 2015-05-16 |
TWI489393B (en) | 2015-06-21 |
Family
ID=53720970
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI489393B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM4A | Annulment or lapse of patent due to non-payment of fees |