WO2016044980A1 - Thread migration method, apparatus and system - Google Patents

Publication number
WO2016044980A1
Authority
WO
WIPO (PCT)
Prior art keywords
cluster
processor core
thread
target
memory access
Prior art date
Application number
PCT/CN2014/087101
Other languages
French (fr)
Chinese (zh)
Inventor
李景超
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2014/087101 (WO2016044980A1)
Priority to CN201480038263.XA (CN105637483B)
Publication of WO2016044980A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Advance Control (AREA)
  • Hardware Redundancy (AREA)

Abstract

Provided are a thread migration method, apparatus and system, in which a cluster controller determines a target processor core suitable for thread migration according to state information about the processor cores in a target cluster and sends context information about a thread to the target processor core, thereby implementing thread migration between clusters.

Description

Thread migration method, apparatus and system
Technical field
Embodiments of the present invention relate to communication technologies, and in particular to a thread migration method, apparatus and system.
Background
In a conventional on-chip multiple processor system (hereinafter: CMPs), when the data that a thread on a core requests to access is not in that core's cache, the data is moved over the network on chip into the core's cache so that the thread can access it. However, when the thread needs to access the data continuously or frequently, moving data between cores generates a large amount of traffic overhead.
In the prior art, the thread is instead migrated to the core in which the requested data is stored, and the data is accessed there. In a hardware implementation, the traffic overhead caused by thread migration is much smaller than that caused by moving data. However, with the development of network-on-chip technology, many-core architectures have become a trend for future big-data applications. A many-core architecture takes the cluster as its architectural unit, and each cluster contains multiple cores. When the data that a thread requests to access is not held by any core in the thread's own cluster, the prior art cannot migrate the thread between clusters.
Summary of the invention
Embodiments of the present invention provide a thread migration method, apparatus and system for implementing thread migration between clusters.
A first aspect of the embodiments of the present invention provides a thread migration method applied to a cluster system of a many-core architecture. The cluster system includes at least two clusters, the two clusters are connected through M cluster controllers, each cluster contains several processor cores, each cluster controller is configured to monitor state information of the processor cores in the cluster to which it is directly connected, and M is an integer greater than or equal to 1. The method includes:
a cluster controller receives context information of a thread and an identifier of a target cluster, both sent by a source processor core, where the context information of the thread includes a program counter value and register values;
if the cluster controller determines that the identifier of the target cluster is the identifier of a cluster to which the cluster controller is directly connected, the cluster controller determines a target processor core for the thread migration according to state information of the processor cores in the target cluster; and
the cluster controller sends the context information of the thread to the target processor core.
With reference to the first aspect, in a first possible implementation, the determining, by the cluster controller, of the target processor core for the thread migration according to the state information of the processor cores in the target cluster includes any one of the following:
the cluster controller determines the target processor core for the thread migration according to state information of the memory access instruction queue of each processor core in the target cluster;
the cluster controller determines the target processor core for the thread migration according to state information of the thread context storage unit of each processor core in the target cluster; or
the cluster controller determines the target processor core for the thread migration according to both the state information of the memory access instruction queue and the state information of the thread context storage unit of each processor core in the target cluster.
With reference to the first possible implementation of the first aspect, in a second possible implementation, the state information of the memory access instruction queue includes the total number of memory access instructions in the queue;
the determining, by the cluster controller, of the target processor core according to the state information of the memory access instruction queue of each processor core in the target cluster includes:
the cluster controller compares the number of memory access instructions in the memory access instruction queue of each processor core in the target cluster and selects the processor core whose queue holds the fewest memory access instructions as the target processor core for the thread migration.
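The selection rule of this implementation can be sketched in a few lines. This is only an illustration under stated assumptions, not the hardware described in the source: the function name `pick_least_loaded`, the dictionary that maps core identifiers to queue lengths, and the tie-breaking behaviour are all introduced here for the example.

```python
def pick_least_loaded(queue_lengths):
    """Return the id of the core whose memory access instruction queue is shortest.

    queue_lengths: dict mapping a (hypothetical) core id to the number of
    memory access instructions currently waiting in that core's queue.
    """
    if not queue_lengths:
        raise ValueError("target cluster has no processor cores")
    # Compare cores by queue length. On a tie the first core in iteration
    # order wins; the source does not specify tie-breaking, so this is an
    # assumption of the sketch.
    return min(queue_lengths, key=queue_lengths.get)


if __name__ == "__main__":
    print(pick_least_loaded({"core1": 5, "core2": 2, "core3": 7}))
```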
With reference to the first possible implementation of the first aspect, in a third possible implementation, the thread context storage unit contains several entries, each entry is used to store the context information of one thread, and the state information of the thread context storage unit includes the number of free entries in the thread context storage unit;
the determining, by the cluster controller, of the target processor core according to the state information of the thread context storage unit of each processor core in the target cluster includes:
the cluster controller compares the number of free entries in the thread context storage unit of each processor core in the target cluster and selects the processor core whose thread context storage unit has the most free entries as the target processor core for the thread migration.
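A comparable sketch for the free-entry criterion follows. The name `pick_most_free_entries` and the input dictionary are hypothetical, and the source does not say how ties are resolved, so the first core encountered wins here.

```python
def pick_most_free_entries(free_entries):
    """Return the id of the core whose thread context storage unit has the most free entries.

    free_entries: dict mapping a (hypothetical) core id to the number of
    unused entries in that core's thread context storage unit.
    """
    if not free_entries:
        raise ValueError("target cluster has no processor cores")
    # A core with more free entries can accept more migrated thread contexts.
    return max(free_entries, key=free_entries.get)


if __name__ == "__main__":
    print(pick_most_free_entries({"core1": 1, "core2": 4, "core3": 0}))
```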
With reference to the first possible implementation of the first aspect, in a fourth possible implementation, the determining, by the cluster controller, of the target processor core according to the state information of the memory access instruction queue and the state information of the thread context storage unit of each processor core in the target cluster includes:
the cluster controller assigns a first weight coefficient to the number of memory access instructions in the memory access instruction queue and a second weight coefficient to the number of free entries in the thread context storage unit;
for each processor core in the target cluster, the cluster controller computes the weighted sum of the number of memory access instructions in its memory access instruction queue and the number of free entries in its thread context storage unit; and
the cluster controller determines the target processor core for the thread migration according to the weighted sums.
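The weighted-sum variant can be illustrated as below. The source states only that two weight coefficients are assigned and that the target core is chosen from the weighted sums; it does not give the weights or say whether the largest or smallest sum wins. The sketch therefore assumes a negative first weight (a longer queue is worse), a positive second weight (more free entries is better), and selection of the highest score. The function name and default weights are hypothetical.

```python
def pick_by_weighted_sum(cores, w_queue=-1.0, w_free=1.0):
    """Return the id of the core with the highest weighted score.

    cores: dict mapping a (hypothetical) core id to a tuple
           (queue_length, free_entries).
    w_queue: first weight coefficient, applied to the queue length.
    w_free: second weight coefficient, applied to the free-entry count.
    """
    if not cores:
        raise ValueError("target cluster has no processor cores")

    def score(core_id):
        queue_length, free_entries = cores[core_id]
        return w_queue * queue_length + w_free * free_entries

    # Assumption of this sketch: the highest score identifies the target core.
    return max(cores, key=score)


if __name__ == "__main__":
    state = {"core1": (5, 3), "core2": (2, 1), "core3": (4, 6)}
    print(pick_by_weighted_sum(state))
```

Setting `w_free=0.0` reduces this to the queue-length criterion alone, and setting `w_queue=0.0` reduces it to the free-entry criterion alone.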
With reference to the first aspect or any one of the first to fourth possible implementations of the first aspect, in a fifth possible implementation, before the cluster controller determines the target processor core for the thread migration according to the state information of the processor cores in the target cluster, the method further includes:
for each processor core of the target cluster, the cluster controller uses two counters to record, respectively, the number of memory access instructions in the processor core's memory access instruction queue and the number of free entries in its thread context storage unit.
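The per-core pair of counters might be modelled as follows. The source says only that two counters record the two quantities; the class name `CoreCounters`, the event method names, and the rule for when each counter moves are assumptions made for this sketch.

```python
class CoreCounters:
    """Two counters kept by a cluster controller for one processor core."""

    def __init__(self, context_capacity):
        self.context_capacity = context_capacity
        self.queued_accesses = 0              # counter 1: queue length
        self.free_entries = context_capacity  # counter 2: free context entries

    def on_access_enqueued(self):
        self.queued_accesses += 1

    def on_access_completed(self):
        if self.queued_accesses == 0:
            raise RuntimeError("queue counter would go negative")
        self.queued_accesses -= 1

    def on_context_stored(self):
        if self.free_entries == 0:
            raise RuntimeError("no free thread context entry")
        self.free_entries -= 1

    def on_context_released(self):
        if self.free_entries == self.context_capacity:
            raise RuntimeError("all entries are already free")
        self.free_entries += 1


if __name__ == "__main__":
    counters = CoreCounters(context_capacity=4)
    counters.on_access_enqueued()
    counters.on_context_stored()
    print(counters.queued_accesses, counters.free_entries)
```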
A second aspect of the embodiments of the present invention provides a thread migration apparatus applied to a cluster system of a many-core architecture. The cluster system includes at least two clusters, the two clusters are connected through M cluster controllers, each cluster contains several processor cores, each cluster controller is configured to monitor state information of the processor cores in the cluster to which it is directly connected, and M is an integer greater than or equal to 1. The apparatus includes:
a receiving port, configured to receive context information of a thread and an identifier of a target cluster, both sent by a source processor core, where the context information of the thread includes a program counter value and register values;
a monitor, configured to monitor state information of the processor cores in the target cluster;
a processor, configured to determine, if the identifier of the target cluster is the identifier of a directly connected cluster, a target processor core for the thread migration according to the state information of the processor cores in the target cluster; and
a sending port, configured to send the context information of the thread to the target processor core.
With reference to the second aspect, in a first possible implementation, the processor is specifically configured to determine the target processor core for the thread migration according to state information of the memory access instruction queue of each processor core in the target cluster; or
the processor is specifically configured to determine the target processor core for the thread migration according to state information of the thread context storage unit of each processor core in the target cluster; or
the processor is specifically configured to determine the target processor core for the thread migration according to both the state information of the memory access instruction queue and the state information of the thread context storage unit of each processor core in the target cluster.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the state information of the memory access instruction queue includes the total number of memory access instructions in the queue;
the processor is specifically configured to compare the number of memory access instructions in the memory access instruction queue of each processor core in the target cluster and to select the processor core whose queue holds the fewest memory access instructions as the target processor core for the thread migration.
With reference to the first possible implementation of the second aspect, in a third possible implementation, the thread context storage unit contains several entries, each entry is used to store the context information of one thread, and the state information of the thread context storage unit includes the number of free entries in the thread context storage unit;
the processor is specifically configured to compare the number of free entries in the thread context storage unit of each processor core in the target cluster and to select the processor core whose thread context storage unit has the most free entries as the target processor core for the thread migration.
With reference to the first possible implementation of the second aspect, in a fourth possible implementation, the processor is specifically configured to: assign a first weight coefficient to the number of memory access instructions in the memory access instruction queue and a second weight coefficient to the number of free entries in the thread context storage unit; compute, for each processor core in the target cluster, the weighted sum of the number of memory access instructions in its memory access instruction queue and the number of free entries in its thread context storage unit; and determine the target processor core for the thread migration according to the weighted sums.
With reference to the second aspect or any one of the first to fourth possible implementations of the second aspect, in a fifth possible implementation, the monitor is specifically configured to use, for each processor core of the target cluster, two counters to record, respectively, the number of memory access instructions in the processor core's memory access instruction queue and the number of free entries in its thread context storage unit.
A third aspect of the embodiments of the present invention provides a thread migration system, including:
at least two clusters, where the two clusters are connected through M cluster controllers as described in the second aspect or any possible implementation of the second aspect, each cluster contains several processor cores, each cluster controller is configured to monitor state information of the processor cores in the cluster to which it is directly connected, and M is an integer greater than or equal to 1.
In the thread migration method, apparatus and system provided by the embodiments of the present invention, the cluster controller determines a target processor core suitable for thread migration according to the state information of the processor cores in the target cluster and sends the context information of the thread to that target processor core, thereby implementing thread migration between clusters.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a first cluster system of a many-core architecture according to the present invention;
FIG. 2 is a schematic diagram of the format of a memory access instruction queue according to the present invention;
FIG. 3 is a schematic diagram of the format of a thread context storage unit according to the present invention;
FIG. 4 is a schematic diagram of a second cluster system of a many-core architecture according to the present invention;
FIG. 5 is a schematic flowchart of Embodiment 1 of a thread migration method according to the present invention;
FIG. 6 is a schematic structural diagram of Embodiment 1 of a thread migration apparatus according to the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The thread migration method of the embodiments of the present invention is applied to a cluster system of a many-core architecture, as shown in FIG. 1, which is a schematic diagram of a first such cluster system. The cluster system includes at least two clusters (FIG. 1 shows two, cluster 1 and cluster 2), and each cluster contains several processor cores (FIG. 1 shows three, processor core 1, processor core 2 and processor core 3). The number of processor cores may be the same in every cluster or may differ between clusters. Each cluster is connected to a cluster controller: all clusters may be connected to the same cluster controller, each cluster may be connected to its own cluster controller, or several clusters may share one cluster controller. When there is more than one cluster controller, that is, when each cluster has its own controller or several clusters share one, all the cluster controllers are connected to one another according to a certain topology. In other words, two clusters are connected through M cluster controllers, where M is an integer greater than or equal to 1 (FIG. 1 shows M = 1), and each cluster controller is configured to monitor state information of the processor cores in the cluster to which it is directly connected. All processor cores in a cluster can access the address space of that cluster.
The state information of a processor core includes state information of its memory access instruction queue and/or state information of its thread context storage unit. The format of the memory access instruction queue is shown in FIG. 2. The state information of the memory access instruction queue is used to measure the load on the processor core and includes the total number of memory access instructions in the queue: the more memory access instructions the queue holds, the heavier the load on the processor core. The thread context storage unit may be a stack, a buffer, a register file, or the like. Taking a stack as an example, the format of the thread context storage unit is shown in FIG. 3. The thread context storage unit contains several entries, each of which stores the context information of one thread. Depending on application requirements, each entry may have one or more fields (FIG. 3 shows two, a program counter value and register values). The state information of the thread context storage unit is used to measure how many thread contexts the processor core can still store and includes the number of free entries in the storage unit: the more free entries there are, the more thread contexts the processor core can store.
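As a rough software model of the thread context storage unit of FIG. 3, the sketch below uses one entry per thread with two fields, a program counter value and register values. The class names, the fixed `capacity` parameter, and the `store` method are assumptions made for illustration; the source describes a hardware structure and does not define such an interface.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class ThreadContext:
    """One entry: the context needed to resume a thread."""
    program_counter: int
    registers: Tuple[int, ...]


@dataclass
class ThreadContextStore:
    """Fixed-capacity table of thread contexts for one processor core."""
    capacity: int
    entries: List[ThreadContext] = field(default_factory=list)

    @property
    def free_entries(self) -> int:
        # The state information reported to the cluster controller.
        return self.capacity - len(self.entries)

    def store(self, context: ThreadContext) -> bool:
        """Store a context if a free entry exists; report success."""
        if self.free_entries == 0:
            return False
        self.entries.append(context)
        return True


if __name__ == "__main__":
    store = ThreadContextStore(capacity=2)
    store.store(ThreadContext(program_counter=0x1000, registers=(1, 2, 3)))
    print(store.free_entries)
```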
When the data that a thread on a processor core requests to access is not in the address space of that core's cluster, the data can be accessed by migrating the thread to a core in the cluster where the data resides. Compared with accessing the remote data from the local core, this improves bandwidth utilization and reduces latency. The present invention therefore provides a thread migration method for implementing thread migration between clusters.
FIG. 4 is a schematic diagram of a second cluster system of a many-core architecture according to the present invention; building on FIG. 1, it shows the internal structure of each processor core in more detail. As shown in FIG. 4, when a thread in a processor core requests access to data, the thread sends an access request to the storage access logic module, and the access request contains the address of the requested data. For ease of description, the processor core on which the requesting thread runs is called the source processor core, and the cluster containing the source processor core is called the source cluster. The storage access logic module determines where the requested data is stored from the address in the request. Specifically, the storage access logic module may record the address space in which each cluster stores data and can therefore determine, from the requested address, which cluster's address space the address belongs to; the cluster to which the requested address belongs is called the target cluster. When the storage access logic module determines that the requested data is not in the address space of the source cluster, it sends a control signal to the thread migration logic module, and the control signal contains the identifier of the target cluster and the address of the requested data. The thread migration logic module decides whether to migrate the thread. Typically, based on the access history, it decides to migrate if the data at the requested address is large, or if the data at the requested address and in the adjacent address space is accessed continuously; otherwise it does not migrate. When the thread migration logic module decides to migrate, it triggers the thread migration unit to migrate the thread. The thread migration unit obtains the context information of the thread and the identifier of the target cluster. Thread context information is the information required to execute the thread and generally includes a program counter value and register values. The thread migration unit sends the context information of the thread and the identifier of the target cluster to the router of the processor core. The routers of the processor cores in a cluster are connected according to a certain topology, and the router of one of the processor cores is connected to the cluster controller. The router of the source processor core follows this topology to send the context information of the thread and the identifier of the target cluster to the cluster controller directly connected to the source cluster, which is called the source cluster controller. The source cluster controller then sends the context information of the thread and the identifier of the target cluster, according to the topology between the cluster controllers, to the cluster controller of the target cluster. Among the M cluster controllers, the one directly connected to the target cluster is called the target cluster controller.
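The address check performed by the storage access logic module can be sketched as a lookup over per-cluster address ranges. The half-open `(start, end)` range representation, the function names, and the example address map are assumptions; the source says only that the module may record each cluster's address space and determine which cluster an address belongs to.

```python
def find_target_cluster(address, address_map):
    """Return the id of the cluster whose address space contains `address`.

    address_map: dict mapping a cluster id to a half-open range (start, end).
    Returns None if no cluster owns the address.
    """
    for cluster_id, (start, end) in address_map.items():
        if start <= address < end:
            return cluster_id
    return None


def is_remote_access(address, source_cluster, address_map):
    """True if the address belongs to a cluster other than the source cluster,
    i.e. the case in which a control signal goes to the thread migration logic."""
    target = find_target_cluster(address, address_map)
    return target is not None and target != source_cluster


if __name__ == "__main__":
    # Hypothetical layout: cluster 1 owns the lower half, cluster 2 the upper half.
    address_map = {1: (0x0000, 0x8000), 2: (0x8000, 0x10000)}
    print(find_target_cluster(0x9000, address_map))
    print(is_remote_access(0x9000, 1, address_map))
```

Whether a remote access actually leads to migration is a separate decision made by the thread migration logic module from the access history, which this sketch does not model.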
It should be noted that, in the processor core structure shown in FIG. 4, the thread migration logic module may be omitted. In that case, once the storage access logic module has determined the target cluster for the address of the requested data, it directly triggers the thread migration unit to migrate the thread.
When a cluster contains only a few processor cores, the thread migration unit of each processor core may also be connected directly to the source cluster controller, so that nothing needs to be forwarded to the cluster controller through the processor core routers, which reduces latency.
When all clusters are connected to the same cluster controller, the source cluster controller and the target cluster controller are the same cluster controller, and the step in which the source cluster controller sends the context information of the thread to the target cluster controller according to the cluster controller topology is omitted.
How the target cluster controller selects a target processor core suitable for the thread migration, after it receives the context information of the thread and the identifier of the target cluster sent by the source processor core, is described in detail in the following embodiments.
FIG. 5 is a schematic flowchart of Embodiment 1 of the thread migration method of the present invention. The method of the embodiment shown in FIG. 5 is applied to the cluster system of the many-core architecture described above. As shown in FIG. 5, the method of this embodiment is as follows:
S501: The cluster controller receives the context information of a thread and the identifier of a target cluster, both sent by a source processor core.
The context information of the thread includes a program counter value and register values.
S502: If the cluster controller determines that the identifier of the target cluster is the identifier of a cluster directly connected to the cluster controller, the cluster controller determines the target processor core for the thread migration according to the state information of each processor core in the target cluster.
If the cluster controller determines that the identifier of the target cluster is the identifier of a cluster directly connected to it, this indicates that the cluster controller is the target cluster controller.
Depending on the scenario, the target cluster controller may receive the context information of the thread sent by the source processor core in different ways. In one way, when the source cluster controller and the target cluster controller are the same cluster controller and the thread migration unit of the source processor core is connected directly to the target cluster controller, the target cluster controller receives the context information of the thread directly from the thread migration unit of the source processor core. In another way, when the source cluster controller and the target cluster controller are the same cluster controller, but each processor core has its own router, the routers of the processor cores are connected according to a certain topology, and one of the processor cores is connected to the cluster controller, the target cluster controller receives the context information of the thread through the router of the processor core connected to it. In yet another way, when the source cluster controller and the target cluster controller are different cluster controllers, the source cluster controller, after receiving the context information of the thread sent by the source processor core, sends it to the target cluster controller according to the topology among the cluster controllers.
According to the state information of each processor core in the target cluster, the cluster controller selects the processor core best suited to run the thread as the target processor core.
Before S502 is performed, the method may further include a step in which the cluster controller obtains the state information of each processor core in the target cluster.
Specifically, for each processor core of the target cluster, the cluster controller uses two counters to record, respectively, the number of all memory access instructions in the memory access instruction queue of the processor core and the number of free entries in its thread-context storage unit.
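The two-counter bookkeeping described above can be pictured with the following minimal sketch. It is only an illustration under stated assumptions: the class name `CoreCounters`, its method names, and the idea that the counters are updated on enqueue, retire, store, and release events are all hypothetical; the source only states that two counters per processor core record the two quantities.

```python
class CoreCounters:
    """Hypothetical per-core record kept by the cluster controller."""

    def __init__(self, core_id, context_capacity):
        self.core_id = core_id
        self.context_capacity = context_capacity
        # Counter 1: memory access instructions currently in the core's queue.
        self.queued_mem_instrs = 0
        # Counter 2: free entries in the core's thread-context storage unit.
        self.free_context_entries = context_capacity

    def on_mem_instr_enqueued(self):
        self.queued_mem_instrs += 1

    def on_mem_instr_retired(self):
        if self.queued_mem_instrs == 0:
            raise ValueError("memory access instruction queue is already empty")
        self.queued_mem_instrs -= 1

    def on_context_stored(self):
        if self.free_context_entries == 0:
            raise ValueError("thread-context storage unit is full")
        self.free_context_entries -= 1

    def on_context_released(self):
        if self.free_context_entries == self.context_capacity:
            raise ValueError("thread-context storage unit is already empty")
        self.free_context_entries += 1
```

A controller would hold one such record per processor core of each directly connected cluster and read both counters when selecting a target processor core.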
S503: The cluster controller sends the context information of the thread to the target processor core.
Specifically, the cluster controller sends the context information of the thread to the thread migration unit of the target processor core, and that thread migration unit stores the context information in the thread-context storage unit, so that the thread accesses the requested data on the target processor core.
In this embodiment, the cluster controller determines a target processor core suitable for the thread migration according to the state information of each processor core in the target cluster and sends the context information of the thread to that target processor core, thereby implementing thread migration between clusters.
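Steps S501 to S503 can be sketched as a single handler. This is a software illustration of what the source describes as controller behaviour, not the patented hardware; `ThreadContext`, `ClusterController`, the `delivered` and `forwarded` lists, and the use of queue length as the selection criterion are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadContext:
    program_counter: int
    registers: list


@dataclass
class ClusterController:
    # Hypothetical state: cluster id -> {core id -> queued memory access instructions}
    local_clusters: dict
    delivered: list = field(default_factory=list)  # (core id, context) pairs sent in S503
    forwarded: list = field(default_factory=list)  # requests passed on to another controller

    def select_target_core(self, cluster_id):
        # Assumed criterion: the core with the fewest queued memory access instructions.
        cores = self.local_clusters[cluster_id]
        return min(cores, key=cores.get)

    def handle_migration(self, context, target_cluster_id):
        # S501: the context and the target cluster identifier have been received.
        if target_cluster_id not in self.local_clusters:
            # Not a directly connected cluster: hand the request to another controller.
            self.forwarded.append((target_cluster_id, context))
            return None
        # S502: this controller is the target cluster controller; pick a core.
        core_id = self.select_target_core(target_cluster_id)
        # S503: send the context to the selected target processor core.
        self.delivered.append((core_id, context))
        return core_id
```

For example, with `ClusterController(local_clusters={7: {0: 5, 1: 2, 2: 9}})`, a request for cluster 7 is delivered to core 1, while a request for any other cluster is recorded in `forwarded`.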
In the foregoing embodiment, the cluster controller determines the target processor core for the thread migration according to the state information of each processor core in the target cluster in ways that include, but are not limited to, the following:
In the first way, the cluster controller determines the target processor core for the thread migration according to the state information of the memory access instruction queue of each processor core in the target cluster.
Specifically, the state information of a memory access instruction queue includes the number of all memory access instructions in the queue. The more memory access instructions the queue holds, the greater the load on the processor core. The cluster controller therefore compares the number of memory access instructions already present in the queue of each processor core in the target cluster and determines the processor core with the fewest as the target processor core for the thread migration. In other words, the least loaded processor core is selected as the migration target, so that the thread can be processed as early as possible after it is migrated there.
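The first way amounts to taking a minimum over per-core queue lengths. A minimal sketch follows; the function name and the dictionary input are hypothetical, and the tie-break toward the lowest core identifier is an assumption, since the source does not say how ties are resolved in this way.

```python
def select_least_loaded_core(queue_lengths):
    """queue_lengths: dict mapping core id -> number of queued memory access instructions."""
    if not queue_lengths:
        raise ValueError("target cluster has no processor cores")
    # Compare by queue length first; break ties by core id (assumed policy).
    return min(queue_lengths, key=lambda core_id: (queue_lengths[core_id], core_id))
```

Calling `select_least_loaded_core({0: 4, 1: 1, 2: 3})` picks core 1, the core with the shortest queue.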
In the second way, the cluster controller determines the target processor core for the thread migration according to the state information of the thread-context storage unit of each processor core in the target cluster.
The structure of the thread-context storage unit is shown in FIG. 3. It includes several entries, each used to store the context information of one thread, and its state information includes the number of free entries it contains. The cluster controller compares the number of free entries in the thread-context storage unit of each processor core in the target cluster and determines the processor core with the most free entries as the target processor core for the thread migration.
It should be noted that, if the thread-context storage units of all processor cores in the target cluster are full, one processor core is selected at random as the target processor core, and its thread migration unit switches out thread context information already held in that core along the original path. If at least two processor cores in the target cluster share the largest number of free entries, one of them is selected as the target processor core in a round-robin manner.
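The second way, including its two special cases, can be sketched as follows. The class `FreeEntrySelector`, the returned `must_evict` flag, and the particular round-robin bookkeeping (a single turn counter shared across calls) are assumptions for illustration; the source only states that ties are resolved by round robin and that a random core is chosen when every storage unit is full.

```python
import random


class FreeEntrySelector:
    """Hypothetical selector based on free thread-context entries."""

    def __init__(self, rng=None):
        self._rng = rng if rng is not None else random.Random()
        self._next_turn = 0  # round-robin position used only when cores are tied

    def select(self, free_entries):
        """free_entries: dict core id -> free entries. Returns (core id, must_evict)."""
        if not free_entries:
            raise ValueError("target cluster has no processor cores")
        most_free = max(free_entries.values())
        if most_free == 0:
            # Every storage unit is full: pick a core at random; an existing
            # context on that core has to be switched out to make room.
            return self._rng.choice(sorted(free_entries)), True
        tied = sorted(c for c, n in free_entries.items() if n == most_free)
        core_id = tied[self._next_turn % len(tied)]
        if len(tied) > 1:
            self._next_turn += 1  # advance so the next tie goes to another core
        return core_id, False
```

With a unique maximum the selector simply returns that core; repeated calls with the same tie alternate among the tied cores.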
In the third way, the cluster controller determines the target processor core for the thread migration according to both the state information of the memory access instruction queue and the state information of the thread-context storage unit of each processor core in the target cluster.
The cluster controller assigns a first weight coefficient to the number of all memory access instructions in the memory access instruction queue and a second weight coefficient to the number of free entries in the thread-context storage unit. For each processor core in the target cluster, it computes the weighted sum of these two numbers and determines the target processor core for the thread migration from the results. Typically, the first weight coefficient is negative and the second is positive, and the processor core with the largest weighted sum is determined as the target processor core.
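The weighted-sum rule can be written as score = w1 × (queued memory access instructions) + w2 × (free context entries), with the highest score winning. The sketch below is illustrative only: the function name, the default weights of -1.0 and 1.0, and the tie-break toward the lowest core identifier are assumptions; the source specifies only that the first weight is typically negative and the second positive.

```python
def select_by_weighted_sum(cores, w_queue=-1.0, w_free=1.0):
    """cores: dict core id -> (queued memory access instructions, free context entries)."""
    if not cores:
        raise ValueError("target cluster has no processor cores")

    def score(core_id):
        queued, free = cores[core_id]
        return w_queue * queued + w_free * free

    # max() keeps the first maximal element, so iterating in sorted order
    # breaks ties toward the lowest core id (assumed policy).
    return max(sorted(cores), key=score)
```

For `cores = {0: (6, 3), 1: (2, 1), 2: (4, 4)}` the default weights give scores of -3, -1, and 0, so core 2 is chosen; making the queue weight -2.0 shifts the choice to core 1, which shows how the weights trade queue load against free context storage.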
The present invention does not limit how, specifically, the cluster controller determines the target processor core for the thread migration according to the state information of the memory access instruction queue and/or the state information of the thread-context storage unit of each processor core in the target cluster. Any determination of the target processor core that is based on the state information of the memory access instruction queue and/or the state information of the thread-context storage unit of each processor core in the target cluster falls within the protection scope of the present invention.
FIG. 6 is a schematic structural diagram of Embodiment 1 of the thread migration apparatus of the present invention. The apparatus of this embodiment is a cluster controller and is applied to a cluster system of a many-core architecture. The cluster system includes at least two clusters, the two clusters are connected through M cluster controllers, each cluster includes several processor cores, each cluster controller is configured to monitor the state information of the processor cores in the clusters directly connected to it, and M is an integer greater than or equal to 1. The apparatus includes a receiving port 601, a monitor 602, a processor 603, and a sending port 604. The receiving port 601 is configured to receive the context information of a thread and the identifier of a target cluster sent by a source processor core, where the context information of the thread includes a program counter value and register values. The monitor 602 is configured to monitor the state information of each processor core in the target cluster. The processor 603 is configured to, if it determines that the identifier of the target cluster is the identifier of a directly connected cluster, determine the target processor core for the thread migration according to the state information of each processor core in the target cluster. The sending port 604 is configured to send the context information of the thread to the target processor core.
In the foregoing embodiment, the processor 603 is specifically configured to determine the target processor core for the thread migration according to the state information of the memory access instruction queue of each processor core in the target cluster; or,
the processor 603 is specifically configured to determine the target processor core for the thread migration according to the state information of the thread-context storage unit of each processor core in the target cluster; or,
the processor 603 is specifically configured to determine the target processor core for the thread migration according to the state information of the memory access instruction queue and the state information of the thread-context storage unit of each processor core in the target cluster.
In the foregoing embodiment, the state information of the memory access instruction queue includes the number of all memory access instructions in the memory access instruction queue;
the processor 603 is specifically configured to determine, according to the number of all memory access instructions in the memory access instruction queue of each processor core in the target cluster, the processor core whose queue holds the fewest memory access instructions as the target processor core for the thread migration.
In the foregoing embodiment, the thread-context storage unit includes several entries, each used to store the context information of one thread, and the state information of the thread-context storage unit includes the number of free entries in the thread-context storage unit;
the processor 603 is specifically configured to determine, according to the number of free entries in the thread-context storage unit of each processor core in the target cluster, the processor core whose thread-context storage unit has the most free entries as the target processor core for the thread migration.
In the foregoing embodiment, the processor 603 is specifically configured to: assign a first weight coefficient to the number of all memory access instructions in the memory access instruction queue and a second weight coefficient to the number of free entries in the thread-context storage unit; compute, for each processor core in the target cluster, the weighted sum of the number of all memory access instructions in its memory access instruction queue and the number of free entries in its thread-context storage unit; and determine the target processor core for the thread migration according to the weighted sums.
In the foregoing embodiment, the monitor 602 is specifically configured to, for each processor core of the target cluster, use two counters to record, respectively, the number of all memory access instructions in the memory access instruction queue of the processor core and the number of free entries in its thread-context storage unit.
In the apparatus of the foregoing embodiment, the monitor monitors the state information of each processor core in the target cluster; the processor, if it determines that the identifier of the target cluster is the identifier of a directly connected cluster, determines the target processor core for the thread migration according to the state information of each processor core in the target cluster; and the sending port sends the context information of the thread to the target processor core, thereby implementing thread migration between clusters.
The present invention further provides an embodiment of a thread migration system. The system of this embodiment includes at least two clusters, the two clusters are connected through M cluster controllers as shown in FIG. 6, each cluster includes several processor cores, each cluster controller is configured to monitor the state information of the processor cores in the clusters directly connected to it, and M is an integer greater than or equal to 1. For drawings of the thread migration system embodiment, refer to FIG. 1 or FIG. 2.
The system of this embodiment can be used to carry out the technical solution of the method embodiment shown in FIG. 5. Its implementation principle and technical effects are similar and are not described again here.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (13)

  1. A thread migration method, applied to a cluster system of a many-core architecture, wherein the cluster system of the many-core architecture includes at least two clusters, the two clusters are connected through M cluster controllers, each cluster includes several processor cores, each cluster controller is configured to monitor state information of the processor cores in the clusters directly connected to it, and M is an integer greater than or equal to 1, the method comprising:
    receiving, by a cluster controller, context information of a thread and an identifier of a target cluster that are sent by a source processor core, wherein the cluster controller is any one of the M cluster controllers, and the context information of the thread includes a program counter value and register values;
    if the cluster controller determines that the identifier of the target cluster is the identifier of a cluster directly connected to the cluster controller, determining, by the cluster controller, a target processor core for the thread migration according to state information of each processor core in the target cluster; and
    sending, by the cluster controller, the context information of the thread to the target processor core.
  2. The method according to claim 1, wherein
    the determining, by the cluster controller, of the target processor core for the thread migration according to the state information of each processor core in the target cluster comprises any one of the following:
    determining, by the cluster controller, the target processor core for the thread migration according to state information of a memory access instruction queue of each processor core in the target cluster;
    determining, by the cluster controller, the target processor core for the thread migration according to state information of a thread-context storage unit of each processor core in the target cluster; or
    determining, by the cluster controller, the target processor core for the thread migration according to the state information of the memory access instruction queue and the state information of the thread-context storage unit of each processor core in the target cluster.
  3. The method according to claim 2, wherein the state information of the memory access instruction queue includes the number of all memory access instructions in the memory access instruction queue; and
    the determining, by the cluster controller, of the target processor core for the thread migration according to the state information of the memory access instruction queue of each processor core in the target cluster comprises:
    determining, by the cluster controller according to the number of all memory access instructions in the memory access instruction queue of each processor core in the target cluster, the processor core whose memory access instruction queue holds the fewest memory access instructions as the target processor core for the thread migration.
  4. The method according to claim 2, wherein the thread-context storage unit includes several entries, each entry is used to store the context information of one thread, and the state information of the thread-context storage unit includes the number of free entries in the thread-context storage unit; and
    the determining, by the cluster controller, of the target processor core for the thread migration according to the state information of the thread-context storage unit of each processor core in the target cluster comprises:
    determining, by the cluster controller according to the number of free entries in the thread-context storage unit of each processor core in the target cluster, the processor core whose thread-context storage unit has the most free entries as the target processor core for the thread migration.
  5. The method according to claim 2, wherein the determining, by the cluster controller, of the target processor core for the thread migration according to the state information of the memory access instruction queue and the state information of the thread-context storage unit of each processor core in the target cluster comprises:
    assigning, by the cluster controller, a first weight coefficient to the number of all memory access instructions in the memory access instruction queue, and assigning, by the cluster controller, a second weight coefficient to the number of free entries in the thread-context storage unit;
    computing, by the cluster controller for each processor core in the target cluster, a weighted sum of the number of all memory access instructions in its memory access instruction queue and the number of free entries in its thread-context storage unit; and
    determining, by the cluster controller, the target processor core for the thread migration according to the weighted sums.
  6. The method according to any one of claims 1 to 5, wherein, before the cluster controller determines the target processor core for the thread migration according to the state information of each processor core in the target cluster, the method further comprises:
    recording, by the cluster controller for each processor core of the target cluster and by means of two counters respectively, the number of all memory access instructions in the memory access instruction queue of the processor core and the number of free entries in its thread-context storage unit.
  7. A thread migration apparatus, applied to a cluster system of a many-core architecture, wherein the cluster system of the many-core architecture includes at least two clusters, the two clusters are connected through M cluster controllers, each cluster includes several processor cores, each cluster controller is configured to monitor state information of the processor cores in the clusters directly connected to it, and M is an integer greater than or equal to 1, the apparatus comprising:
    a receiving port, configured to receive context information of a thread and an identifier of a target cluster that are sent by a source processor core, wherein the context information of the thread includes a program counter value and register values;
    a monitor, configured to monitor state information of each processor core in the target cluster;
    a processor, configured to, if it determines that the identifier of the target cluster is the identifier of a directly connected cluster, determine a target processor core for the thread migration according to the state information of each processor core in the target cluster; and
    a sending port, configured to send the context information of the thread to the target processor core.
  8. The apparatus according to claim 7, wherein the processor is specifically configured to determine the target processor core for the thread migration according to state information of a memory access instruction queue of each processor core in the target cluster; or
    the processor is specifically configured to determine the target processor core for the thread migration according to state information of a thread-context storage unit of each processor core in the target cluster; or
    the processor is specifically configured to determine the target processor core for the thread migration according to the state information of the memory access instruction queue and the state information of the thread-context storage unit of each processor core in the target cluster.
  9. The apparatus according to claim 8, wherein the state information of the memory access instruction queue includes the number of all memory access instructions in the memory access instruction queue; and
    the processor is specifically configured to determine, according to the number of all memory access instructions in the memory access instruction queue of each processor core in the target cluster, the processor core whose memory access instruction queue holds the fewest memory access instructions as the target processor core for the thread migration.
  10. The apparatus according to claim 8, wherein the thread-context storage unit includes several entries, each entry is used to store the context information of one thread, and the state information of the thread-context storage unit includes the number of free entries in the thread-context storage unit; and
    the processor is specifically configured to determine, according to the number of free entries in the thread-context storage unit of each processor core in the target cluster, the processor core whose thread-context storage unit has the most free entries as the target processor core for the thread migration.
  11. The apparatus according to claim 8, wherein the processor is specifically configured to: assign a first weight coefficient to the number of all memory access instructions in the memory access instruction queue and a second weight coefficient to the number of free entries in the thread-context storage unit; compute, for each processor core in the target cluster, a weighted sum of the number of all memory access instructions in its memory access instruction queue and the number of free entries in its thread-context storage unit; and determine the target processor core for the thread migration according to the weighted sums.
  12. The apparatus according to any one of claims 7 to 11, wherein the monitor is specifically configured to, for each processor core of the target cluster, use two counters to record, respectively, the number of all memory access instructions in the memory access instruction queue of the processor core and the number of free entries in its thread-context storage unit.
  13. A thread migration system, comprising:
    at least two clusters, wherein the two clusters are connected through M cluster controllers according to any one of claims 7 to 12, each cluster includes several processor cores, each cluster controller is configured to monitor state information of the processor cores in the clusters directly connected to it, and M is an integer greater than or equal to 1.
PCT/CN2014/087101 2014-09-22 2014-09-22 Thread migration method, apparatus and system WO2016044980A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2014/087101 WO2016044980A1 (en) 2014-09-22 2014-09-22 Thread migration method, apparatus and system
CN201480038263.XA CN105637483B (en) 2014-09-22 2014-09-22 Thread migration method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/087101 WO2016044980A1 (en) 2014-09-22 2014-09-22 Thread migration method, apparatus and system

Publications (1)

Publication Number Publication Date
WO2016044980A1 (en) 2016-03-31

Family

ID=55580029

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/087101 WO2016044980A1 (en) 2014-09-22 2014-09-22 Thread migration method, apparatus and system

Country Status (2)

Country Link
CN (1) CN105637483B (en)
WO (1) WO2016044980A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113268466A (en) * 2021-06-07 2021-08-17 上海数禾信息科技有限公司 Method and system for smoothly migrating message cluster
CN114020139A (en) * 2021-11-05 2022-02-08 珠海全志科技股份有限公司 CPU power consumption management method, computer device and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458634A (en) * 2008-01-22 2009-06-17 中兴通讯股份有限公司 Load equilibration scheduling method and device
US20110191776A1 (en) * 2010-02-02 2011-08-04 International Business Machines Corporation Low overhead dynamic thermal management in many-core cluster architecture
CN102193779A (en) * 2011-05-16 2011-09-21 武汉科技大学 MPSoC (multi-processor system-on-chip)-oriented multithread scheduling method
US20140129808A1 (en) * 2012-04-27 2014-05-08 Alon Naveh Migrating tasks between asymmetric computing elements of a multi-core processor

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060123423A1 (en) * 2004-12-07 2006-06-08 International Business Machines Corporation Borrowing threads as a form of load balancing in a multiprocessor data processing system
US20080005591A1 (en) * 2006-06-28 2008-01-03 Trautman Mark A Method, system, and apparatus for dynamic thermal management
CN101751295B (en) * 2009-12-22 2012-08-29 浙江大学 Method for realizing inter-core thread migration under multi-core architecture
CN102081551A (en) * 2011-01-28 2011-06-01 中国人民解放军国防科学技术大学 Micro-architecture sensitive thread scheduling (MSTS) method
US9075610B2 (en) * 2011-12-15 2015-07-07 Intel Corporation Method, apparatus, and system for energy efficiency and energy conservation including thread consolidation

Also Published As

Publication number Publication date
CN105637483A (en) 2016-06-01
CN105637483B (en) 2019-08-20

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 14902736; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 14902736; Country of ref document: EP; Kind code of ref document: A1)