CN101957744A - Hardware multithreading control method for microprocessor and device thereof - Google Patents
Hardware multithreading control method for microprocessor and device thereof
- Publication number: CN101957744A (CN2010105127373A, CN201010512737A)
- Authority: CN (China)
- Prior art keywords: thread, hardware, register, instruction, execution
- Prior art date
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
A hardware multithreading control method for a microprocessor and a device thereof, relating to the field of microprocessor architecture. The control method comprises the following steps: multithreaded instruction fetch, multithreaded decode, multithreaded execute, multithreaded memory access, and multithreaded write-back. The simulation device comprises a hardware multithreaded instruction fetch device, a hardware multithreaded decode device, a hardware multithreaded execution device, a hardware multithreaded memory access device, a hardware multithreaded write-back device, a hardware multithreaded register set, and a multithreading control device. In addition, the invention allows a software multithreaded program to be executed by the processor's hardware threads, which effectively hides memory access latency during execution, omits the saving and restoring of thread-related information at thread switches, and reduces thread-switching overhead. Pipelining allows n threads to execute in parallel in the time previously needed to execute one thread. Hardware multithreading effectively avoids the data-dependency hazards introduced by deep pipelining, reduces the design complexity of the system, and improves the execution efficiency of the system at the hardware level.
Description
Technical field:
The invention relates to the field of microprocessor architecture, and in particular to a hardware multithreading control method and a device thereof.
Background:
To further improve microprocessor performance, a variety of novel architectures have been proposed, such as multi-core, multithreading, stream processing, PIM, reconfigurable, and polymorphic architectures. These architectures address problems in microprocessor development from different angles. Because of the characteristics of software programs and the physical limits of hardware process technology, future architectures can no longer obtain large gains in system performance merely by raising the clock frequency. The trend in architecture technology is clear: multithreading and multi-core have become two key technical directions, and processors with multithreading or multi-core features continue to appear in every field. Pipelining is an important feature that distinguishes RISC processors from CISC processors. With deep pipelining, instruction dependencies and branches greatly reduce pipeline performance. Building on the RISC architecture, the present invention adopts deep pipelining to implement execution of multiple hardware threads, effectively avoiding the performance loss caused by instruction dependencies and improving microprocessor performance through multithreading.
Patent published on July 16, 2008: CN101221493A, title: "Multithreaded Execution in Parallel Processors", inventors: D. Bernstein et al. This invention discloses a parallel hardware multithreaded processor. The processor contains a general-purpose processor that coordinates system functions and multiple microengines that support multiple hardware threads. The processor also contains a memory control system with a first memory controller and a second memory controller. The first memory controller sorts memory accesses according to whether they target an even bank or an odd bank; the second memory controller optimizes memory accesses according to whether they are reads or writes.
Patent published on August 22, 2007: CN101021801, title: "Message-Queue-Based Mass Data Transmission Method Between Pipelined Processes", inventors: Xue Qingtong et al. This patent discloses a method for message-queue-based mass data transmission between pipelined processes. In the business processing flow of a billing product, a call record passes in order through several processes, at least formatting, normalization (also called sorting), deduplication, rating, and database loading. The billing method combines single-step and whole-batch commits. Through configuration, it uses different message queue types in different environments to assign tasks automatically, manages load balancing, distributes call records to different message queues, and is deployed in a customized way according to the business logic. In a billing system built with this method, the mass call-record data is passed between processes entirely through message queues, so processing can take place in memory without system I/O overhead and is much faster. A system using this message-queue-based pipelined parallel processing scheme is markedly more efficient. Its processing speed ranks among the best of domestic and foreign billing vendors.
Patent published on January 25, 2006: CN1725176, title: "Method and Apparatus for a Multithreaded Pipelined Instruction Decoder", inventors: J. P. Douglas et al. This patent discloses a method for a multithreaded pipelined instruction decoder. The decoder clocks, flushes, and stalls instructions in the decode pipeline of a multithreaded machine so as to obtain the best performance at the lowest power consumption. An image pipeline mirrors the instruction decode pipeline, holding the thread identifier and the valid-instruction bit for each pipeline stage of the instruction decoder. The thread identifiers and valid bits are used to control the clocking, flushing, and stalling of each pipeline stage in the decoder. The instructions of one thread can be flushed without affecting the instructions of other threads in the decode pipeline, and in some cases the instructions of one thread can be stalled without affecting the instructions of other threads in the decode pipeline. In that invention, pipeline stages are clocked only when a valid instruction needs to advance, in order to save power and minimize delay.
Patent published on September 15, 1999: CN1228557, title: "Multithreaded Instruction-Level Parallelism for Computer Processors", inventors: Liu Yin et al. This patent discloses a multithreaded instruction-level parallelism technique applicable to computer processors. A processor using this technique fetches instructions in turn from the threads that are in the executing state, so that the instructions executing in parallel in the processor come from different threads and therefore have no inter-instruction dependency problem between them.
Patent published on June 6, 2007: CN1975663, title: "Apparatus Having Asymmetric Hardware Multithreading Support for Different Threads", inventor: David A. Kra (戴维·A·克拉). This patent provides asymmetric hardware support for a special class of threads. Preferably, the special class consists of high-priority, I/O-bound threads. In a first aspect, a multithreaded processor contains N sets of registers that support the concurrent execution of N threads. At least one register set is dedicated to the special class of threads and cannot be used by other threads, even when it is idle. In a second aspect, threads of the special class may fill only a limited portion of the cache, in order to reduce the cache flushing that might otherwise occur.
Patent published on December 14, 2005: CN1707694, title: "Memory Controller for a Multithreaded Pipelined Bus System", inventors: Xu Yunfan (徐允范) et al. This patent discloses a memory controller for a multithreaded pipelined bus system. In the memory control method of the multithreaded pipelined system, the addresses of multiple banks to be accessed in a memory unit are received sequentially from the host. For each of these banks, the controller determines whether the address corresponding to that bank has been input from the host at the time a read/write command is output to the memory unit. When the determination shows that the address corresponding to the bank has been input, a read/write command containing either open-page information or auto-precharge information is output to the memory unit.
Patent published on October 20, 1999: CN1232219, title: "Pipelined Multiprocessor System", inventor: Yoshio Koike (小池庸夫). This patent discloses a pipelined multiprocessor system comprising a group of processor units, a group of buffers, and a debug unit. The processor units process data in a pipeline. The buffers hold the input data and the processing result of each processor unit. The buffers and processor units are cascaded in sequence between the data input and the data output. The debug unit selectively outputs the processing result of each processor unit externally so that it can be monitored during debugging.
Summary of the invention:
The object of the present invention is to design, for the execution of software multithreaded programs, a hardware multithreading control method for a microprocessor and a corresponding hardware multithreading control device.
A hardware multithreading control method for a microprocessor, characterized in that the method comprises the following steps:
1) A multithreaded instruction fetch step, which reads the instructions of each thread and generates the instruction address of each thread. It comprises multithreaded instruction address control, multithreaded instruction address buffering, and multithreaded instruction fetch:
a) Multithreaded instruction address control generates the instruction address of each thread. When a thread is blocked, only that thread's instruction address is held; the instruction addresses of the other threads are updated normally;
b) Multithreaded instruction address buffering stores the instruction addresses of the n threads;
c) Multithreaded instruction fetch divides the fetch logic symmetrically into n pipeline stages and fetches the software-thread instructions corresponding to the n hardware threads. When a thread is blocked, only that thread's fetch is blocked; the other threads fetch normally.
2) A multithreaded decode step, which decodes the instructions of each hardware thread and prepares the register data needed by the multithreaded execute step. It comprises multithreaded decode, multithreaded register operand preparation, and decode-unit data bypass control:
a) Multithreaded decode divides the decode logic symmetrically into n pipeline stages and decodes both the dedicated multithread-initialization instruction and regular instructions. When a thread is blocked, only that thread's instruction decode is blocked; the other threads decode normally. The dedicated multithread-initialization instruction comprises an opcode field that identifies the instruction, an operand field for the destination operand, and an operand field for the source operand;
b) Multithreaded register operand preparation generates the addresses of the registers to be read and reads the operands the instruction needs from the n register banks;
c) Decode-unit data bypass control supplies bypassed data to a given pipeline stage of instruction decode;
3) A multithreaded execute step, which executes the instructions of each thread. It comprises execution of the dedicated multithread-initialization instruction, thread number buffering, execute-unit data bypass control, and multithreaded regular instruction execution:
a) Execution of the dedicated multithread-initialization instruction generates a new hardware thread number, which corresponds to the software thread that the hardware thread executes;
b) Thread number buffering stores the new hardware thread number generated by the dedicated multithread-initialization instruction, so that the position of that number in the thread number register sequence matches the position of the hardware thread in the instruction address register sequence, the fetch-unit multithreaded register sequence, the decode-unit multithreaded register sequence, the execute-unit multithreaded register sequence, the memory-access-unit multithreaded register sequence, and the write-back-unit multithreaded register sequence;
c) Execute-unit data bypass control supplies bypassed data to a given pipeline stage of instruction execution;
d) Multithreaded regular instruction execution divides the execute logic symmetrically into n pipeline stages and executes the regular instructions of the n hardware threads. When a thread is blocked, only that thread's instruction execution is blocked; the other threads execute normally.
4) A multithreaded memory access step, which divides the memory access logic symmetrically into n pipeline stages and writes each thread's execution results to memory or reads the data a thread needs from memory. When a thread is blocked, only that thread's memory access is blocked; the other threads access memory normally.
5) A multithreaded write-back step, which divides the write-back logic symmetrically into n pipeline stages and writes each thread's execution results back to the corresponding register bank. When a thread is blocked, only that thread's write-back is blocked; the other threads write back normally. It comprises multithreaded write-back data control and multithreaded write-back register address control:
a) Multithreaded write-back data control prepares and outputs the data to be written back to registers;
b) Multithreaded write-back register address control prepares and outputs the addresses of the registers to be written back.
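The per-thread blocking rule that recurs in steps 1) through 5) can be sketched as follows. This is a minimal illustration, not the patented circuit: it assumes strict round-robin service of the n hardware threads (one thread per clock cycle) and 4-byte sequential instructions, and the names `step_program_counters`, `blocked`, and `instr_bytes` are hypothetical.

```python
def step_program_counters(pcs, blocked, cycles, instr_bytes=4):
    """Advance per-thread instruction addresses for a number of cycles.

    Assumptions (not from the patent text): threads are served in strict
    round-robin order, one per cycle, and every instruction is sequential.
    A blocked thread keeps its address; all other threads advance normally.
    """
    pcs = list(pcs)
    n = len(pcs)
    for cycle in range(cycles):
        tid = cycle % n              # thread served in this cycle
        if not blocked[tid]:         # only the blocked thread is held
            pcs[tid] += instr_bytes
    return pcs


# Four hardware threads; thread 1 is blocked for the whole run.
pcs = step_program_counters(
    [0x000, 0x100, 0x200, 0x300],
    blocked=[False, True, False, False],
    cycles=8,
)
```

After eight cycles each unblocked thread has advanced by two instructions, while the blocked thread still holds its original address.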
A hardware multithreading control device for a microprocessor, which uses pipelining to support the parallel execution of n hardware threads. It is characterized by comprising the following components: a hardware multithreaded instruction fetch device, a hardware multithreaded decode device, a hardware multithreaded execution device, a hardware multithreaded memory access device, a hardware multithreaded write-back device, a hardware multithreaded register set, and a multithreading control device.
1) The hardware multithreaded instruction fetch device comprises: an instruction address control device, which generates the instruction address of each thread; an instruction address register sequence, which stores the instruction addresses of the n threads; and a fetch-unit multithreaded register sequence, which temporarily stores the intermediate results of the n pipeline stages of the fetch logic, each register stage corresponding to the output of part of the fetch logic for one hardware thread;
2) The hardware multithreaded decode device comprises: a decode-unit multithreaded register sequence, which temporarily stores the intermediate results of the n pipeline stages of the decode logic, each register stage corresponding to the output of part of the decode logic for one hardware thread; and a data bypass register sequence, which stores the intermediate execution results of the preceding two instructions of each thread;
3) The hardware multithreaded execution device comprises: an execution device for the dedicated multithread-initialization instruction, which generates a hardware thread number corresponding to the software thread that the hardware thread executes; a thread number register sequence, which buffers the generated hardware thread numbers; an execute-unit multithreaded register sequence, which temporarily stores the intermediate results of the n pipeline stages of the execute logic, each register stage corresponding to the output of part of the execute logic for one hardware thread; and a data bypass register sequence, which stores the intermediate execution results of the preceding two instructions of each thread;
4) The hardware multithreaded memory access device comprises: a memory-access-unit multithreaded register sequence, which temporarily stores the intermediate results of the n pipeline stages of the memory access logic, each register stage corresponding to the output of part of the memory access logic for one hardware thread;
5) The hardware multithreaded write-back device comprises: a write-back-unit multithreaded register sequence, which temporarily stores the intermediate results of the n pipeline stages of the write-back logic, each register stage corresponding to the output of part of the write-back logic for one hardware thread;
6) The hardware multithreaded register set comprises: a decoder, which decodes the register control signal generated by the multithreading control device together with the address signal generated by the hardware multithreaded decode device or the hardware multithreaded write-back device, and outputs the register-bank enable signal of the current thread and the register address of the current operation; a multiplexer, which selects and outputs the data of the current thread's register bank according to the register control signal generated by the multithreading control device; and n mutually independent register banks, one for each of the n threads, whose write data comes from the output of the hardware multithreaded write-back device and whose read data is sent to the hardware multithreaded decode device;
7) The multithreading control device generates the following control signals: a fetch control signal, output to the hardware multithreaded instruction fetch device; a decode control signal, output to the hardware multithreaded decode device; an execute control signal, output to the hardware multithreaded execution device; a memory access control signal, output to the hardware multithreaded memory access device; a write-back control signal, output to the hardware multithreaded write-back device; and a register control signal, output to the hardware multithreaded register set.
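The register set described in item 6) can be modeled in software as n independent banks with a bank-select on writes (the decoder) and on reads (the multiplexer). The sketch below is only an illustration: the class name `MultithreadRegisterSet`, the use of a plain `thread_id` argument in place of the register control signal, and the bank size of 32 registers are assumptions, not details given in the patent text.

```python
class MultithreadRegisterSet:
    """Software model of n independent per-thread register banks."""

    def __init__(self, n_threads, regs_per_bank=32):
        # One independent bank per hardware thread (bank size is assumed).
        self.banks = [[0] * regs_per_bank for _ in range(n_threads)]

    def write(self, thread_id, reg_addr, value):
        # Decoder role: only the current thread's bank is enabled for writing.
        self.banks[thread_id][reg_addr] = value

    def read(self, thread_id, reg_addr):
        # Multiplexer role: select the current thread's bank for the read.
        return self.banks[thread_id][reg_addr]


rf = MultithreadRegisterSet(n_threads=4)
rf.write(thread_id=1, reg_addr=5, value=42)   # write-back for thread 1
value_t1 = rf.read(thread_id=1, reg_addr=5)   # operand read for thread 1
value_t0 = rf.read(thread_id=0, reg_addr=5)   # thread 0's bank is untouched
```

Because each thread owns its bank, a thread switch needs no save or restore of register contents, which is the property the text relies on to reduce switching overhead.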
One advantage of the present invention is that a software multithreaded program can be executed by the processor's hardware threads. This effectively hides memory access latency during execution, omits the saving and restoring of thread-related information at thread switches, and reduces thread-switching overhead, thereby improving program execution efficiency and reducing power consumption.
Another advantage of the present invention is that pipelining allows n threads to execute in parallel in the time previously needed to execute one thread, improving program execution efficiency at the hardware level.
A further advantage of the present invention is that hardware multithreading effectively avoids the data-dependency hazards introduced by deep pipelining, reduces the design complexity of the system, and improves the execution efficiency of the system.
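One way to see why interleaving threads eases data-dependency hazards is to count how many cycles separate two consecutive instructions of the same thread. The sketch below assumes strict round-robin issue, which the text does not state in these words; `issue_slots` and `same_thread_gap` are hypothetical helper names.

```python
def issue_slots(n_threads, cycles):
    """Return the hardware thread index that enters the pipeline each cycle
    (assumed strict round-robin order)."""
    return [cycle % n_threads for cycle in range(cycles)]


def same_thread_gap(slots, thread):
    """Cycle distance between consecutive instructions of one thread."""
    times = [cycle for cycle, tid in enumerate(slots) if tid == thread]
    return [later - earlier for earlier, later in zip(times, times[1:])]


slots = issue_slots(n_threads=4, cycles=12)
gaps = same_thread_gap(slots, thread=2)
```

Under this assumption the pipeline accepts a new instruction every cycle, yet two dependent instructions of the same thread are always n cycles apart, so the earlier one has advanced n sub-stages before the later one enters.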
Description of drawings
Figure 1 is a pipeline architecture diagram of a typical MIPS processor.
Figure 2 is a diagram of the hardware multithreading control device for a microprocessor.
Figure 3 is a diagram of the hardware multithreaded instruction fetch device in the hardware multithreading device.
Figure 4 is a diagram of the hardware multithreaded decode device in the hardware multithreading device.
Figure 5 is a diagram of the hardware multithreaded execution device in the hardware multithreading device.
Figure 6 is a diagram of the hardware multithreaded memory access device in the hardware multithreading device.
Figure 7 is a diagram of the hardware multithreaded write-back device in the hardware multithreading device.
Figure 8 is an explanatory diagram of the hardware multithreaded register set in the hardware multithreading device.
Figure 9 is the encoding format of the dedicated multithread-initialization instruction.
Figure 10 is a step diagram of the hardware multithreading control method for a microprocessor.
Figure 11 is a diagram of the hardware multithreaded pipeline split and its clocks.
Figure 12 is a schematic timing diagram of hardware multithreaded execution.
Detailed description
The implementation of the present invention is described in detail below with reference to the accompanying drawings.
Figure 1 shows the pipeline architecture of a typical MIPS processor. The execution of an instruction is divided into five pipeline stages: instruction fetch (IF), decode (ID), execute (EX), memory access (MEM), and write-back (WB). This patent builds on that pipeline, increasing the number of pipeline stages to support hardware multithreaded execution.
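Figure 2 is a schematic diagram of the overall structure of the hardware multithreading control device for a microprocessor according to the present invention. It comprises a hardware multithreaded instruction fetch device 201, a hardware multithreaded decode device 202, a hardware multithreaded execution device 203, a hardware multithreaded memory access device 204, a hardware multithreaded write-back device 205, a hardware multithreaded register set 206, and a multithreading control device 207. Every device supports the parallel execution of n hardware threads, and the devices operate synchronously with one another.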
The hardware multithreaded instruction fetch device 201, under the fetch control signal output by the multithreading control device 207, updates the instruction addresses of the n hardware threads, stores those addresses, fetches instructions for the n hardware threads, and outputs the instructions to the hardware multithreaded decode device 202. The decode device 202, using the instructions of the n threads output by device 201 and the decode control signal generated by device 207, decodes for the n hardware threads and outputs the operation control information and data information to the hardware multithreaded execution device 203. The execution device 203, using the operation control information and data information output by device 202 and the execute control signal generated by device 207, executes the instructions of the n threads and outputs the resulting data information to the hardware multithreaded memory access device 204. The memory access device 204, under the memory access control signal generated by device 207, performs the memory accesses of the n hardware threads and passes the data information from the execution device 203 to the hardware multithreaded write-back device 205. The write-back device 205, under the write-back control signal generated by device 207, outputs the data to be written back and the address of the register currently to be written to the hardware multithreaded register set 206. The register set 206, under the register control signal output by device 207, works with devices 202 and 205 to perform the register-bank reads and writes of the n hardware threads. The multithreading control device 207 controls the operation of every device in the hardware multithreading apparatus. Specifically, it generates a fetch control signal for device 201, a decode control signal for device 202, an execute control signal for device 203, a memory access control signal for device 204, a write-back control signal for device 205, and a register control signal for the register set 206.
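FIG. 3 is a structural diagram of the hardware multithreading instruction fetch device 201, to be read together with the overall structural diagram of the hardware multithreading control apparatus for a microprocessor in FIG. 2. The fetch device 201 comprises an instruction address control device 301, an instruction address register sequence 302, and a fetch-unit multithreading register sequence 303. The instruction address control device 301 generates the instruction address of each thread; each time, it produces the next instruction address for the thread currently executing in fetch logic IF1 of FIG. 3. Suppose that at time t the thread executing in IF1 has number i. In that clock cycle, device 301 generates the address of the next sequential instruction of thread i, and this address is output to pc_DFF_1 of sequence 302 in the next clock cycle t+1. In cycle t+1, the value that pc_DFF_n held in cycle t (that is, the address of the instruction currently being executed by thread i+1) is taken into device 301, which generates the address of the next sequential instruction of thread i+1. The instruction address register sequence 302 stores the instruction addresses of the n threads; from register pc_DFF_1 to register pc_DFF_n it holds, in order, the next instruction addresses of thread j, thread j-1, ..., thread 1, thread n, thread n-1, ..., thread j+1. The fetch-unit multithreading register sequence 303 buffers the intermediate results of the n-stage pipelined fetch logic: each register stage holds the output of the partial fetch logic IFj (j = 1, 2, ..., n). The thread numbers of the data held in registers IF_DFF_1 through IF_DFF_n of sequence 303 match the thread numbers of registers pc_DFF_1 through pc_DFF_n of sequence 302; that is, if pc_DFF_k holds the instruction address of thread i, then IF_DFF_k holds the output of fetch logic IFk for thread i.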
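The rotation of the instruction address register sequence 302 can be sketched as a ring of n registers. This is a minimal model and not the patented circuit: it assumes the control device reads pc_DFF_n combinationally and writes the updated address into pc_DFF_1 at the next clock edge, and it assumes a fixed address increment of 4, which the text does not specify. The names `initial_ring`, `step`, and `pc_increment` are hypothetical.

```python
def initial_ring(n, start_pc=0):
    """pc_DFF_1..pc_DFF_n hold threads n, n-1, ..., 1, all at the same start address."""
    return [(thread, start_pc) for thread in range(n, 0, -1)]


def step(pc_regs, pc_increment=4):
    """One clock: the entry in pc_DFF_n gets its next sequential address
    and re-enters at pc_DFF_1; every other entry shifts one position."""
    thread, pc = pc_regs[-1]
    # pc_increment is an assumed instruction size, not taken from the source
    return [(thread, pc + pc_increment)] + pc_regs[:-1]


ring = initial_ring(4)
for _ in range(5):
    ring = step(ring)

# Thread order from pc_DFF_1 to pc_DFF_n follows j, j-1, ..., 1, n, ..., j+1
thread_order = [thread for thread, _ in ring]
print(thread_order)
```

After any number of steps the thread order stays a cyclic descending sequence, which is the ordering the description gives for pc_DFF_1 through pc_DFF_n.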
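FIG. 4 is a structural diagram of the hardware multithreading decode device 202, to be read together with the overall structural diagram in FIG. 2. The decode device 202 comprises a decode-unit multithreading register sequence 401 and a decode-unit data bypass register sequence 402. The decode-unit multithreading register sequence 401 buffers the intermediate results of the n-stage pipelined decode logic: each register stage holds the output of the partial decode logic IDj. The thread numbers of the data held in registers ID_DFF_1 through ID_DFF_n of sequence 401 match the thread numbers of registers IF_DFF_1 through IF_DFF_n of the fetch-unit multithreading register sequence 303; that is, if IF_DFF_k holds the output of fetch logic IFk for thread i, then ID_DFF_k holds the output of decode logic IDk for thread i. The decode-unit data bypass register sequence 402 stores the intermediate execution results of the previous two instructions of each of the n threads.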
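FIG. 5 is a structural diagram of the hardware multithreading execution device 203, to be read together with the overall structural diagram in FIG. 2. The execution device 203 comprises a dedicated multithreading initialization instruction execution device 501, a thread number register sequence 502, an execution-unit multithreading register sequence 503, and an execution-unit data bypass register sequence 504. The initialization instruction execution device 501 generates the hardware thread numbers. The thread number register sequence 502 is an n-stage register buffer; the position Th_DFF_i that the thread number of a given hardware thread occupies in sequence 502 is kept consistent with the position of that hardware thread in the instruction address register sequence 302, the fetch-unit multithreading register sequence 303, the decode-unit multithreading register sequence 401, the execution-unit multithreading register sequence 503, the memory-access-unit multithreading register sequence 601, and the write-back-unit multithreading register sequence 701. The execution-unit multithreading register sequence 503 buffers the intermediate results of the n-stage pipelined execution logic: each register stage holds the output of the partial execution logic Ej. The thread numbers of the data held in registers E_DFF_1 through E_DFF_n of sequence 503 match the thread numbers of registers IF_DFF_1 through IF_DFF_n of sequence 303; that is, if IF_DFF_k holds the output of fetch logic IFk for thread i, then E_DFF_k holds the output of execution logic Ek for thread i. The execution-unit data bypass register sequence 504 stores the intermediate execution results of the previous two instructions of each of the n threads.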
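FIG. 6 is a structural diagram of the hardware multithreading memory access device 204, to be read together with the overall structural diagram in FIG. 2. The memory access device 204 comprises a memory-access-unit multithreading register sequence 601, which buffers the intermediate results of the n-stage pipelined memory access logic: each register stage holds the output of the partial memory access logic Mj. The thread numbers of the data held in registers M_DFF_1 through M_DFF_n of sequence 601 match the thread numbers of registers IF_DFF_1 through IF_DFF_n of the fetch-unit multithreading register sequence 303; that is, if IF_DFF_k holds the output of fetch logic IFk for thread i, then M_DFF_k holds the output of memory access logic Mk for thread i.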
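FIG. 7 is a structural diagram of the hardware multithreading write-back device 205, to be read together with the overall structural diagram in FIG. 2. The write-back device 205 comprises a write-back-unit multithreading register sequence 701, which buffers the intermediate results of the n-stage pipelined write-back logic: each register stage holds the output of the partial write-back logic Wj. The thread numbers of the data held in registers W_DFF_1 through W_DFF_n of sequence 701 match the thread numbers of registers IF_DFF_1 through IF_DFF_n of the fetch-unit multithreading register sequence 303; that is, if IF_DFF_k holds the output of fetch logic IFk for thread i, then W_DFF_k holds the output of write-back logic Wk for thread i.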
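FIG. 8 is a structural diagram of the hardware multithreading register set 206, to be read together with the overall structural diagram in FIG. 2. The register set 206 comprises a decoder 801, a multiplexer 802, and n register sets 803. The decoder 801 decodes the register control signal generated by the multithreading control device 207 together with the address signal generated by the decode device 202 or the write-back device 205, and outputs an enable signal that activates the register set Regs_i of the current thread, along with the address of the register currently being accessed. The multiplexer 802, under the register control signal generated by the control device 207, selects the data output of the register set that belongs to the current thread. The n register sets 803 are provided to the n threads, one per thread, and are independent of one another; the data written comes from the output of the write-back device 205, and the data read is sent to the decode device 202.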
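The per-thread banking described for the register set 206 can be sketched in software as follows. This is an illustrative model only: the class name `MultiThreadRegisterFile`, the default of 32 registers per thread, and the use of a plain thread number as the register control signal are all assumptions, not details from the patent.

```python
class MultiThreadRegisterFile:
    """n independent register banks, one per hardware thread (threads numbered 1..n)."""

    def __init__(self, n_threads, regs_per_thread=32):
        # regs_per_thread is an assumed size; the source does not state it
        self.banks = [[0] * regs_per_thread for _ in range(n_threads)]

    def write(self, thread, addr, value):
        # Plays the role of decoder 801: only the current thread's bank is enabled
        self.banks[thread - 1][addr] = value

    def read(self, thread, addr):
        # Plays the role of multiplexer 802: select the current thread's bank output
        return self.banks[thread - 1][addr]


regs = MultiThreadRegisterFile(n_threads=4)
regs.write(thread=1, addr=5, value=42)
print(regs.read(thread=1, addr=5), regs.read(thread=2, addr=5))
```

Because each thread has its own bank, a write by one thread to a given register address never changes what another thread reads at the same address.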
The present invention enables hardware multithreaded execution on a microprocessor, and maps software threads onto hardware threads by providing a dedicated multithreading initialization instruction. FIG. 9 shows the encoding format of this instruction. It comprises: an opcode field 901 that identifies the instruction; an operand field 902 for the destination operand, which may be a register in the register set belonging to the hardware thread and is used during the initialization phase to identify the hardware thread number; and an operand field 903 for the source operand, which may be a newly generated thread number.
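A three-field encoding of this kind can be sketched with simple bit packing. The patent names the fields (opcode 901, destination operand 902, source operand 903) but gives no widths or opcode value, so everything numeric below is an assumption chosen for illustration: a 32-bit word, a 6-bit opcode, a 5-bit destination register index, a 21-bit source field, and the hypothetical opcode `0x3F`.

```python
# Assumed field widths (not from the source): 6 + 5 + 21 = 32 bits
OPCODE_BITS, DST_BITS, SRC_BITS = 6, 5, 21
THREAD_INIT_OPCODE = 0x3F  # hypothetical opcode value


def encode(opcode, dst, src):
    """Pack opcode (field 901), destination (902) and source (903) into one word."""
    for value, bits in ((opcode, OPCODE_BITS), (dst, DST_BITS), (src, SRC_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field value does not fit in its assumed width")
    return (opcode << (DST_BITS + SRC_BITS)) | (dst << SRC_BITS) | src


def decode(word):
    """Split a word back into (opcode, dst, src)."""
    src = word & ((1 << SRC_BITS) - 1)
    dst = (word >> SRC_BITS) & ((1 << DST_BITS) - 1)
    opcode = word >> (DST_BITS + SRC_BITS)
    return opcode, dst, src


# Destination register 3 receives thread number 2
word = encode(THREAD_INIT_OPCODE, dst=3, src=2)
print(decode(word))
```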
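FIG. 10 is a diagram of the steps of the hardware multithreading control method. The multithreaded fetch step 1001 reads the instructions of each thread and generates each thread's instruction addresses; it comprises multithreaded instruction address control, multithreaded instruction address buffering, and multithreaded instruction fetch. The multithreaded decode step 1002 decodes the instructions of each hardware thread and prepares the register data required by the execution step; it comprises multithreaded decoding, multithreaded register operand preparation, and decode-unit data bypass control. The multithreaded execution step 1003 executes the instructions of each thread; it comprises execution of the dedicated multithreading initialization instruction, thread number buffering, execution-unit data bypass control, and execution of ordinary multithreaded instructions. The multithreaded memory access step 1004 writes the execution results of each thread to memory or reads from memory the data that a thread requires. The multithreaded write-back step 1005 writes the execution results of each thread back to the registers; it comprises multithreaded write-back data control and multithreaded write-back register address control.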
The implementation timing of the present invention is described below, taking four hardware threads (n = 4) as an example.
FIG. 11 is a schematic diagram of the hardware multithreading pipeline division and clocks. Taking fetch logic 101 as an example, FIG. 11 divides the fetch logic IF evenly into four pipeline stages, so the delay of each stage becomes 1/4 of the delay of the original single stage. If the clock period of the IF logic in FIG. 1 is T, corresponding to clock clk1 in FIG. 11, then the clock for four hardware threads can be clk2, whose frequency is four times that of clk1. In the time a single-stage pipeline needs to process one thread, the four-thread hardware can therefore process four threads in parallel.
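The clock relationship and the resulting interleaving can be sketched numerically. This is a simplified model under stated assumptions: the logic splits perfectly evenly (no register overhead), thread k enters sub-stage IF1 one clk2 cycle after thread k-1, and the function names `split_stage` and `occupancy` are hypothetical.

```python
def split_stage(period_t, n):
    """Return (delay of each sub-stage, clk2 frequency) when one stage with
    clock period period_t is divided evenly into n sub-stages."""
    sub_stage_delay = period_t / n
    clk2_frequency = n * (1.0 / period_t)  # n times the clk1 frequency
    return sub_stage_delay, clk2_frequency


def occupancy(cycle, n):
    """Thread number held by sub-stages IF1..IFn at clk2 cycle `cycle` (1-based).
    None marks a sub-stage that has not been reached yet during start-up."""
    stages = []
    for j in range(1, n + 1):
        if cycle < j:
            stages.append(None)
        else:
            stages.append((cycle - j) % n + 1)
    return stages


print(split_stage(8.0, 4))
print(occupancy(2, 4))
print(occupancy(4, 4))
```

From cycle n onward every sub-stage holds a different thread, which is the sense in which four threads proceed in parallel within one original clock period T.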
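FIG. 12 is a timing diagram of hardware multithreaded execution. Taking n = 4 as an example, #1, #2, #3 and #4 denote four different hardware threads, and all hardware threads begin fetching and executing from the same instruction address. Suppose execution begins at time T1, when hardware thread #1 performs the first step, IF1, of the fetch logic for its first instruction. At T2, hardware thread #1 performs the second step, IF2, of the fetch logic for its first instruction, while hardware thread #2 performs the first step, IF1, of the fetch logic for its first instruction. Continuing in the same way, at T5 hardware thread #1 performs the second step, ID2, of the decode logic for its first instruction and the second step, IF2, of the fetch logic for its second instruction, while hardware thread #2 performs the first step, ID1, of the decode logic for its first instruction and the first step, IF1, of the fetch logic for its second instruction. Suppose the first instruction is the dedicated multithreading initialization instruction. Hardware thread #2 then begins fetching the initialization instruction at T2, begins decoding it at T5, and begins executing it at T6, producing the corresponding hardware thread number 2. At T7 this thread number is passed through the data bypass to the execution and decode stages of the second and third instructions of hardware thread #2. At T8 hardware thread #2 begins the write-back logic of the initialization instruction, and at T9 the write-back completes. Because the initialization instruction has already computed hardware thread number 2 by T7 and can forward it through the data bypass, a third instruction that is a conditional jump begins decoding at T7 and decides whether to jump by examining the hardware thread number received from the data bypass, that is, by comparing the thread number with 2. For hardware thread #2 the thread number equals 2 and the jump is taken; for every other thread the thread number differs from 2, no jump occurs, and execution continues sequentially. The threads are thereby separated. The conditional jump need not be the third instruction; any later position also works, because the hardware thread number is available from T7 onward. Other cases of thread separation are described in the same way. In short, if the first instruction is the dedicated multithreading initialization instruction, a conditional jump at or after the third instruction is sufficient to separate the threads.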
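The thread separation idea can be illustrated at the software level, independent of the cycle timing. In this hedged sketch every hardware thread runs the same instruction sequence: the first instruction yields the thread number, and the third compares it with a constant and branches. The function `run_common_prologue` and the labels in the returned path are hypothetical names used only for illustration.

```python
def run_common_prologue(hw_thread_number, branch_target_thread=2):
    """Follow the shared start-up code for one hardware thread and
    return the sequence of steps it takes."""
    # Instruction 1: the initialization instruction yields this thread's number
    thread_reg = hw_thread_number
    path = ["init", "second_instruction"]
    # Instruction 3: conditional jump comparing the thread number with a constant
    if thread_reg == branch_target_thread:
        path.append("jump_taken")
    else:
        path.append("fall_through")
    return path


# All four threads start at the same address but diverge at the jump
paths = {thread: run_common_prologue(thread) for thread in range(1, 5)}
for thread, path in paths.items():
    print(thread, path)
```

Only thread #2 takes the jump when the comparison constant is 2; repeating the pattern with other constants would peel off the remaining threads one at a time.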
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201010512737.3A CN101957744B (en) | 2010-10-13 | 2010-10-13 | Hardware multithreading control method for microprocessor and device thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101957744A | 2011-01-26 |
CN101957744B | 2013-07-24 |
Family
ID=43485089
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201010512737.3A Expired - Fee Related CN101957744B (en) | 2010-10-13 | 2010-10-13 | Hardware multithreading control method for microprocessor and device thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101957744B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111651201B (en) * | 2016-04-26 | 2023-06-13 | 中科寒武纪科技股份有限公司 | Apparatus and method for performing vector merge operation |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1760826A (en) * | 2004-10-14 | 2006-04-19 | 国际商业机器公司 | Method, processor and system for processing instructions |
US20070005942A1 (en) * | 2002-01-14 | 2007-01-04 | Gil Vinitzky | Converting a processor into a compatible virtual multithreaded processor (VMP) |
Non-Patent Citations (1)
Title |
---|
VICTOR P. RUBIO: "A FPGA Implementation of a MIPS RISC Processor for Computer Architecture Education", 《MASTER OF SCIENCE NEW MEXICO STATE UNIVERSITY》, 8 July 2004 (2004-07-08), pages 16 - 35 * |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013093681A1 (en) * | 2011-12-20 | 2013-06-27 | International Business Machines Corporation | Low latency variable transfer network for fine grained parallelism of virtual threads across multiple hardware threads |
GB2512004A (en) * | 2011-12-20 | 2014-09-17 | Ibm | Low latency variable transfer network for fine grained parallelism of virtual threads across multiple hardware threads |
GB2512004B (en) * | 2011-12-20 | 2015-01-07 | Ibm | Low latency variable transfer network for fine grained parallelism of virtual threads across multiple hardware threads |
US9021237B2 (en) | 2011-12-20 | 2015-04-28 | International Business Machines Corporation | Low latency variable transfer network communicating variable written to source processing core variable register allocated to destination thread to destination processing core variable register allocated to source thread |
CN109445851B (en) * | 2012-10-17 | 2023-08-15 | 瑞萨电子株式会社 | Multithreaded processor |
CN109445851A (en) * | 2012-10-17 | 2019-03-08 | 瑞萨电子株式会社 | Multiline procedure processor |
US10146588B2 (en) | 2014-01-14 | 2018-12-04 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for processing computational task having multiple subflows |
CN104778074A (en) * | 2014-01-14 | 2015-07-15 | 腾讯科技(深圳)有限公司 | Calculation task processing method and device |
CN104699463B (en) * | 2015-03-20 | 2017-05-17 | 浪潮集团有限公司 | Implementation method for assembly lines low in power consumption |
CN104699463A (en) * | 2015-03-20 | 2015-06-10 | 浪潮集团有限公司 | Implementation method for assembly lines low in power consumption |
CN104699465B (en) * | 2015-03-26 | 2017-05-24 | 中国人民解放军国防科学技术大学 | Vector access and storage device supporting SIMT in vector processor and control method |
CN104699465A (en) * | 2015-03-26 | 2015-06-10 | 中国人民解放军国防科学技术大学 | Vector access and storage device supporting SIMT in vector processor and control method |
CN106408504A (en) * | 2015-07-31 | 2017-02-15 | Arm有限公司 | Graphics processing systems |
CN105808357B (en) * | 2016-03-29 | 2021-07-27 | 沈阳航空航天大学 | Performance with precise control over multi-core and multi-threaded processors |
CN105808357A (en) * | 2016-03-29 | 2016-07-27 | 沈阳航空航天大学 | Multi-core multi-threaded processor with precise performance control function |
CN108255587A (en) * | 2016-12-29 | 2018-07-06 | 展讯通信(上海)有限公司 | A kind of synchronous multiline procedure processor |
CN110018781A (en) * | 2018-01-09 | 2019-07-16 | 阿里巴巴集团控股有限公司 | Disk flow control method, device and electronic equipment |
CN110018781B (en) * | 2018-01-09 | 2022-06-21 | 阿里巴巴集团控股有限公司 | Disk flow control method and device and electronic equipment |
CN110647358A (en) * | 2018-06-27 | 2020-01-03 | 展讯通信(上海)有限公司 | Synchronous multithread processor |
CN110647357A (en) * | 2018-06-27 | 2020-01-03 | 展讯通信(上海)有限公司 | Synchronous multithread processor |
CN109597654B (en) * | 2018-12-07 | 2022-01-11 | 湖南国科微电子股份有限公司 | Register initialization method, basic configuration table generation method and embedded system |
CN109597654A (en) * | 2018-12-07 | 2019-04-09 | 湖南国科微电子股份有限公司 | Initialization of register method, the generation method and embedded system of configurations table |
CN112579278A (en) * | 2020-12-24 | 2021-03-30 | 海光信息技术股份有限公司 | Central processing unit, method, device and storage medium for simultaneous multithreading |
CN112713993A (en) * | 2020-12-24 | 2021-04-27 | 天津国芯科技有限公司 | Encryption algorithm module accelerator and high-speed data encryption method |
CN112579278B (en) * | 2020-12-24 | 2023-01-20 | 海光信息技术股份有限公司 | Central processing unit, method, device and storage medium for simultaneous multithreading |
Also Published As
Publication number | Publication date |
---|---|
CN101957744B (en) | 2013-07-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20130724; Termination date: 20211013 |