CN101957744A - Hardware multithreading control method for microprocessor and device thereof

Info

Publication number: CN101957744A
Application number: CN2010105127373A
Authority: CN
Inventors: 齐悦; 王磊; 王惠娟; 师立宁; 王沁
Original assignee: University of Science and Technology Beijing USTB
Current assignee: University of Science and Technology Beijing USTB
Priority date: 2010-10-13
Filing date: 2010-10-13
Publication date: 2011-01-26
Anticipated expiration: 2030-10-13
Also published as: CN101957744B

Abstract

A hardware multi-thread control method and device for a microprocessor, relating to the field of microprocessor architecture. The control method proposed by the invention comprises the following steps: multi-thread instruction fetching, multi-thread decoding, multi-thread execution, multi-thread memory access, and multi-thread write-back. The device proposed by the invention comprises: a hardware multi-thread instruction fetching device, a hardware multi-thread decoding device, a hardware multi-thread execution device, a hardware multi-thread memory access device, a hardware multi-thread write-back device, a hardware multi-thread register group, and a multi-thread control device. The invention targets software multi-thread programs, which can be executed by the processor's hardware threads. This effectively hides memory access latency during execution, eliminates the saving and restoring of thread-related information on a thread switch, and reduces thread-switching overhead. Pipelining is adopted so that n threads can execute in parallel in the time originally needed to execute one thread. Hardware multi-threading also avoids the data hazards introduced by deep pipelining, reduces the design complexity of the system, and improves system performance.

Description

A hardware multi-thread control method and device for a microprocessor

Technical field:

The invention relates to the field of microprocessor architecture, and in particular to a hardware multi-thread control method and a device thereof.

Background:

To further improve microprocessor performance, a variety of novel architectures have been proposed, such as multi-core, multi-thread, stream processing, PIM, reconfigurable, and polymorphic architectures. These architectures address the problems of microprocessor development from different angles. Because of the characteristics of software programs and the physical limits of hardware processes, raising the clock frequency alone can no longer deliver large gains in system performance. The trend in architecture is clear: multi-threading and multi-core have become the two key technical directions, and processors with multi-thread or multi-core features keep appearing in every field. Pipelining is an important feature that distinguishes RISC processors from CISC processors. With a deep pipeline, however, instruction dependences and instruction jumps greatly reduce pipeline performance. Building on a RISC architecture, the present invention uses deep pipelining to execute multiple hardware threads, which effectively avoids the performance loss caused by instruction dependences and improves microprocessor performance through multi-threading.

Patent published on July 16, 2008: CN101221493A, title: "Multithreaded Execution in Parallel Processors", inventors: D. Bernstein et al. This patent discloses a parallel hardware multi-thread processor. The processor contains a general-purpose processor that coordinates system functions, and supports multiple hardware threads and multiple microengines. The processor also includes a memory control system with a first memory controller and a second memory controller. The first memory controller sorts memory accesses according to whether they are directed to an even bank or an odd bank; the second memory controller optimizes memory accesses according to whether they are reads or writes.

Patent published on August 22, 2007: CN101021801, title: "Message-Queue-Based Mass Data Transmission Method between Pipelined Processes", inventors: Xue Qingtong et al. This patent discloses a method for mass data transmission between pipelined processes based on message queues. In the service processing flow of a billing product, each call detail record passes in sequence through at least the following processes: formatting, normalization (also called sorting), deduplication, rating, and database loading. The billing method combines single-step commit with whole-batch commit. Through configuration, it uses different message queue types in different environments to assign tasks automatically, manages load balancing, distributes call detail records to different message queues, and is deployed in a customizable way according to the business logic. In a billing system built with this method, the mass call detail record data passes between processes entirely through message queues, so processing can take place in memory with no system I/O overhead, and speed is greatly increased. A system using the message-queue-based pipelined parallel processing scheme is markedly more efficient, and its processing speed ranks among the best of billing vendors at home and abroad.

Patent published on January 25, 2006: CN1725176, title: "Method and Apparatus for a Multithreaded Pipelined Instruction Decoder", inventors: J. P. Douglas et al. This patent discloses a method for a multi-threaded pipelined instruction decoder. The decoder clocks, clears, and stalls the instructions in the decode pipeline of a multi-threaded machine so as to obtain the best performance at minimum power consumption. A shadow pipeline shadows the instruction decode pipeline, holding the thread identification and the instruction-valid bit for each pipeline stage of the instruction decoder. The thread identification and valid bits are used to control the clocking, clearing, and stalling of each pipeline stage in the instruction decoder. The instructions of one thread can be cleared without affecting the instructions of other threads in the decode pipeline, and in some cases the instructions of one thread can be stalled without affecting the instructions of other threads in the decode pipeline. In that invention, a pipeline stage is clocked only when a valid instruction needs to advance, in order to save power and minimize stalls.

Patent published on September 15, 1999: CN1228557, title: "Multithreaded Instruction-Level Parallel Technology for Computer Processors", inventors: Liu Yin et al. This patent discloses a multi-thread instruction-level parallel technique applicable to computer processors. A processor using this technique fetches instructions in turn from the threads that are in the executing state, so that the instructions executing in parallel in the processor come from different threads and there is no "inter-instruction dependency problem" among them.

Patent published on June 6, 2007: CN1975663, title: "Apparatus Having Asymmetric Hardware Multithreading Support for Different Threads", inventor: David A. Kra. This patent provides asymmetric hardware support for a special class of threads. Preferably, the special class consists of high-priority, I/O-bound threads. In a first aspect, a multi-threaded processor includes N register sets to support the concurrent execution of N threads. At least one register set is dedicated to threads of the special class and cannot be used by other threads, even when idle. In a second aspect, threads of the special class may fill only a limited portion of the cache memory, in order to reduce the cache flushing that might otherwise occur.

Patent published on December 14, 2005: CN1707694, title: "Memory Controller for a Multithreaded Pipelined Bus System", inventors: Xu Yunfan et al. This patent discloses a memory controller for a multi-thread pipelined bus system. In the memory control method of the multi-thread pipelined system, the addresses of multiple banks to be accessed in the memory unit are received sequentially from the host. For each of the banks, it is determined whether the address corresponding to that bank has been input from the host at the time a read/write command is output to the memory unit. When the determination shows that the address corresponding to the bank has been input, a read/write command containing either open-page information or auto-precharge information is output to the memory unit.

Patent published on October 20, 1999: CN1232219, title: "Pipelined Multiprocessor System", inventor: Yoshio Koike. This patent discloses a pipelined multiprocessor system comprising a set of processor units, a set of buffers, and a debugging unit. The processor units process data in a pipeline. The buffers hold the input data and the processing result of each processor unit. The buffers and processor units are cascaded in sequence between the data input and the data output. The debugging unit can selectively output the processing result of each processor unit externally, for monitoring during debugging.

Summary of the invention:

The object of the present invention is to provide, for the execution of software multi-thread programs, a hardware multi-thread control method for a microprocessor and a corresponding hardware multi-thread control device.

A hardware multi-thread control method for a microprocessor, characterized in that the method comprises the following steps:

1) A multi-thread instruction fetching step, which reads the instructions of each thread and generates the instruction address of each thread. It comprises multi-thread instruction address control, multi-thread instruction address caching, and multi-thread instruction fetching:

a) Multi-thread instruction address control generates the instruction address of each thread. When a thread is blocked, only that thread's instruction address is held; the instruction addresses of the other threads are updated normally;

b) Multi-thread instruction address caching stores the instruction addresses of the n threads;

c) Multi-thread instruction fetching divides the instruction fetch logic symmetrically into an n-stage pipeline and fetches the software-thread instructions corresponding to the n hardware threads. When a thread is blocked, only that thread's instruction fetch is blocked; the other threads fetch normally.

2) A multi-thread decoding step, which decodes the instructions of each hardware thread and prepares the register data required by the multi-thread execution step. It comprises multi-thread decoding, multi-thread register operand preparation, and decoding-component data bypass control:

a) Multi-thread decoding divides the decoding logic symmetrically into an n-stage pipeline and decodes both the dedicated multi-thread initialization instruction and conventional instructions. When a thread is blocked, only that thread's instruction decoding is blocked; the other threads decode normally. The dedicated multi-thread initialization instruction comprises an opcode field that identifies the instruction, an operand field for the destination operand, and an operand field for the source operand;

b) Multi-thread register operand preparation generates the addresses of the registers to be read and reads the operands required by the instruction from the n register groups;

c) Decoding-component data bypass control supplies bypassed data to the appropriate pipeline stage of instruction decoding;

3) A multi-thread execution step, which executes the instructions of each thread. It comprises execution of the dedicated multi-thread initialization instruction, thread number caching, execution-component data bypass control, and multi-thread conventional instruction execution:

a) Execution of the dedicated multi-thread initialization instruction generates a new hardware thread number, which corresponds to the software thread executed by that hardware thread;

b) Thread number caching stores the new hardware thread number generated by the dedicated multi-thread initialization instruction, such that the position of the new hardware thread number in the thread number register sequence matches the position of that hardware thread in the instruction address register sequence, the instruction-fetch-component multi-thread register sequence, the decoding-component multi-thread register sequence, the execution-component multi-thread register sequence, the memory-access-component multi-thread register sequence, and the write-back-component multi-thread register sequence;

c) Execution-component data bypass control supplies bypassed data to the appropriate pipeline stage of instruction execution;

d) Multi-thread conventional instruction execution divides the execution logic symmetrically into an n-stage pipeline and executes the conventional instructions of the n hardware threads. When a thread is blocked, only that thread's instruction execution is blocked; the other threads execute normally.

4) A multi-thread memory access step, which divides the memory access logic symmetrically into an n-stage pipeline, and writes the execution results of each thread to memory or reads the data required by each thread from memory. When a thread is blocked, only that thread's memory access is blocked; the other threads access memory normally.

5) A multi-thread write-back step, which divides the write-back logic symmetrically into an n-stage pipeline and writes the execution results of each thread back to the corresponding register group. When a thread is blocked, only that thread's write-back is blocked; the other threads write back normally. It comprises multi-thread write-back data control and multi-thread write-back register address control:

a) Multi-thread write-back data control prepares and outputs the data to be written back to the registers;

b) Multi-thread write-back register address control prepares and outputs the addresses of the registers to be written back.
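The blocking rule that recurs in steps 1) to 5) (a blocked thread stalls only itself) can be illustrated with a minimal Python sketch. It assumes a fixed round-robin order in which cycle c belongs to thread c mod n; the function name `fetch_counts` and the `is_blocked` callback are hypothetical and do not come from the patent.

```python
def fetch_counts(n_threads, cycles, is_blocked):
    """Count instructions fetched per thread under round-robin slots.

    is_blocked(thread, cycle) -> bool is a hypothetical callback that says
    whether a thread is blocked in a given cycle.
    """
    fetched = [0] * n_threads
    for cycle in range(cycles):
        thread = cycle % n_threads      # thread that owns this cycle's slot
        if is_blocked(thread, cycle):
            continue                    # hold this thread only; others are unaffected
        fetched[thread] += 1
    return fetched


# No thread blocked: every thread gets the same number of fetch slots.
print(fetch_counts(4, 16, lambda thread, cycle: False))

# Thread 2 blocked during the first 8 cycles: only its own count drops.
print(fetch_counts(4, 16, lambda thread, cycle: thread == 2 and cycle < 8))
```

In this model the slots of a blocked thread simply go unused, so the other threads keep exactly the progress they would have had anyway.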

A hardware multi-thread control device for a microprocessor, which uses pipelining to support the parallel execution of n hardware threads, characterized in that it comprises the following components: a hardware multi-thread instruction fetching device, a hardware multi-thread decoding device, a hardware multi-thread execution device, a hardware multi-thread memory access device, a hardware multi-thread write-back device, a hardware multi-thread register group, and a multi-thread control device.

1) The hardware multi-thread instruction fetching device comprises: an instruction address control device, which generates the instruction address of each thread; an instruction address register sequence, which stores the instruction addresses of the n threads; and an instruction-fetch-component multi-thread register sequence, which temporarily stores the intermediate results of the n-stage instruction fetch pipeline, each register stage holding the output of one part of the instruction fetch logic for one hardware thread;

2) The hardware multi-thread decoding device comprises: a decoding-component multi-thread register sequence, which temporarily stores the intermediate results of the n-stage decoding pipeline, each register stage holding the output of one part of the decoding logic for one hardware thread; and a data bypass register sequence, which stores the intermediate execution results of the two preceding instructions of each thread;

3) The hardware multi-thread execution device comprises: a dedicated multi-thread initialization instruction execution device, which generates a hardware thread number corresponding to the software thread executed by that hardware thread; a thread number register sequence, which caches the generated hardware thread numbers; an execution-component multi-thread register sequence, which temporarily stores the intermediate results of the n-stage execution pipeline, each register stage holding the output of one part of the execution logic for one hardware thread; and a data bypass register sequence, which stores the intermediate execution results of the two preceding instructions of each thread;

4) The hardware multi-thread memory access device comprises: a memory-access-component multi-thread register sequence, which temporarily stores the intermediate results of the n-stage memory access pipeline, each register stage holding the output of one part of the memory access logic for one hardware thread;

5) The hardware multi-thread write-back device comprises: a write-back-component multi-thread register sequence, which temporarily stores the intermediate results of the n-stage write-back pipeline, each register stage holding the output of one part of the write-back logic for one hardware thread;

6) The hardware multi-thread register group comprises: a decoder, which decodes the register control signal generated by the multi-thread control device together with the address signal generated by the hardware multi-thread decoding device or the hardware multi-thread write-back device, and outputs the register group enable signal of the current thread and the register address of the current operation; a multiplexer, which, according to the register control signal generated by the multi-thread control device, selects the data of the current thread's register group and outputs it; and n mutually independent register groups, one for each of the n threads, whose write data comes from the output of the hardware multi-thread write-back device and whose read data is sent to the hardware multi-thread decoding device;

7) The multi-thread control device generates the following control signals: an instruction fetch control signal, output to the hardware multi-thread instruction fetching device; a decoding control signal, output to the hardware multi-thread decoding device; an execution control signal, output to the hardware multi-thread execution device; a memory access control signal, output to the hardware multi-thread memory access device; a write-back control signal, output to the hardware multi-thread write-back device; and a register control signal, output to the hardware multi-thread register group.
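The register group of item 6) can be pictured as n independent banks behind a bank-select decoder and a read multiplexer. The Python sketch below is a behavioural illustration only, not the patented circuit. The class name `ThreadedRegisterFile` is hypothetical, and the default of 32 registers per bank is an assumption borrowed from MIPS; the patent text does not state a bank size.

```python
class ThreadedRegisterFile:
    """Behavioural model of n independent register groups, one per hardware thread."""

    def __init__(self, n_threads, regs_per_bank=32):
        # regs_per_bank=32 is an assumed, MIPS-like size.
        self.banks = [[0] * regs_per_bank for _ in range(n_threads)]

    def write(self, thread, reg_addr, value):
        # Decoder role: the thread number enables exactly one bank and the
        # register address picks the entry. Data comes from write-back.
        self.banks[thread][reg_addr] = value

    def read(self, thread, reg_addr):
        # Multiplexer role: only the current thread's bank drives the output,
        # which goes to the decoding device.
        return self.banks[thread][reg_addr]


rf = ThreadedRegisterFile(n_threads=4)
rf.write(thread=0, reg_addr=5, value=11)
rf.write(thread=1, reg_addr=5, value=22)
print(rf.read(0, 5), rf.read(1, 5), rf.read(2, 5))
```

Because each thread only ever touches its own bank, a thread switch needs no register save or restore, which is the property the description relies on.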

One advantage of the present invention is that software multi-thread programs can be executed on the processor's hardware threads. This effectively hides memory access latency during execution, eliminates the saving and restoring of thread-related information on a thread switch, and reduces thread-switching overhead, thereby improving program execution efficiency and reducing power consumption.

Another advantage of the present invention is that, through pipelining, n threads can now execute in parallel in the time originally needed to execute one thread, which improves program execution efficiency at the hardware level.

Another advantage of the present invention is that hardware multi-threading effectively avoids the data hazards introduced by deep pipelining, reduces the design complexity of the system, and improves the execution efficiency of the system.
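The hazard argument can be made concrete with a small calculation. The model below is a deliberate simplification and is not taken from the patent: it assumes the execute stage is split into n sub-stages, that a result can be forwarded only after it leaves the last sub-stage, and that a dependent instruction needs the result when it enters the first sub-stage. The function name `forwarding_stalls` is hypothetical.

```python
def forwarding_stalls(n_substages, issue_gap):
    """Stall cycles for an instruction that depends on its predecessor.

    n_substages: number of sub-stages the execute stage is split into.
    issue_gap:   cycles between the two dependent instructions entering
                 the pipeline (1 for a single thread, n when n threads
                 are interleaved round-robin).
    """
    # The producer's result is ready n_substages cycles after it enters
    # execute; the consumer arrives issue_gap cycles behind it.
    return max(0, n_substages - issue_gap)


n = 4
print(forwarding_stalls(n, issue_gap=1))  # one thread on the deep pipeline
print(forwarding_stalls(n, issue_gap=n))  # n interleaved hardware threads
```

Under these assumptions a single thread on the deepened pipeline would stall n - 1 cycles on every back-to-back dependence, whereas interleaving n threads spaces dependent instructions far enough apart that no stall is needed.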

Description of the drawings

Fig. 1 is a pipeline architecture diagram of a typical MIPS processor.

Fig. 2 is a diagram of the hardware multi-thread control device for a microprocessor.

Fig. 3 is a diagram of the hardware multi-thread instruction fetching device in the hardware multi-thread device.

Fig. 4 is a diagram of the hardware multi-thread decoding device in the hardware multi-thread device.

Fig. 5 is a diagram of the hardware multi-thread execution device in the hardware multi-thread device.

Fig. 6 is a diagram of the hardware multi-thread memory access device in the hardware multi-thread device.

Fig. 7 is a diagram of the hardware multi-thread write-back device in the hardware multi-thread device.

Fig. 8 is an explanatory diagram of the hardware multi-thread register group in the hardware multi-thread device.

Fig. 9 shows the encoding format of the dedicated multi-thread initialization instruction.

Fig. 10 is a step diagram of the hardware multi-thread control method for a microprocessor.

Fig. 11 is a diagram of the hardware multi-thread pipeline split and clocking.

Fig. 12 is a schematic diagram of hardware multi-thread execution timing.

Detailed description of the embodiments

The implementation of the present invention is described in detail below with reference to the accompanying drawings.

Fig. 1 shows the pipeline architecture of a typical MIPS processor. The execution of an instruction is divided into a five-stage pipeline: instruction fetch (IF), decode (ID), execute (EX), memory access (MEM), and write-back (WB). This patent builds on that pipeline: it increases the number of pipeline stages and supports hardware multi-thread execution.

Fig. 2 is a schematic diagram of the overall structure of the hardware multi-thread control device for a microprocessor according to the present invention. It comprises a hardware multi-thread instruction fetching device 201, a hardware multi-thread decoding device 202, a hardware multi-thread execution device 203, a hardware multi-thread memory access device 204, a hardware multi-thread write-back device 205, a hardware multi-thread register group 206, and a multi-thread control device 207. Each device supports the parallel execution of n hardware threads, and the devices are synchronized with one another.

The hardware multi-thread instruction fetching device 201, under the instruction fetch control signal output by the multi-thread control device 207, updates the instruction addresses of the n hardware threads, stores the instruction addresses, fetches instructions for the n hardware threads, and outputs the instructions to the hardware multi-thread decoding device 202.

The hardware multi-thread decoding device 202, using the instructions of the n threads output by the hardware multi-thread instruction fetching device 201 and the decoding control signal generated by the multi-thread control device 207, performs decoding for the n hardware threads and outputs the operation control information and data information to the hardware multi-thread execution device 203.

The hardware multi-thread execution device 203, using the operation control information and data information output by the hardware multi-thread decoding device 202 and the execution control signal generated by the multi-thread control device 207, executes the instructions of the n threads and outputs the resulting data information to the hardware multi-thread memory access device 204.

The hardware multi-thread memory access device 204, under the memory access control signal generated by the multi-thread control device 207, performs the memory access operations of the n hardware threads and outputs the data information from the hardware multi-thread execution device 203 to the hardware multi-thread write-back device 205.

The hardware multi-thread write-back device 205, under the write-back control signal generated by the multi-thread control device 207, outputs the data to be written back and the address of the register currently to be written back to the hardware multi-thread register group 206.

The hardware multi-thread register group 206, under the register control signal output by the multi-thread control device 207, works with the hardware multi-thread decoding device 202 and the hardware multi-thread write-back device 205 to complete the register group read and write operations of the n hardware threads.

The multi-thread control device 207 controls the execution of every device in the hardware multi-thread apparatus. It generates the following control signals: an instruction fetch control signal, output to the hardware multi-thread instruction fetching device 201; a decoding control signal, output to the hardware multi-thread decoding device 202; an execution control signal, output to the hardware multi-thread execution device 203; a memory access control signal, output to the hardware multi-thread memory access device 204; a write-back control signal, output to the hardware multi-thread write-back device 205; and a register control signal, output to the hardware multi-thread register group 206.
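As a compact restatement of the connections described for Fig. 2, the following Python sketch lists the control and data paths as plain tables keyed by the reference numerals 201 to 207. The names `CONTROL_SIGNALS`, `DATA_PATH`, and `downstream` are hypothetical labels chosen for illustration; only the connections themselves come from the description.

```python
# Control signals generated by the multi-thread control device 207:
# signal name -> receiving device (reference numerals of Fig. 2).
CONTROL_SIGNALS = {
    "instruction_fetch": 201,
    "decoding": 202,
    "execution": 203,
    "memory_access": 204,
    "write_back": 205,
    "register": 206,
}

# Data path between devices: (source, destination, payload).
DATA_PATH = [
    (201, 202, "instructions of the n threads"),
    (202, 203, "operation control information and data information"),
    (203, 204, "execution result data"),
    (204, 205, "data from the execution device"),
    (205, 206, "write-back data and register address"),
    (206, 202, "register operands read for decoding"),
]


def downstream(device):
    """Return the devices that directly receive data from `device`."""
    return [dst for src, dst, _ in DATA_PATH if src == device]


print(downstream(205))
print(sorted(CONTROL_SIGNALS.values()))
```

The table makes the loop explicit: results written into the register group 206 by the write-back device 205 are later read by the decoding device 202.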

FIG. 3 is a structural diagram of the hardware multi-thread instruction fetching device 201 in the overall structure of the hardware multi-thread control device for a microprocessor shown in FIG. 2. The hardware multi-thread instruction fetching device 201 includes an instruction address control device 301, an instruction address register sequence 302, and an instruction fetch unit multi-thread register sequence 303. The instruction address control device 301 generates the instruction address of each thread; each time, it produces the next instruction address for the thread currently executing in the instruction fetch logic IF1 of FIG. 3. Suppose that at time t the thread executing in IF1 has thread number i. In that clock cycle the instruction address control device 301 generates the next sequential instruction address of thread i, and this address is output to pc_DFF_1 of sequence 302 in the next clock cycle t+1. In cycle t+1, the value that pc_DFF_n held in cycle t (that is, the address of the instruction currently executed by thread i+1) is taken into the instruction address control device 301, which generates the next sequential instruction address of thread i+1. The instruction address register sequence 302 stores the instruction addresses of the n threads; from register pc_DFF_1 to register pc_DFF_n it holds, in order, the next instruction addresses of thread j, thread j-1, ..., thread 1, thread n, thread n-1, ..., thread j+1. The instruction fetch unit multi-thread register sequence 303 temporarily stores the intermediate results of the n-stage pipelined instruction fetch logic; each stage register holds the output of the partial instruction fetch logic IFj (j = 1, 2, ..., n). The thread numbers corresponding to the data held in registers IF_DFF_1 through IF_DFF_n of sequence 303 match the thread numbers corresponding to registers pc_DFF_1 through pc_DFF_n of sequence 302; that is, if register pc_DFF_k holds the instruction address of thread i, then register IF_DFF_k holds the output of instruction fetch logic IFk for thread i.
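The rotation of thread addresses through pc_DFF_1 to pc_DFF_n can be illustrated with a minimal software sketch. It is not the patented circuit: the function names, the fixed instruction size of 4, and the collapsing of the one-cycle latency between address generation and the write to pc_DFF_1 are all assumptions made for illustration, and jump addresses are not modeled.

```python
from collections import deque


def make_pc_sequence(n, start_pc=0):
    """Model pc_DFF_1..pc_DFF_n as a deque of (thread_id, pc) pairs.

    Index 0 stands for pc_DFF_1 and index n-1 for pc_DFF_n. Thread 1 starts
    in pc_DFF_n so that it is the first thread served (assumed start state).
    """
    return deque((tid, start_pc) for tid in range(n, 0, -1))


def step(pc_regs, instr_size=4):
    """Advance one clock cycle.

    The entry leaving pc_DFF_n is fed to the address control logic (301),
    which computes the next sequential address and writes it to pc_DFF_1.
    instr_size is a hypothetical instruction width in bytes.
    """
    thread_id, pc = pc_regs.pop()                      # value leaving pc_DFF_n
    pc_regs.appendleft((thread_id, pc + instr_size))   # new value in pc_DFF_1
    return thread_id


if __name__ == "__main__":
    regs = make_pc_sequence(4)
    for cycle in range(1, 6):
        served = step(regs)
        order = [tid for tid, _ in regs]
        print(f"cycle {cycle}: served thread {served}, pc_DFF_1..n hold threads {order}")
```

After thread j has been served, the registers hold threads j, j-1, ..., 1, n, ..., j+1 from pc_DFF_1 to pc_DFF_n, which matches the ordering stated in the description.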

FIG. 4 is a structural diagram of the hardware multi-thread decoding device 202 in the overall structure of the hardware multi-thread control device for a microprocessor shown in FIG. 2. The hardware multi-thread decoding device 202 includes a decoding unit multi-thread register sequence 401 and a decoding unit data bypass register sequence 402. The decoding unit multi-thread register sequence 401 temporarily stores the intermediate results of the n-stage pipelined decoding logic; each stage register holds the output of the partial decoding logic IDj. The thread numbers corresponding to the data held in registers ID_DFF_1 through ID_DFF_n of sequence 401 match the thread numbers corresponding to registers IF_DFF_1 through IF_DFF_n of the instruction fetch unit multi-thread register sequence 303; that is, if register IF_DFF_k holds the output of instruction fetch logic IFk for thread i, then register ID_DFF_k holds the output of decoding logic IDk for thread i. The decoding unit data bypass register sequence 402 stores the intermediate execution results of the two preceding instructions of each of the n threads.
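The role of the data bypass registers can be sketched in software as a small per-thread buffer that is consulted before the register bank. This is only an illustrative model under stated assumptions: the class name `ThreadBypass`, its methods, and the "newest match wins" lookup policy are hypothetical and are not taken from the patent, which specifies only that the results of the two preceding instructions of each thread are stored.

```python
from collections import deque


class ThreadBypass:
    """Hypothetical model of a per-thread data bypass buffer.

    Each thread keeps the (destination register, value) results of its two
    most recent instructions; older results are assumed to have reached the
    register bank already.
    """

    def __init__(self, n_threads, depth=2):
        self.buffers = {tid: deque(maxlen=depth) for tid in range(1, n_threads + 1)}

    def record(self, tid, dest_reg, value):
        """Store a freshly produced result for thread `tid`."""
        self.buffers[tid].append((dest_reg, value))

    def read_operand(self, tid, reg, regfile):
        """Return the bypassed value if present, else the register bank value."""
        for dest_reg, value in reversed(self.buffers[tid]):  # newest first
            if dest_reg == reg:
                return value
        return regfile[tid][reg]


if __name__ == "__main__":
    regfile = {tid: [0] * 8 for tid in range(1, 5)}   # 8 registers per thread (assumed)
    bypass = ThreadBypass(4)
    bypass.record(2, dest_reg=1, value=2)             # thread 2 produced r1 = 2
    print(bypass.read_operand(2, 1, regfile))         # served from the bypass
    print(bypass.read_operand(1, 1, regfile))         # thread 1 sees its own bank
```

Keeping one buffer per thread means a result produced by one hardware thread is never forwarded to an instruction of another thread.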

FIG. 5 is a structural diagram of the hardware multi-thread execution device 203 in the overall structure of the hardware multi-thread control device for a microprocessor shown in FIG. 2. The hardware multi-thread execution device 203 includes a multi-thread initialization special instruction execution device 501, a thread number register sequence 502, an execution unit multi-thread register sequence 503, and an execution unit data bypass register sequence 504. The multi-thread initialization special instruction execution device 501 generates the hardware thread numbers. The thread number register sequence 502 is an n-stage register buffer; the position Th_DFF_i that the generated thread number of a given hardware thread occupies in sequence 502 is kept consistent with the position of that hardware thread in the instruction address register sequence (302), the instruction fetch unit multi-thread register sequence (303), the decoding unit multi-thread register sequence (401), the execution unit multi-thread register sequence (503), the memory access unit multi-thread register sequence (601), and the write-back unit multi-thread register sequence (701). The execution unit multi-thread register sequence 503 temporarily stores the intermediate results of the n-stage pipelined execution logic; each stage register holds the output of the partial execution logic Ej. The thread numbers corresponding to the data held in registers E_DFF_1 through E_DFF_n of sequence 503 match the thread numbers corresponding to registers IF_DFF_1 through IF_DFF_n of the instruction fetch unit multi-thread register sequence 303; that is, if register IF_DFF_k holds the output of instruction fetch logic IFk for thread i, then register E_DFF_k holds the output of execution logic Ek for thread i. The execution unit data bypass register sequence 504 stores the intermediate execution results of the two preceding instructions of each of the n threads.

FIG. 6 is a structural diagram of the hardware multi-thread memory access device 204 in the overall structure of the hardware multi-thread control device for a microprocessor shown in FIG. 2. The hardware multi-thread memory access device 204 includes a memory access unit multi-thread register sequence 601, which temporarily stores the intermediate results of the n-stage pipelined memory access logic; each stage register holds the output of the partial memory access logic Mj. The thread numbers corresponding to the data held in registers M_DFF_1 through M_DFF_n of sequence 601 match the thread numbers corresponding to registers IF_DFF_1 through IF_DFF_n of the instruction fetch unit multi-thread register sequence 303; that is, if register IF_DFF_k holds the output of instruction fetch logic IFk for thread i, then register M_DFF_k holds the output of memory access logic Mk for thread i.

FIG. 7 is a structural diagram of the hardware multi-thread write-back device 205 in the overall structure of the hardware multi-thread control device for a microprocessor shown in FIG. 2. The hardware multi-thread write-back device 205 includes a write-back unit multi-thread register sequence 701, which temporarily stores the intermediate results of the n-stage pipelined write-back logic; each stage register holds the output of the partial write-back logic Wj. The thread numbers corresponding to the data held in registers W_DFF_1 through W_DFF_n of sequence 701 match the thread numbers corresponding to registers IF_DFF_1 through IF_DFF_n of the instruction fetch unit multi-thread register sequence 303; that is, if register IF_DFF_k holds the output of instruction fetch logic IFk for thread i, then register W_DFF_k holds the output of write-back logic Wk for thread i.

FIG. 8 is a structural diagram of the hardware multi-thread register group 206 in the overall structure of the hardware multi-thread control device for a microprocessor shown in FIG. 2. The hardware multi-thread register group 206 includes a decoder 801, a multiplexer 802, and n register groups 803. The decoder 801 decodes the register control signal generated by the multi-thread control device 207 together with the address signal generated by the hardware multi-thread decoding device 202 or the hardware multi-thread write-back device 205, and outputs the enable signal that activates the register group Regs_i of the current thread and the address of the register currently being operated on. The multiplexer 802 selects the data output of the register group belonging to the current thread according to the register control signal generated by the multi-thread control device 207. The n register groups 803 are provided to the n threads, one per thread, and are independent of each other; the data written to them comes from the output of the hardware multi-thread write-back device 205, and the data read from them is sent to the hardware multi-thread decoding device 202.
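A minimal behavioral sketch of this banked register group follows. It is a software model under stated assumptions rather than the circuit itself: the class and method names are hypothetical, the bank size of 32 registers is an assumption, and the thread identifier stands in for the register control signal from the multi-thread control device 207.

```python
class MultiThreadRegisterFile:
    """Hypothetical model of n independent register banks (803)."""

    def __init__(self, n_threads, regs_per_bank=32):
        # One private bank per hardware thread; threads are numbered from 1.
        self.banks = [[0] * regs_per_bank for _ in range(n_threads)]

    def _decode(self, thread_id, addr):
        """Decoder (801): one enable flag per bank plus the register address."""
        enables = [tid == thread_id for tid in range(1, len(self.banks) + 1)]
        return enables, addr

    def write(self, thread_id, addr, value):
        """Write-back path: only the enabled bank is updated."""
        enables, addr = self._decode(thread_id, addr)
        for bank, enabled in zip(self.banks, enables):
            if enabled:
                bank[addr] = value

    def read(self, thread_id, addr):
        """Multiplexer (802): pass through the current thread's bank output."""
        return self.banks[thread_id - 1][addr]


if __name__ == "__main__":
    rf = MultiThreadRegisterFile(n_threads=4)
    rf.write(thread_id=2, addr=5, value=99)
    print(rf.read(2, 5), rf.read(1, 5))  # thread 2 sees 99, thread 1 still sees 0
```

Because every thread owns a complete bank, a thread switch in this model needs no save or restore of register contents, which is the property the description relies on.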

The present invention enables hardware multi-thread execution on a microprocessor, and maps software threads onto hardware threads by providing a multi-thread initialization special instruction. FIG. 9 shows the encoding format of this instruction. It includes: an opcode field 901, which identifies the instruction; an operand field 902 for the destination operand, where the destination operand may be a register in the register group corresponding to the hardware thread and is used to identify the hardware thread number during the initialization phase; and an operand field 903 for the source operand, where the source operand may be a newly generated thread number.
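The three-field format can be illustrated with a small encode/decode sketch. The patent does not give field widths, field order within the word, or an opcode value, so the widths (6, 5, and 5 bits), the packing order, and `THREAD_INIT_OPCODE` below are purely hypothetical choices for illustration.

```python
# Hypothetical field widths; the source specifies only that fields 901-903 exist.
OPCODE_BITS, DEST_BITS, SRC_BITS = 6, 5, 5
THREAD_INIT_OPCODE = 0b111111  # hypothetical opcode value for field 901


def encode(opcode, dest, src):
    """Pack opcode (901), destination (902), and source (903) into one word."""
    for value, bits in ((opcode, OPCODE_BITS), (dest, DEST_BITS), (src, SRC_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field value does not fit in its field")
    return (opcode << (DEST_BITS + SRC_BITS)) | (dest << SRC_BITS) | src


def decode(word):
    """Unpack a word produced by encode() into (opcode, dest, src)."""
    src = word & ((1 << SRC_BITS) - 1)
    dest = (word >> SRC_BITS) & ((1 << DEST_BITS) - 1)
    opcode = word >> (DEST_BITS + SRC_BITS)
    return opcode, dest, src


if __name__ == "__main__":
    word = encode(THREAD_INIT_OPCODE, dest=1, src=2)
    print(f"{word:016b}", decode(word))
```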

FIG. 10 is a diagram of the steps of the hardware multi-thread control method. The multi-thread instruction fetching step 1001 reads the instructions of each thread and generates the instruction address of each thread; it includes multi-thread instruction address control, multi-thread instruction address caching, and multi-thread instruction fetching. The multi-thread decoding step 1002 decodes the instructions of each hardware thread and prepares the register data required by the execution step; it includes multi-thread decoding, multi-thread register operand preparation, and decoding unit data bypass control. The multi-thread execution step 1003 executes the instructions of each thread; it includes execution of the multi-thread initialization special instruction, thread number caching, execution unit data bypass control, and execution of regular multi-thread instructions. The multi-thread memory access step 1004 writes the execution results of each thread to memory or reads the data required by a thread from memory. The multi-thread write-back step 1005 writes the execution results of each thread back to the registers; it includes multi-thread write-back data control and multi-thread write-back register address control.

The implementation timing of the present invention is described below, taking four hardware threads (n = 4) as an example.

FIG. 11 is a schematic diagram of the hardware multi-thread pipeline division and clocks. Taking the instruction fetch logic 101 as an example, FIG. 11 divides the instruction fetch logic IF symmetrically into a 4-stage pipeline, so the delay of each stage becomes 1/4 of the delay of the undivided single-stage logic. If the clock period of the IF logic in FIG. 1 is T, with the corresponding clock shown as clk1 in FIG. 11, then the clock for four hardware threads can be as shown by clk2, whose frequency is 4 times that of clk1. As a result, in the time the single-stage pipeline takes to execute one thread, the four hardware threads can now execute 4 threads in parallel.
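The clock relationship described here can be written out as a short worked equation. The symbols t_stage, f_clk1, and f_clk2 are introduced only for this illustration, and the result is an idealized estimate: it assumes a perfectly even division of the logic and ignores the setup and propagation overhead of the added pipeline registers.

```latex
t_{\text{stage}} = \frac{T}{n}, \qquad
f_{\text{clk2}} = \frac{1}{t_{\text{stage}}} = \frac{n}{T} = n \, f_{\text{clk1}}

% For the n = 4 example:
t_{\text{stage}} = \frac{T}{4}, \qquad f_{\text{clk2}} = 4 \, f_{\text{clk1}}
```

Under these assumptions, once all four sub-stages are occupied, one thread leaves the IF logic every T/4, so four threads pass through it in each interval of length T.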

FIG. 12 is a schematic diagram of hardware multi-thread execution timing. Taking n = 4 as an example, #1, #2, #3, and #4 correspond to four different hardware threads, and all hardware threads start fetching and executing from the same instruction address. Assume execution starts at time T1, when hardware thread #1 executes the first step IF1 of the instruction fetch logic for the first instruction. At time T2, hardware thread #1 executes the second step IF2 of the instruction fetch logic for the first instruction, while hardware thread #2 executes the first step IF1 of the instruction fetch logic for the first instruction. Continuing in this way, at time T5 hardware thread #1 executes the second step ID2 of the decoding logic for the first instruction and the second step IF2 of the instruction fetch logic for the second instruction, while hardware thread #2 executes the first step ID1 of the decoding logic for the first instruction and the first step IF1 of the instruction fetch logic for the second instruction.

Assume the first instruction is a multi-thread initialization special instruction. Hardware thread #2 then starts fetching this instruction at time T2, starts decoding it at time T5, and starts executing it at time T6, generating the corresponding hardware thread number 2. At time T7 the hardware thread number 2 is passed through the data bypass to the execution and decoding stages of the second and third instructions of hardware thread #2. At time T8 hardware thread #2 starts the write-back logic of the initialization special instruction, and at time T9 the write-back logic completes. Because the initialization special instruction has already computed the hardware thread number 2 by time T7, and this number can be passed through the data bypass to the execution and decoding stages of the second and third instructions of hardware thread #2, the third instruction may be a conditional jump instruction. In that case decoding of the conditional jump instruction starts at time T7, and the decision whether to jump is made by examining the hardware thread number received from the data bypass, that is, by comparing the thread number with 2. For hardware thread #2 the hardware thread number equals 2 and the jump is taken; for the other threads the hardware thread number does not equal 2, no jump is taken, and sequential execution continues. The threads are thereby separated. The conditional jump instruction need not be the third instruction; any position after the third instruction also works, because the hardware thread number has already been computed by time T7 and can be used by a conditional jump instruction. Other cases of thread separation are described similarly. It follows that if the first instruction is the multi-thread initialization special instruction, thread separation can be achieved by a conditional jump instruction placed at the third instruction or anywhere after it.
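The thread separation mechanism can be illustrated at the program level with a tiny interpreter. This is a functional sketch only: the mnemonics `tinit`, `nop`, `beq`, and `halt`, the program layout, and the labels are hypothetical, and pipeline timing and the data bypass are deliberately not modeled. Every hardware thread runs the same program from the same start address, and the conditional jumps on the thread number send each thread down its own path.

```python
def run_thread(hw_thread, program, max_steps=100):
    """Interpret `program` for one hardware thread and return its halt label."""
    regs = {}
    pc = 0
    for _ in range(max_steps):
        op, *args = program[pc]
        if op == "tinit":            # stands in for the initialization special instruction
            regs[args[0]] = hw_thread
            pc += 1
        elif op == "nop":
            pc += 1
        elif op == "beq":            # jump to `target` if the register equals `const`
            reg, const, target = args
            pc = target if regs[reg] == const else pc + 1
        elif op == "halt":
            return args[0]
        else:
            raise ValueError(f"unknown operation {op!r}")
    raise RuntimeError("program did not halt")


PROGRAM = [
    ("tinit", "r1"),                # 0: first instruction, r1 <- hardware thread number
    ("nop",),                       # 1: second instruction
    ("beq", "r1", 2, 5),            # 2: third instruction, thread #2 jumps away
    ("beq", "r1", 3, 6),            # 3: a later jump separates thread #3
    ("halt", "common"),             # 4: remaining threads continue sequentially
    ("halt", "task_for_thread_2"),  # 5
    ("halt", "task_for_thread_3"),  # 6
]

if __name__ == "__main__":
    for tid in (1, 2, 3, 4):
        print(f"hardware thread #{tid} ends in: {run_thread(tid, PROGRAM)}")
```

Chaining further comparisons in the same way would give every hardware thread its own software thread entry point.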

Claims

1. A hardware multi-thread control method for a microprocessor, characterized by comprising the following steps:

1) a multi-thread instruction fetching step (1001), used to read the instructions of each thread and to generate the instruction address of each thread;

2) a multi-thread decoding step (1002), used to decode the instructions of each hardware thread and to prepare the register data required by the multi-thread execution step (1003);

3) a multi-thread execution step (1003), used to execute the instructions of each thread;

4) a multi-thread memory access step (1004), in which the memory access logic (105) is symmetrically divided into an n-stage pipeline, and the execution results of each thread are written to memory or the data required by a thread is read from memory; when a thread is blocked, only the data access of that thread is blocked, and the data accesses of the other threads proceed normally;

5) a multi-thread write-back step (1005), used to write the execution results of each thread back to the register group.

2. The hardware multi-thread control method as claimed in claim 1, characterized in that the multi-thread instruction fetching step (1001) comprises multi-thread instruction address control, multi-thread instruction address caching, and multi-thread instruction fetching:

1) multi-thread instruction address control, used to generate the instruction address of each thread; when a thread is blocked, only the instruction address update of that thread is blocked, and the instruction addresses of the other threads are updated normally;

2) multi-thread instruction address caching, used to store the instruction addresses of the n threads;

3) multi-thread instruction fetching, in which the instruction fetch logic (101) is symmetrically divided into an n-stage pipeline and the software thread instructions corresponding to the n hardware threads are fetched.

3. The hardware multi-thread control method as claimed in claim 1, characterized in that the multi-thread decoding step (1002) comprises multi-thread decoding, multi-thread register operand preparation, and decoding unit data bypass control:

1) multi-thread decoding, in which the decoding logic (102) is symmetrically divided into an n-stage pipeline to decode the multi-thread initialization special instruction and regular instructions; when a thread is blocked, only the instruction decoding of that thread is blocked, and the instruction decoding of the other threads proceeds normally;

the multi-thread initialization special instruction includes: an opcode field (901), used to identify the instruction; an operand field (902), used for the destination operand; and an operand field (903), used for the source operand;

2) multi-thread register operand preparation, used to select the register group corresponding to the thread from the n register groups and to read the operands required by the instruction;

3) decoding unit data bypass control, used to provide bypassed data to a given pipeline stage of instruction decoding.

4. The hardware multi-thread control method as claimed in claim 1, characterized in that the multi-thread execution step (1003) comprises execution of the multi-thread initialization special instruction, thread number caching, execution unit data bypass control, and execution of regular multi-thread instructions:

1) execution of the multi-thread initialization special instruction, used to generate a new hardware thread number, the new hardware thread number corresponding to the software thread executed by the hardware thread;

2) thread number caching, used to cache the new hardware thread number generated by execution of the multi-thread initialization special instruction, such that the position of a newly generated hardware thread number in the thread number register sequence (502) is consistent with the position of that hardware thread in the instruction address register sequence (302), the instruction fetch unit multi-thread register sequence (303), the decoding unit multi-thread register sequence (401), the execution unit multi-thread register sequence (503), the memory access unit multi-thread register sequence (601), and the write-back unit multi-thread register sequence (701);

3) execution unit data bypass control, used to provide bypassed data to a given pipeline stage of instruction execution;

4) execution of regular multi-thread instructions, in which the execution logic (103) is symmetrically divided into an n-stage pipeline to execute the regular instructions of the n hardware threads; when a thread is blocked, only the instruction execution of that thread is blocked, and the instruction execution of the other threads proceeds normally.

5. The hardware multi-thread control method as claimed in claim 1, characterized in that the multi-thread write-back step (1005) symmetrically divides the write-back logic (105) into an n-stage pipeline; when a thread is blocked, only the data write-back of that thread is blocked, and the data write-back of the other threads proceeds normally; and the multi-thread write-back step (1005) comprises multi-thread write-back data control and multi-thread write-back register address control:

1) multi-thread write-back data control, used to prepare and output the register data to be written back;

2) multi-thread write-back register address control, used to prepare and output the address of the register to be written back.

6. A hardware multi-thread control device for a microprocessor, the device using pipeline technology to support parallel execution of n hardware threads, characterized in that the device includes a hardware multi-thread instruction fetching device (201), a hardware multi-thread decoding device (202), a hardware multi-thread execution device (203), a hardware multi-thread memory access device (204), a hardware multi-thread write-back device (205), a hardware multi-thread register group (206), and a multi-thread control device (207);

the hardware multi-thread instruction fetching device (201) is used to complete the instruction fetching operations of the n hardware threads and output the instructions to the hardware multi-thread decoding device (202), and, according to the instruction fetch control signal output by the multi-thread control device (207), to update the next instruction address of each of the n hardware threads and store the instruction addresses;

the hardware multi-thread decoding device (202) is used to receive the instructions of the n threads from the hardware multi-thread instruction fetching device (201) and the decoding control signal from the multi-thread control device (207), to complete the decoding operations of the n hardware threads, and to output the operation control information and the data to be operated on to the hardware multi-thread execution device (203);

the hardware multi-thread execution device (203) is used to receive the operation control information and data information from the hardware multi-thread decoding device (202) and the execution control signal from the multi-thread control device (207), to complete the instruction execution of the n threads, and to output the data information of the execution results to the hardware multi-thread memory access device (204);

the hardware multi-thread memory access device (204) is used to complete the memory access operations of the n hardware threads; the hardware multi-thread memory access device (204) includes a memory access unit multi-thread register sequence (601), used to temporarily store the intermediate results of the n-stage pipelined memory access logic, each stage register M_DFF_1 to M_DFF_n holding the output of the partial memory access logic M1 to Mn for one hardware thread, wherein the inputs of memory access logic M1 are the output of the hardware multi-thread execution device (203) and the memory access control signal generated by the multi-thread control device (207), and the output of register M_DFF_n serves as the output of the hardware multi-thread memory access device (204);

the hardware multi-thread write-back device (205) is used to output the data to be written back and the address of the register to be written back to the hardware multi-thread register group (206); the hardware multi-thread write-back device (205) includes a write-back unit multi-thread register sequence (701), used to temporarily store the intermediate results of the n-stage pipelined write-back logic, each stage register W_DFF_1 to W_DFF_n holding the output of the partial write-back logic W1 to Wn for one hardware thread, wherein the inputs of write-back logic W1 are the output of the hardware multi-thread memory access device (204) and the write-back control signal generated by the multi-thread control device (207), and the output of register W_DFF_n serves as the output of the hardware multi-thread write-back device (205);

the hardware multi-thread register group (206), according to the register control signal output by the multi-thread control device (207), cooperates with the hardware multi-thread decoding device (202) and the hardware multi-thread write-back device (205) to complete the register group read and write operations of the n hardware threads;

the multi-thread control device (207) outputs control signals to control the operation of each device in the hardware multi-thread apparatus, specifically: an instruction fetch control signal output to the hardware multi-thread instruction fetching device (201), a decoding control signal output to the hardware multi-thread decoding device (202), an execution control signal output to the hardware multi-thread execution device (203), a memory access control signal output to the hardware multi-thread memory access device (204), a write-back control signal output to the hardware multi-thread write-back device (205), and a register control signal output to the hardware multi-thread register group (206).

7. The hardware multi-thread control device according to claim 6, characterized in that the hardware multi-thread instruction fetching device (201) comprises the following components:

1) an instruction address control device (301), which generates the next instruction address of the current thread according to the instruction fetch control signal (comprising an instruction address control signal and a jump address) generated by the multi-thread control device (207) and the output of pc_DFF_n, and sends it to the instruction address register sequence (302);

2) an instruction address register sequence (302), used to store the instruction addresses of the n threads, the instruction address of each thread passing through the instruction address registers pc_DFF_1 to pc_DFF_n in turn;

3) an instruction fetch unit multi-thread register sequence (303), used to temporarily store the intermediate results of the n-stage pipelined instruction fetch logic, each stage register IF_DFF_1 to IF_DFF_n holding the output of the partial instruction fetch logic IF1 to IFn for one hardware thread, wherein the inputs of instruction fetch logic IF1 are the output of the instruction address control device (301) and the instruction fetch control signal generated by the multi-thread control device (207), and the output of register IF_DFF_n serves as the output of the hardware multi-thread instruction fetching device (201).

8. The hardware multi-thread control device according to claim 6, characterized in that the hardware multi-thread decoding device (202) comprises:

1) a decoding unit multi-thread register sequence (401), used to temporarily store the intermediate results of the n-stage pipelined decoding logic, each stage register ID_DFF_1 to ID_DFF_n holding the output of the partial decoding logic ID1 to IDn for one hardware thread, wherein the inputs of decoding logic ID1 are the output of the hardware multi-thread instruction fetching device (201), the data read from the hardware multi-thread register group (206), and the decoding control signal generated by the multi-thread control device (207), and the output of register ID_DFF_n serves as the output of the hardware multi-thread decoding device (202);

2) a data bypass register sequence (402), used to store the intermediate execution results of the two preceding instructions of each thread, the stored data coming respectively from the output data of the hardware multi-thread execution device (203) and the output data of the hardware multi-thread memory access device (204); the input data of each stage register ID_Bypass_1 to ID_Bypass_n-1 of the data bypass register sequence is respectively input to the corresponding decoding logic ID1 to IDn-1, and the output data of register ID_Bypass_n-1 is input to decoding logic IDn.

9. The hardware multi-thread control device according to claim 6, characterized in that the hardware multi-thread execution device (203) comprises the following components:

1) a multi-thread initialization special instruction execution device (501), used to generate a hardware thread number and output it to the thread number register sequence (502), the hardware thread number corresponding to the software thread executed by the hardware thread;

2) a thread number register sequence (502), used to cache the generated hardware thread numbers, each thread number generated by the multi-thread initialization special instruction execution device (501) passing through the thread number registers Th_DFF_1 to Th_DFF_n in turn;

3) an execution unit multi-thread register sequence (503), used to temporarily store the intermediate results of the n-stage pipelined execution logic, each stage register E_DFF_1 to E_DFF_n holding the output of the partial execution logic E1 to En for one hardware thread, wherein the inputs of execution logic E1 are the output of the hardware multi-thread decoding device (202) and the execution control signal generated by the multi-thread control device (207), and the output of register E_DFF_n or of register Th_DFF_n serves as the output of the hardware multi-thread execution device (203);

4) a data bypass register sequence (504), used to store the intermediate execution results of the two preceding instructions of each thread, the stored data coming respectively from the output data of the hardware multi-thread execution device (203) and the output data of the hardware multi-thread memory access device (204); the input data of each stage register E_Bypass_1 to E_Bypass_n-1 of the data bypass register sequence is respectively input to the corresponding execution logic E1 to En-1, and the output data of register E_Bypass_n-1 is input to execution logic En.

10. The hardware multi-thread control device according to claim 6, characterized in that the hardware multi-thread register group (206) comprises the following components:

1) a decoder (801), which decodes the register control signal generated by the multi-thread control device (207) together with the address signal generated by the hardware multi-thread decoding device (202) or the hardware multi-thread write-back device (205), and outputs the register group enable signal of the current thread and the address of the register currently being operated on;

2) a multiplexer (802), which selects and outputs the data of the register group of the current thread according to the register control signal generated by the multi-thread control device (207);

3) n register groups (803), provided to the n threads respectively and independent of each other, the written data coming from the output data of the hardware multi-thread write-back device (205) and the read data being sent to the hardware multi-thread decoding device (202).