CN113407239B - Pipeline processor based on asynchronous monorail - Google Patents


Info

Publication number
CN113407239B
Authority
CN
China
Prior art keywords
module
asynchronous
click
pipeline
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110644854.3A
Other languages
Chinese (zh)
Other versions
CN113407239A (en
Inventor
田龙锋
虞志益
王凯
肖山林
李智宇
黄宇皓
朱瑞敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202110644854.3A
Publication of CN113407239A
Application granted
Publication of CN113407239B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
    • G06F9/3871 Asynchronous instruction pipeline, e.g. using handshake signals between stages
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to a pipeline processor based on an asynchronous monorail, comprising: an asynchronous control module, an instruction fetch module, a decoding module, an execution module, an adaptive selection module, a memory access module, a write-back module, a storage module, a control and status register, and a general-purpose register. Data communication among the modules is completed through an asynchronous single-rail handshake. The asynchronous control module comprises a plurality of control units, each of which is a phase decoupling Click unit; the phase decoupling Click units are cascaded with one another through handshakes and are each connected to a corresponding pipeline stage. The application addresses the problems of high power consumption, serious global clock skew, and speed limited by the clock frequency in pipeline processors of the related art, and allows the pipeline to run at a higher speed, without a clock, at low power consumption.

Description

Pipeline processor based on asynchronous monorail
Technical Field
The present application relates to the field of electronic information, processors and asynchronous circuits, and in particular to a pipeline processor based on an asynchronous monorail.
Background
With the rapid development of the Internet of Things and artificial intelligence, SoC technology continues to mature, and most current chips integrate their own processors. Processors therefore play an important role in electronics, and processor design is receiving extensive attention. A processor architecture generally consists of an arithmetic logic unit, a register unit, and a control unit, all of which are built from a large number of registers, and data processing instructions operate only on registers. With a global clock, operation speed and execution efficiency are high, but the registers always toggle along with the clock, which consumes additional power. In addition, because most processors are designed as synchronous circuits, global clock skew is serious and a complex clock tree network is required; the clock tree is difficult to design and takes up a significant share of the chip's area and power. Meanwhile, in a synchronous circuit all paths work under the same clock. To ensure that every logic operation completes within one clock cycle, the clock frequency is limited by the critical path delay in the circuit, which also constrains all other paths. Because the critical path is difficult to optimize, the clock frequency is difficult to raise, and the performance of the whole processor is limited. The prior art therefore suffers from high power consumption of the pipeline processor, serious global clock skew, and speed limited by the clock frequency.
Disclosure of Invention
This embodiment provides an asynchronous monorail-based pipeline processor to solve the problems of high power consumption, serious global clock skew, and speed limited by the clock frequency in pipeline processors of the related art.
The asynchronous monorail-based pipeline processor of the present application comprises: an asynchronous control module, an instruction fetch module, a decoding module, an execution module, an adaptive selection module, a memory access module, a write-back module, a storage module, a control and status register, and a general-purpose register. Data communication among the modules is completed through an asynchronous single-rail handshake. The asynchronous control module comprises a plurality of control units, each of which is a phase decoupling Click unit; the phase decoupling Click units are cascaded with one another through handshakes and are each connected to a corresponding pipeline stage.
After a handshake between phase decoupling Click units succeeds, a Click signal for controlling the pipeline of the current stage is generated.
The first-stage pipeline comprises the instruction fetch module, the second-stage pipeline comprises the decoding module, the third-stage pipeline comprises the execution module, the fourth-stage pipeline comprises the memory access module, and the fifth-stage pipeline comprises the write-back module.
The phase decoupling Click unit generates a Click signal that causes the program counter to calculate an instruction address according to a control signal; the instruction address is transmitted to the instruction fetch module, and at the same time a request signal is sent to the next-stage phase decoupling Click unit. The control signal is an electrical signal comprising jump, exception, and interrupt signals, and the request signal is an electrical signal sent by the previous-stage phase decoupling Click unit to the next-stage phase decoupling Click unit to request that it start working.
The instruction fetch module reads an instruction from the storage module according to the instruction address and transmits the instruction to the decoding module.
The decoding module decodes the instruction, reads the data to be processed by the instruction from the control and status register and the general-purpose register, and transmits that data to the execution module.
The execution module performs the corresponding operation on the data to be processed and, once the operation result is obtained, transmits it to the memory access module.
The memory access module performs read and write operations on the storage module according to the operation result.
The execution module comprises a prediction flushing module, a jump module, and a bypass module. The prediction flushing module flushes incorrectly predicted instructions; the jump module generates a jump address according to the jump signal and the operation result and returns it to the program counter; and the bypass module obtains the required data in advance from later-stage modules according to register addresses.
The adaptive selection module acquires the operation result from the execution module and transmits it to the memory access module and the write-back module according to the control signal. The write-back module receives the operation results transmitted by the execution module and the memory access module and writes them back to a register. The storage module comprises an instruction storage module for storing received instructions and a data storage module for storing received data.
Compared with the related art, the asynchronous monorail-based pipeline processor solves the problems of high power consumption, serious global clock skew, and speed limited by the clock frequency in pipeline processors of the related art, and allows the pipeline to run at a higher speed, without a clock, at low power consumption.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a schematic diagram of a pipeline processor based on an asynchronous monorail;
FIG. 2 is a logic diagram of a phase decoupling Click unit according to an embodiment of the present application;
FIG. 3 is a logic schematic diagram of the control unit C_EX2MEM of the embodiment of the present application;
FIG. 4 is a logic schematic diagram of the control unit C_MEM2WB of the embodiment of the present application;
FIG. 5 is a logic schematic diagram of the first-stage control unit of the asynchronous control module in the embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application more apparent, the present application is described and illustrated below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application. All other embodiments obtained by one of ordinary skill in the art, without creative effort, based on the embodiments provided herein fall within the scope of protection of the present application.
It is apparent that the drawings in the following description are only some examples or embodiments of the present application, and those of ordinary skill in the art can apply the present application to other similar situations according to these drawings without inventive effort. Moreover, it should be appreciated that while such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as indicating that this disclosure is insufficient.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art understand, both explicitly and implicitly, that the embodiments described herein can be combined with other embodiments where no conflict arises.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. References to "a," "an," "the," and similar terms herein do not denote a limitation of quantity, but may denote the singular or the plural. The terms "comprising," "including," "having," and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to only those steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein refers to two or more. "And/or" describes an association between associated objects and covers three relationships; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. The terms "first," "second," "third," and the like, as used herein, merely distinguish between similar objects and do not represent a particular ordering of objects.
In this embodiment, an asynchronous monorail-based pipeline processor is provided. FIG. 1 is a schematic diagram of the asynchronous monorail-based pipeline processor of the present application. As shown in FIG. 1, the processor comprises: an asynchronous control module, an instruction fetch module, a decoding module, an execution module, an adaptive selection module, a memory access module, a write-back module, a storage module, a control and status register, and a general-purpose register. Data communication among the modules is completed through an asynchronous single-rail handshake. The asynchronous control module comprises a plurality of control units, each of which is a phase decoupling Click unit; the phase decoupling Click units are cascaded with one another through handshakes and are each connected to a corresponding pipeline stage. In this embodiment, the instruction fetch module corresponds to the first-stage pipeline 20, the decoding module corresponds to the second-stage pipeline 30, the execution module corresponds to the third-stage pipeline 40, the memory access module corresponds to the fourth-stage pipeline 50, and the write-back module corresponds to the fifth-stage pipeline 60.
In this embodiment, a phase decoupling Click unit generates the Click signal that controls the pipeline of its stage after a handshake between the units succeeds. The phase decoupling Click units generate Click signals through the handshake of "request" and "response" signals. The Click signal replaces the global clock of a synchronous circuit in controlling the pipeline of the current stage, so the clocking is event-driven and the circuit has no global clock. Taking the connection between the first control unit 101 and the first-stage pipeline 20 in FIG. 1 as an example, the two work as follows: the first control unit 101 generates the Click signal of the program counter according to the enable signal for processor operation, which causes the program counter to operate once, calculate the next instruction address according to control signals such as jump, exception, and interrupt, and transmit the next instruction address to the first-stage pipeline 20. While generating the Click signal of its own stage, the control unit also generates the request signal for the control unit of the next stage. The control signal is an electrical signal comprising jump, exception, and interrupt signals, and the request signal is an electrical signal sent by the previous-stage phase decoupling Click unit to the next-stage phase decoupling Click unit to request that it start working.
The request signal is passed between adjacent upper-stage and lower-stage control units as follows. The upper control unit (for example, the first control unit) sends a request signal to the adjacent lower control unit (for example, the second control unit); the request signal is an electrical signal that causes the lower control unit to start the pipeline it controls. After receiving the request signal, the lower control unit starts its connected pipeline and, at the same time, returns a response signal to inform the upper control unit that its control task is finished. This embodiment shows only a simple five-stage pipeline structure; the contents of the pipeline can be changed if the actual application requires it. Compared with the prior art, this embodiment replaces the global clock of a synchronous circuit with the Click signal that the control unit in the asynchronous single-rail circuit generates according to the states of the request and response signals. No external clock input is needed, and a complex clock network, whose clock tree would occupy a large chip area and increase power consumption, is avoided. The design is also simple, can improve the running speed of the processor, and reduces power consumption.
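The request/response exchange between adjacent control units described above can be sketched as a small behavioral model. This is only an illustration under assumptions: it uses two-phase (transition) signaling, in which a request is outstanding whenever the request and response wires differ, and the names `Channel`, `upper_send`, and `lower_serve` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """One request/response wire pair between an upper and a lower control unit."""
    req: int = 0  # driven by the upper control unit
    ack: int = 0  # driven by the lower control unit

    def pending(self) -> bool:
        # Assumed two-phase signaling: a request is outstanding while the wires differ.
        return self.req != self.ack


def upper_send(ch: Channel) -> bool:
    """Upper unit issues a request unless the previous one is still unanswered."""
    if ch.pending():
        return False
    ch.req ^= 1
    return True


def lower_serve(ch: Channel, log: list) -> bool:
    """Lower unit starts its pipeline stage, then returns the response."""
    if not ch.pending():
        return False
    log.append("stage worked")  # stands in for the controlled pipeline stage
    ch.ack ^= 1                 # response: tells the upper unit the task is done
    return True


ch, log = Channel(), []
upper_send(ch)        # request goes out
lower_serve(ch, log)  # stage works and responds
print(ch, log)
```

In this model the upper unit cannot issue a second request until the first has been answered, which is the ordering guarantee the handshake provides in place of a shared clock edge.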
It should be noted that an initialization signal is required to initialize the processor before the processor begins operation. After initialization, the data inside the processor is in its initial state, and the asynchronous pipeline is idle. When the enable signal for processor operation goes high, the asynchronous monorail pipeline processor begins operation. Therefore, each time the processor is used to process data, an initialization signal is needed to initialize it, which ensures that the data in every module, memory, and register of the processor is in its initial state.
In this embodiment, the first to fifth control units are different phase decoupling Click units, named as follows: the first control unit is the phase decoupling Click unit C_PC, the second is C_IF2ID, the third is C_ID2EX, the fourth is C_EX2MEM, and the fifth is C_MEM2WB. Correspondingly, the five control units control five groups of registers: the first control unit C_PC controls the PC register, the second control unit C_IF2ID controls the IF2ID register, the third control unit C_ID2EX controls the ID2EX register, the fourth control unit C_EX2MEM controls the EX2ME register, and the fifth control unit C_MEM2WB controls the CSR status register and the general-purpose register. This correspondence is only one specific embodiment and is not the only possible one. After the instruction fetch module (first-stage pipeline 20) receives the instruction address, the subsequent modules of the processor work as follows. The instruction fetch module reads an instruction from the storage module according to the instruction address and transmits it to the decoding module. Meanwhile, the control unit C_IF2ID receives the request signal generated by the C_PC unit and the response signal returned by the C_ID2EX unit; after the handshake succeeds, it generates a Click signal that controls the operation of the current pipeline stage, together with a request signal for the next-stage pipeline and a response signal returned to the previous-stage pipeline.
The decoding module (second-stage pipeline 30) decodes the instruction, reads the data to be processed by the instruction from the control and status register and the general-purpose register, and transmits that data to the execution module. Meanwhile, the control unit C_ID2EX receives the request signal generated by the C_IF2ID unit and the response signal returned by the C_EX2MEM unit; after the handshake succeeds, it generates a Click signal that controls the operation of the current pipeline stage, together with a request signal for the next-stage pipeline and a response signal returned to the previous-stage pipeline. The execution module (third-stage pipeline 40) performs the corresponding operation on the data to be processed and, once the operation result is obtained, transmits it to the memory access module. Meanwhile, the control unit C_EX2MEM receives the request signal generated by the C_ID2EX unit and the response signal returned by the C_MEM2WB unit; after the handshake succeeds, it generates a Click signal that controls the operation of the current pipeline stage, together with a request signal for the next-stage pipeline and a response signal returned to the previous-stage pipeline. The memory access module (fourth-stage pipeline 50) performs read and write operations on the storage module according to the operation result. The adaptive selection module acquires the operation result from the execution module and transmits it to the memory access module and the write-back module according to the control signal. The write-back module (fifth-stage pipeline 60) receives the operation results transmitted by the execution module and the memory access module and writes them back to a register. The control units C_Wreg and C_Wcsr generate the Click signals for writing back to the general-purpose register and the CSR status register, and belong to the write-back control unit.
Meanwhile, the control unit C_MEM2WB receives the two request signals generated by the C_EX2MEM unit and the response signals returned by the C_Wreg and C_Wcsr units; after the handshake succeeds, it generates a Click signal that controls the operation of the current pipeline stage, together with a request signal for the next-stage pipeline and a response signal returned to the previous-stage pipeline. Specifically, C_Wreg and C_Wcsr are the control units for the general-purpose register and the CSR, and each is implemented as a phase decoupling Click unit. The storage module comprises an instruction storage module for storing received instructions and a data storage module for storing received data. With this working mode and these connections, each pipeline stage is driven by a pulse signal generated when an event completes. The pulse frequency differs from stage to stage and is limited only by the longest path of that stage, so the processor can process faster than a synchronous circuit.
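The timing argument above can be illustrated with a small calculation. The stage delays below are made-up numbers, not measurements from the application, and handshake overhead is ignored. The sketch compares how long one instruction takes to traverse an otherwise empty five-stage pipeline when every stage is paced by a global clock set by the slowest stage, and when each stage is paced only by its own longest path.

```python
# Hypothetical longest-path delay of each pipeline stage, in nanoseconds.
stage_delay_ns = {"IF": 2.0, "ID": 1.5, "EX": 3.0, "MEM": 2.5, "WB": 1.0}


def synchronous_latency(delays):
    """Every stage takes one global clock period, set by the slowest stage."""
    period = max(delays.values())
    return period * len(delays)


def asynchronous_latency(delays):
    """Each stage takes only its own delay (handshake overhead ignored)."""
    return sum(delays.values())


sync_ns = synchronous_latency(stage_delay_ns)
async_ns = asynchronous_latency(stage_delay_ns)
print(f"synchronous: {sync_ns} ns, asynchronous: {async_ns} ns")
```

With these assumed delays, the clocked pipeline pays the 3.0 ns execute-stage delay five times, while the event-driven pipeline pays each stage's own delay once.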
The execution module comprises a prediction flushing module, a jump module, and a bypass module. The prediction flushing module flushes incorrectly predicted instructions; the jump module generates a jump address according to the jump signal and the operation result and returns it to the program counter; and the bypass module obtains the required data in advance from later-stage modules according to register addresses, so that data conflicts can be avoided.
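The bypass behavior described above can be sketched as an operand lookup that prefers results still held by later stages over the register file. This is a generic forwarding sketch, not the application's circuit; the function name `read_operand`, the `in_flight` list, and its newest-first ordering are assumptions made for illustration.

```python
def read_operand(reg_addr, regfile, in_flight):
    """Return the newest value of register `reg_addr`.

    `in_flight` lists (dest_addr, value) pairs for results that later
    stages hold but have not yet written back, ordered newest first.
    """
    for dest_addr, value in in_flight:
        if dest_addr == reg_addr:
            return value          # forward the pending result
    return regfile[reg_addr]      # no pending write: use the register file


regfile = [0, 10, 20, 30]
in_flight = [(1, 99), (1, 55), (3, 7)]  # hypothetical pending results
print(read_operand(1, regfile, in_flight))
print(read_operand(2, regfile, in_flight))
```

Matching on the register address is what lets a dependent instruction proceed without waiting for the write-back stage to finish.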
The register IF2ID used in the invention acts, under the control of the second control unit C_IF2ID, as a switch for data transmission: when a Click pulse generated by C_IF2ID arrives, the switch opens and passes on the data transmitted by the previous-stage pipeline; at all other times it is closed. This normally closed design effectively prevents the data in the register from being disturbed or corrupted while the corresponding control unit is not issuing a Click pulse. The other registers used in the invention behave in the same way as the register IF2ID: the switch opens when a Click pulse arrives and is closed the rest of the time.
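The switch-like behavior of a stage register can be modeled in a few lines. This is a behavioral sketch only; the class name `StageRegister` and its `drive` method are hypothetical, and a real register would capture on the pulse edge rather than on a function call.

```python
class StageRegister:
    """Holds its value unless a Click pulse is present."""

    def __init__(self, initial=0):
        self.q = initial  # stored value, visible to the next stage

    def drive(self, d, click):
        """Present input `d`; capture it only when `click` is true."""
        if click:
            self.q = d
        return self.q


if2id = StageRegister()
print(if2id.drive(0x13, click=False))  # switch closed: input is ignored
print(if2id.drive(0x13, click=True))   # Click pulse: input is captured
print(if2id.drive(0x7F, click=False))  # switch closed again: value is held
```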
FIG. 2 is a logic diagram of a phase decoupling Click unit according to an embodiment of the present application. The naming convention in the figure is as follows: D is the module controlled by the phase decoupling Click unit; in_data and out_data are the data input to and output from that module; in_req and out_req are the input and output request signals; and in_ack and out_ack are the input and output response signals. The first control unit C_PC has only the two signals out_req and in_ack: because no control unit precedes it, it neither receives a request signal nor outputs a response signal. As shown in FIG. 2, the phase decoupling Click unit works as follows. Assume in_req = 1, in_ack = 0, out_ack = 1, and out_req = 0. The in_req signal is XORed with the in_ack signal, and the out_req signal is XORed with the out_ack signal; both results are 1, so the AND gate generates a Click pulse. The pulse triggers Pi and Po, which toggle in_ack and out_req so that both become 1. In other words, the returned response signal and the generated request signal are both 1, and one handshake is complete. With the phase decoupling Click unit, a Click pulse can be generated exactly when it is needed to drive the connected pipeline. The generated Click signal replaces the global clock of a synchronous circuit, which avoids a clock tree network that would occupy a large chip area and increase power consumption. The unit is also simple to design, can improve the running speed of the processor, and reduces power consumption.
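The workflow just described can be traced with a small behavioral model. It follows the text literally: a Click fires when both XOR results are 1, and the pulse toggles the two state bits that the text calls Pi and Po. The class name `PhaseDecoupledClick` and its methods are hypothetical, and the model ignores gate delays and pulse width.

```python
class PhaseDecoupledClick:
    """Behavioral model of the Click condition described for FIG. 2."""

    def __init__(self, in_ack=0, out_req=0):
        self.in_ack = in_ack    # state of Pi: response returned upstream
        self.out_req = out_req  # state of Po: request sent downstream

    def fire_condition(self, in_req, out_ack):
        # (in_req XOR in_ack) AND (out_req XOR out_ack)
        return bool((in_req ^ self.in_ack) & (self.out_req ^ out_ack))

    def step(self, in_req, out_ack):
        """Evaluate the inputs once; toggle Pi and Po if a Click fires."""
        if not self.fire_condition(in_req, out_ack):
            return False
        self.in_ack ^= 1
        self.out_req ^= 1
        return True


unit = PhaseDecoupledClick(in_ack=0, out_req=0)
clicked = unit.step(in_req=1, out_ack=1)  # the values assumed in the text
print(clicked, unit.in_ack, unit.out_req)
```

After the first Click, the same inputs no longer satisfy the condition, so no second pulse is produced until both neighbors change their signals.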
It should be noted in particular that the control unit C_EX2MEM and the control unit C_MEM2WB are a matched pair of control units that together implement the selection of handshake signals. Their working principles are described next.
FIG. 3 is a logic diagram of the control unit C_EX2MEM of the embodiment of the present application. As shown in FIG. 3, the C_EX2MEM control unit is a pre-handshake selector, which selects and outputs different handshake request signals according to a control signal. The Sel signal is XORed and XNORed, respectively, with the previously generated request signals, and the results pass through a register triggered by the Click signal to yield the new request signals. When Sel is 1, the handshake generates request signal req2 = 1 while req1 is 0; when Sel is 0, the handshake generates request signal req1 = 1 while req2 is 0. With this design, the control unit C_EX2MEM can output different request signals for different values of Sel, and thereby control the corresponding pipelines to perform the corresponding operations.
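One reading of the XOR/XNOR description above is that Sel decides which of the two request registers toggles on a Click: XOR with Sel toggles req2 when Sel is 1, and XNOR with Sel toggles req1 when Sel is 0. The sketch below models that reading only; it is not taken from FIG. 3, and the class name `PreHandshakeSelector` is hypothetical. Starting from a reset state of all zeros, it reproduces the values quoted in the text.

```python
class PreHandshakeSelector:
    """Behavioral sketch of C_EX2MEM's request steering (not gate-level)."""

    def __init__(self):
        self.req1 = 0  # request toward the first target
        self.req2 = 0  # request toward the second target

    def click(self, sel):
        """Update both request registers on one Click pulse."""
        self.req2 = sel ^ self.req2        # XOR: toggles when sel == 1
        self.req1 = 1 ^ (sel ^ self.req1)  # XNOR: toggles when sel == 0
        return self.req1, self.req2


print(PreHandshakeSelector().click(sel=1))  # from reset, only req2 is raised
print(PreHandshakeSelector().click(sel=0))  # from reset, only req1 is raised
```

Under this interpretation the request that is not selected keeps its previous value, so the unselected target sees no new transition and stays idle.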
FIG. 4 is a logic diagram of the control unit C_MEM2WB according to an embodiment of the present application. As shown in FIG. 4, the C_MEM2WB control unit is a post-handshake selector, which selects, according to the control signal, which request/response signal pair takes part in the handshake. The Sel signal is XORed and XNORed, respectively, with the response signal generated by the upper control unit, and the results pass through a register triggered by the Click signal to yield the response signals. Each response signal is XORed with the request signal given by the previous stage, the Click signal is derived from these results through combinational logic, and the Click signal triggers the register so that the request signal given to the next-stage pipeline toggles. As FIG. 4 shows, the C_MEM2WB control unit can handshake on two request/response signal pairs. Its Sel signal is the same signal as the Sel of the C_EX2MEM unit, and the two control units work together to select the handshake signals.
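A behavioral sketch of the post-handshake selection may help. It rests on assumptions that the text does not spell out: the two response registers are updated in the same way as the request registers of C_EX2MEM (XOR with Sel for pair 2, XNOR with Sel for pair 1), and a Click fires only when the pair chosen by Sel has an outstanding request, that is, when its request and response differ. `PostHandshakeSelector` and its method names are hypothetical.

```python
class PostHandshakeSelector:
    """Behavioral sketch of C_MEM2WB's response selection (not gate-level)."""

    def __init__(self):
        self.ack1 = 0  # response for request/response pair 1
        self.ack2 = 0  # response for request/response pair 2

    def click(self, sel, req1, req2):
        """Answer the pair chosen by `sel` if it has an outstanding request."""
        selected_pending = (req2 ^ self.ack2) if sel else (req1 ^ self.ack1)
        if not selected_pending:
            return False                   # nothing to answer: no Click
        self.ack2 = sel ^ self.ack2        # XOR: toggles when sel == 1
        self.ack1 = 1 ^ (sel ^ self.ack1)  # XNOR: toggles when sel == 0
        return True


post = PostHandshakeSelector()
print(post.click(sel=1, req1=0, req2=1), post.ack1, post.ack2)
```

Because both selectors use the same Sel, the pair that C_EX2MEM raised a request on is the pair this unit answers.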
FIG. 5 is a logic schematic diagram of the first-stage control unit of the asynchronous control module in the embodiment of the application. As shown in FIG. 5, the control unit C_PC generates the Click signal of the program counter according to the enable signal for processor operation, which causes the program counter to operate once, calculate the next instruction address according to control signals such as jump, exception, and interrupt, and transmit the next instruction address to the instruction fetch module. While generating the Click signal of its own stage, the first-stage control unit also generates the request signal for the next-stage control unit C_IF2ID. Because the C_PC unit is the first-stage control unit of the pipeline, it requires neither a request signal input nor a response signal output. D in the figure is the first-stage pipeline and corresponds to the instruction fetch module. The first-stage pipeline D starts working after receiving the Click signal generated by the first control unit, and transmits its output data out_data to the next-stage pipeline.
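The next-address calculation triggered by each Click of C_PC can be sketched as a simple selection among the control signals. The application does not state a priority order, an instruction width, or a trap address, so the ordering (exception or interrupt first, then jump), the 4-byte step, and the `trap_vector` value below are all assumptions for illustration, as is the function name `next_pc`.

```python
def next_pc(pc, jump=False, jump_target=0, exception=False,
            interrupt=False, trap_vector=0x100, step=4):
    """Pick the next instruction address for one Click of the program counter."""
    if exception or interrupt:
        return trap_vector  # assumed: traps take priority
    if jump:
        return jump_target  # address returned by the jump module
    return pc + step        # assumed: sequential 4-byte instructions


print(hex(next_pc(0x20)))
print(hex(next_pc(0x20, jump=True, jump_target=0x80)))
print(hex(next_pc(0x20, jump=True, jump_target=0x80, interrupt=True)))
```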
As can be seen from the above embodiments, the present invention has at least the following advantages. Because an asynchronous single-rail circuit is adopted and the Click signal replaces the global clock of a synchronous circuit, no clock tree network needs to be designed. Since a clock tree network consumes considerable power and occupies a large area, the invention can greatly reduce the power consumption of the processor circuit and free up design area, leaving more room for circuit modification and optimization. Moreover, an asynchronous circuit does not suffer from the problem that the global clock frequency is limited by the critical path delay in the circuit: the critical paths of the modules are independent of one another and can each be optimized more effectively.
The foregoing examples represent only a few embodiments of the present application; they are described in some detail but are not to be construed as limiting the scope of the patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the spirit of the present application, and these fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (7)

1. An asynchronous monorail-based pipelined processor, comprising: an asynchronous control module, an instruction fetch module, a decode module, an execution module, an adaptive selection module, a memory access module, a write-back module, a storage module, a control and status register, and a general register, wherein data communication among the modules is completed through asynchronous single-rail handshakes, the asynchronous control module comprises a plurality of control units, the plurality of control units are a plurality of phase decoupling Click units, and the plurality of phase decoupling Click units are cascaded with one another through handshakes and are each connected to a corresponding pipeline stage;
after handshake between the phase decoupling Click units is successful, a Click signal for controlling the pipeline of the stage is generated;
the phase decoupling Click unit generates a Click signal, so that a program counter calculates an instruction address according to a control signal and the instruction address is transmitted to the instruction fetch module, while a request signal is sent to the next-stage phase decoupling Click unit, wherein the control signal is an electrical signal comprising jump, exception, and interrupt signals, and the request signal is an electrical signal sent by a previous-stage phase decoupling Click unit to the next-stage phase decoupling Click unit to request that the next-stage phase decoupling Click unit start working;
the adaptive selection module acquires an operation result from the execution module and transmits the operation result to the memory access module and the write-back module according to a control signal; the write-back module receives the operation result transmitted by the execution module and the memory access module and writes the operation result back to a register; the storage module is used for storing received data;
wherein an initialization signal is required to initialize the processor before the processor begins operation.
2. The asynchronous monorail-based pipeline processor of claim 1, wherein a first-stage pipeline comprises the instruction fetch module, a second-stage pipeline comprises the decode module, a third-stage pipeline comprises the execution module, a fourth-stage pipeline comprises the memory access module, and a fifth-stage pipeline comprises the write-back module.
3. An asynchronous monorail-based pipeline processor according to claim 1, wherein the instruction fetch module reads an instruction from the storage module based on the instruction address and transfers the instruction to the decode module.
4. A pipeline processor based on asynchronous monorail according to claim 3, wherein said decode module decodes said instruction, reads the data to be processed associated with said instruction from said control and status register and said general register, and transfers said data to be processed to said execution module.
5. The pipeline processor based on asynchronous monorail of claim 4, wherein the execution module performs the corresponding operation on the data to be processed and, after obtaining the operation result, transmits the operation result to the memory access module.
6. The asynchronous monorail-based pipeline processor of claim 5, wherein the memory access module performs read and write operations on the storage module according to the operation result.
7. The asynchronous monorail-based pipeline processor of claim 1, wherein the execution module comprises a predictive flush module, a jump module, and a bypass module, the predictive flush module flushing incorrectly predicted instructions, the jump module generating a jump address based on a jump signal and an operation result and returning the jump address to the program counter, and the bypass module obtaining the required data from a later-stage module in advance based on the register address.
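As an informal illustration of two of the modules recited in the claims above, the following sketch models the routing decision of the adaptive selection module and the data forwarding of the bypass module. It is a simplified reading and not the claimed circuit: it assumes that the control signal reduces to a single flag saying whether the instruction needs the memory access stage, and that in-flight results are available as (destination register, value) pairs ordered newest first. The names `Route`, `adaptive_select`, `bypass_read`, `regfile`, and `in_flight` are hypothetical.

```python
from enum import Enum


class Route(Enum):
    """Where an execution result is sent next (illustrative names)."""
    MEMORY_ACCESS = "memory_access"
    WRITE_BACK = "write_back"


def adaptive_select(needs_memory: bool) -> Route:
    """Route a result according to an assumed one-bit control signal."""
    return Route.MEMORY_ACCESS if needs_memory else Route.WRITE_BACK


def bypass_read(reg_addr, regfile, in_flight):
    """Return the newest value for reg_addr, preferring later-stage results.

    in_flight is assumed to be a list of (dest_reg, value) pairs for results
    that later pipeline stages have not yet written back, newest first.
    """
    for dest_reg, value in in_flight:
        if dest_reg == reg_addr:
            return value          # forwarded from a later stage
    return regfile[reg_addr]      # no pending write, so use the register file
```

For example, with `regfile = {1: 10, 2: 20}` and `in_flight = [(1, 99), (1, 50)]`, reading register 1 yields the forwarded value 99, and reading register 2 falls back to the register file value 20.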
CN202110644854.3A 2021-06-09 2021-06-09 Pipeline processor based on asynchronous monorail Active CN113407239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110644854.3A CN113407239B (en) 2021-06-09 2021-06-09 Pipeline processor based on asynchronous monorail

Publications (2)

Publication Number Publication Date
CN113407239A CN113407239A (en) 2021-09-17
CN113407239B (en) 2023-06-13

Family

ID=77683307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110644854.3A Active CN113407239B (en) 2021-06-09 2021-06-09 Pipeline processor based on asynchronous monorail

Country Status (1)

Country Link
CN (1) CN113407239B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4982402A (en) * 1989-02-03 1991-01-01 Digital Equipment Corporation Method and apparatus for detecting and correcting errors in a pipelined computer system
US5752070A (en) * 1990-03-19 1998-05-12 California Institute Of Technology Asynchronous processors
CN107092462A (en) * 2017-04-01 2017-08-25 何安平 A kind of 64 Asynchronous Multipliers based on FPGA
CN107404380A (en) * 2017-06-30 2017-11-28 吴尽昭 A kind of RSA Algorithm based on asynchronous data-path
CN207473606U (en) * 2017-07-27 2018-06-08 兰州大学 The communicating circuit of disparate step artificial neural network chip based on click controllers
CN108537331A (en) * 2018-04-04 2018-09-14 清华大学 A kind of restructural convolutional neural networks accelerating circuit based on asynchronous logic
CN109815619A (en) * 2019-02-18 2019-05-28 清华大学 A method of asynchronous circuit is converted by synchronous circuit
CN109918130A (en) * 2019-01-24 2019-06-21 中山大学 A kind of four level production line RISC-V processors with rapid data bypass structure
CN110928832A (en) * 2019-10-09 2020-03-27 中山大学 Asynchronous pipeline processor circuit, device and data processing method
CN111078294A (en) * 2019-11-22 2020-04-28 苏州浪潮智能科技有限公司 Instruction processing method and device of processor and storage medium
CN112486312A (en) * 2020-11-19 2021-03-12 杭州电子科技大学 Low-power-consumption processor
CN112667292A (en) * 2021-01-26 2021-04-16 北京中科芯蕊科技有限公司 Asynchronous miniflow line controller

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9384003B2 (en) * 2007-10-23 2016-07-05 Texas Instruments Incorporated Determining whether a branch instruction is predicted based on a capture range of a second instruction
US10892968B2 (en) * 2015-12-18 2021-01-12 Google Llc Systems and methods for latency reduction in content item interactions using client-generated click identifiers

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Design Flow for Click-Based Asynchronous Circuits Design With Conventional EDA Tools; Hui Wu et al.; IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems; Vol. 40, No. 11; pp. 2421-2425 *
Design and FPGA-implementation of Asynchronous Circuits Using Two-phase Handshaking; Adrian Mardari et al.; 2019 25th IEEE International Symposium on Asynchronous Circuits and Systems; pp. 9-18 *
Design of an 8-bit Asynchronous Booth Multiplier Based on a Constrained Bundled-Data Two-Phase Handshake Protocol; He Anping; Liu Xiaoqing; Chen Hong; Acta Electronica Sinica (No. 04); full text *

Also Published As

Publication number Publication date
CN113407239A (en) 2021-09-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant