CN112579172A - Processing circuit and method for multi-cycle same-instruction execution of non-pipeline unit - Google Patents

Publication number
CN112579172A
Authority
CN
China
Prior art keywords
unit
instruction
buffer
control unit
buffer control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011397778.2A
Other languages
Chinese (zh)
Other versions
CN112579172B (en)
Inventor
牛少平
田泽
魏艳艳
郝冲
许宏杰
王绮卉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Xiangteng Microelectronics Technology Co Ltd
Original Assignee
Xian Xiangteng Microelectronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Xiangteng Microelectronics Technology Co Ltd
Priority to CN202011397778.2A
Publication of CN112579172A
Application granted
Publication of CN112579172B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3814 Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
    • G06F9/3869 Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a processing circuit and a method for multi-cycle same-instruction execution by a non-pipelined unit. The processing circuit comprises an input buffer unit, a non-pipelined unit, a buffer control unit and an output buffer unit, wherein: the input buffer unit is connected to the non-pipelined unit and to the buffer control unit; the non-pipelined unit is connected to the input buffer unit, the buffer control unit and the output buffer unit; the buffer control unit is connected to the input buffer unit, the non-pipelined unit and the output buffer unit; and the output buffer unit is connected to the non-pipelined unit and to the buffer control unit. The invention makes it easier to track the execution state of the non-pipelined unit when the same instruction is issued over multiple cycles, and keeps its execution handling broadly consistent with that of the other, pipelined units.

Description

Processing circuit and method for multi-cycle same-instruction execution of non-pipeline unit
Technical Field
The invention belongs to the field of computer graphics processing hardware, and relates to a processing circuit and a method for multi-cycle same-instruction execution by a non-pipelined unit.
Background
In the prior art, when the same instruction is issued over multiple cycles in a unified shader array, the functional units that execute the instructions are implemented in diverse ways: pipelined, blocking non-pipelined, and non-blocking non-pipelined. Because different execution units execute in different ways, collecting execution-state statistics is complex. The invention addresses this problem.
Disclosure of Invention
The present invention provides a processing circuit and method for multi-cycle same-instruction execution by a non-pipelined unit, which makes it easier to track the execution state of the non-pipelined unit when the same instruction is issued over multiple cycles, and keeps its execution handling broadly consistent with that of the other, pipelined units.
The technical solution of the invention is as follows. A processing circuit for multi-cycle same-instruction execution by a non-pipelined unit, characterized in that the processing circuit comprises an input buffer unit, a non-pipelined unit, a buffer control unit and an output buffer unit, wherein:
the input buffer unit is connected to the non-pipelined unit and to the buffer control unit; it buffers n cycles of externally input instruction data in n levels, outputs the instruction data to the non-pipelined unit in n separate transfers according to the operation count from the buffer control unit, and outputs a first instruction operation enable signal to the buffer control unit;
the non-pipelined unit is connected to the input buffer unit, the buffer control unit and the output buffer unit; it performs the operation on the instruction data supplied by the input buffer unit, with an operation period of m cycles, outputs the operation result to the output buffer unit, and outputs an operation-result valid signal to the buffer control unit;
the buffer control unit is connected to the input buffer unit, the non-pipelined unit and the output buffer unit; it derives the operation count from the operation-result valid signal of the non-pipelined unit, generates the instruction operation enable signal from the operation count and the first instruction operation enable signal of the input buffer unit, and outputs the instruction operation enable signal to the non-pipelined unit;
the output buffer unit is connected to the non-pipelined unit and to the buffer control unit; it buffers the operation results of the non-pipelined unit in n levels according to the n operation-completion signals from the buffer control unit and, in the last n beats at the end of the n-th operation, outputs an operation-result valid signal that stays valid for n cycles while outputting the n operation results together.
Preferably, the non-pipelined unit can receive one and the same instruction over a plurality of cycles and complete the operation of that instruction over a plurality of cycles.
Preferably, the buffer control unit includes an m-bit shift register and controls the non-pipelined unit to execute the n operations sequentially.
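As a rough illustration of how an m-bit shift register could time one m-cycle operation, the sketch below shifts a single set bit through the register and reports completion when the bit reaches the last position. The class name `ShiftTimer`, its methods, and the one-hot encoding are assumptions made for illustration; the text states only that the buffer control unit contains an m-bit shift register, not how it is wired.

```python
class ShiftTimer:
    """Hypothetical one-hot m-bit shift register used as an operation timer."""

    def __init__(self, m: int) -> None:
        if m < 1:
            raise ValueError("m must be at least 1")
        self.m = m
        self.bits = 0  # all bits clear means the timer is idle

    def start(self) -> None:
        # Load a single 1 into bit 0 when an operation is enabled.
        self.bits = 1

    def tick(self) -> bool:
        """Advance one clock cycle; return True on the cycle the operation completes."""
        if self.bits == 0:
            return False  # idle, nothing to time
        # Completion is flagged when the 1 sits in the last (m-1) position.
        done = bool((self.bits >> (self.m - 1)) & 1)
        # Shift left by one and drop anything beyond m bits.
        self.bits = (self.bits << 1) & ((1 << self.m) - 1)
        return done


def cycles_until_done(m: int) -> int:
    """Count the clock cycles between start() and the completion flag."""
    timer = ShiftTimer(m)
    timer.start()
    cycles = 0
    while True:
        cycles += 1
        if timer.tick():
            return cycles
```

Under these assumptions the completion flag appears exactly m cycles after `start()`, which is why a register of the same width as the operation period is a natural fit for generating operation-completion signals.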
A method implemented with the above processing circuit for multi-cycle same-instruction execution by a non-pipelined unit, comprising the following steps:
1) the input buffer unit buffers n cycles of externally input instruction data in n levels, then outputs the instruction data to the non-pipelined unit in n separate transfers according to the operation count from the buffer control unit, and outputs a first instruction operation enable signal to the buffer control unit;
2) the non-pipelined unit performs the operation on the instruction data supplied by the input buffer unit, with an operation period of m cycles, outputs the operation result to the output buffer unit, and outputs an operation-result valid signal to the buffer control unit;
3) the buffer control unit derives the operation count and the n operation-completion signals from the operation-result valid signal of the non-pipelined unit, generates the instruction operation enable signal from the operation count and the first instruction operation enable signal from the input buffer unit, and outputs the instruction operation enable signal to the non-pipelined unit;
4) the output buffer unit buffers the operation results of the non-pipelined unit in n levels according to the n operation-completion signals from the buffer control unit and, in the last n beats at the end of the n-th operation, outputs an operation-result valid signal for n cycles while outputting the n operation results together.
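The four steps above can be sketched as a small behavioural model. This is not the circuit itself: the function name `run_pseudo_pipeline`, the use of Python lists for the buffers, and the assumption that the n operations run back to back with no idle cycles in between are all illustrative choices that the text does not specify.

```python
from typing import Callable, List, Tuple


def run_pseudo_pipeline(
    instr_data: List[int],
    m: int,
    op: Callable[[int], int],
) -> Tuple[List[int], int]:
    """Behavioural model of steps 1) to 4); returns (results, compute_cycles).

    Assumptions (not stated in the source): operations run back to back,
    and only cycles spent inside the non-pipelined unit are counted.
    """
    n = len(instr_data)
    input_buffer = list(instr_data)   # step 1: n-level input buffering
    output_buffer: List[int] = []     # step 4: n-level output buffering
    op_count = 0                      # step 3: operations completed so far
    cycles = 0

    while op_count < n:
        # The operation count selects which buffered entry is issued next.
        operand = input_buffer[op_count]
        # Step 2: the non-pipelined unit is busy for m cycles per operation.
        cycles += m
        # The result is captured when the operation-completion signal arrives.
        output_buffer.append(op(operand))
        # The result-valid signal advances the operation count.
        op_count += 1

    # All n results leave together only after the n-th operation has finished.
    return output_buffer, cycles
```

For example, `run_pseudo_pipeline([1, 2, 3, 4], 19, lambda x: x * x)` returns `([1, 4, 9, 16], 76)` under these assumptions. From the outside the unit then looks like a long pipeline stage that accepts n inputs and later emits n outputs.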
The invention provides a processing circuit and method for multi-cycle same-instruction execution by a non-pipelined unit. The circuit consists of an input buffer unit, a non-pipelined unit, a buffer control unit and an output buffer unit. The inputs and result outputs of the multi-cycle instruction are buffered in multiple levels and execution proceeds in a pseudo-pipelined manner, which makes it easier to track the execution state of the non-pipelined unit and keeps its execution handling broadly consistent with that of the other, pipelined units.
Drawings
FIG. 1 is a block diagram of the circuit of the present invention.
Detailed Description
The invention provides a processing circuit for multi-cycle same-instruction execution by a non-pipelined unit, comprising an input buffer unit, a non-pipelined unit, a buffer control unit and an output buffer unit, wherein:
the input buffer unit is connected to the non-pipelined unit and to the buffer control unit; it buffers n cycles of externally input instruction data in n levels, outputs the instruction data to the non-pipelined unit in n separate transfers according to the operation count from the buffer control unit, and outputs a first instruction operation enable signal to the buffer control unit;
the non-pipelined unit is connected to the input buffer unit, the buffer control unit and the output buffer unit; it performs the operation on the instruction data supplied by the input buffer unit, with an operation period of m cycles, outputs the operation result to the output buffer unit, and outputs an operation-result valid signal to the buffer control unit; the non-pipelined unit can receive one and the same instruction over a plurality of cycles and complete the operation of that instruction over a plurality of cycles;
the buffer control unit is connected to the input buffer unit, the non-pipelined unit and the output buffer unit; it derives the operation count from the operation-result valid signal of the non-pipelined unit, generates the instruction operation enable signal from the operation count and the first instruction operation enable signal of the input buffer unit, and outputs the instruction operation enable signal to the non-pipelined unit; the buffer control unit has an m-bit shift register and controls the non-pipelined unit to execute the n operations sequentially;
the output buffer unit is connected to the non-pipelined unit and to the buffer control unit; it buffers the operation results of the non-pipelined unit in n levels according to the n operation-completion signals from the buffer control unit and, in the last n beats at the end of the n-th operation, outputs an operation-result valid signal that stays valid for n cycles while outputting the n operation results together.
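One possible reading of the output buffer unit's behaviour is that it holds results until all n have arrived and then presents one result per cycle with the valid signal asserted for n consecutive cycles. The sketch below models that reading only. The class name `OutputBuffer`, the `capture` and `tick` methods, and the one-result-per-valid-cycle ordering are assumptions; the text could equally be read as presenting all n results in parallel.

```python
from typing import List, Optional, Tuple


class OutputBuffer:
    """Hypothetical n-level result buffer that drains only after the n-th result."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.slots: List[int] = []
        self.drain_index: Optional[int] = None  # None while still collecting

    def capture(self, result: int) -> None:
        # Called once per operation-completion signal from the control unit.
        self.slots.append(result)
        if len(self.slots) == self.n:
            self.drain_index = 0  # all n results present: start the output phase

    def tick(self) -> Tuple[bool, Optional[int]]:
        """Return (valid, data) for the current cycle."""
        if self.drain_index is None:
            return False, None  # still collecting, valid stays low
        data = self.slots[self.drain_index]
        self.drain_index += 1
        if self.drain_index == self.n:
            # Output phase finished; reset for the next group of n results.
            self.slots = []
            self.drain_index = None
        return True, data
```

With n = 4, the valid signal stays low while only three results are buffered, then goes high for exactly four cycles once the fourth result is captured. Downstream logic therefore sees the same n-cycle valid window that a pipelined unit would produce.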
The invention also provides a method implemented with the processing circuit for multi-cycle same-instruction execution by a non-pipelined unit, comprising the following steps:
1) the input buffer unit buffers n cycles of externally input instruction data in n levels, then outputs the instruction data to the non-pipelined unit in n separate transfers according to the operation count from the buffer control unit, and outputs a first instruction operation enable signal to the buffer control unit;
2) the non-pipelined unit performs the operation on the instruction data supplied by the input buffer unit, with an operation period of m cycles, outputs the operation result to the output buffer unit, and outputs an operation-result valid signal to the buffer control unit;
3) the buffer control unit derives the operation count and the n operation-completion signals from the operation-result valid signal of the non-pipelined unit, generates the instruction operation enable signal from the operation count and the first instruction operation enable signal from the input buffer unit, and outputs the instruction operation enable signal to the non-pipelined unit;
4) the output buffer unit buffers the operation results of the non-pipelined unit in n levels according to the n operation-completion signals from the buffer control unit and, in the last n beats at the end of the n-th operation, outputs an operation-result valid signal for n cycles while outputting the n operation results together.
The technical solution of the present invention is further described in detail with reference to the accompanying drawings and specific embodiments.
Referring to FIG. 1, a processing circuit for multi-cycle same-instruction execution by a non-pipelined unit according to an embodiment of the present invention includes an input buffer unit 1, a non-pipelined unit 2, a buffer control unit 3, and an output buffer unit 4.
The input buffer unit 1 is connected to the non-pipelined unit 2 and the buffer control unit 3. It buffers n cycles of externally input instruction data in n levels, outputs the instruction data to the non-pipelined unit 2 in n separate transfers according to the operation count from the buffer control unit 3, and outputs a first instruction operation enable signal to the buffer control unit 3.
The non-pipelined unit 2 is connected to the input buffer unit 1, the buffer control unit 3 and the output buffer unit 4. It performs the operation on the instruction data supplied by the input buffer unit 1, with an operation period of m cycles, outputs the operation result to the output buffer unit 4, and outputs an operation-result valid signal to the buffer control unit 3.
The buffer control unit 3 is connected to the input buffer unit 1, the non-pipelined unit 2 and the output buffer unit 4. It derives the operation count from the operation-result valid signal of the non-pipelined unit 2, generates the instruction operation enable signal from the operation count and the first instruction operation enable signal from the input buffer unit 1, and outputs the instruction operation enable signal to the non-pipelined unit 2.
The output buffer unit 4 is connected to the non-pipelined unit 2 and the buffer control unit 3. It buffers the operation results of the non-pipelined unit 2 in n levels according to the n operation-completion signals from the buffer control unit 3 and, in the last n beats at the end of the n-th operation, outputs an operation-result valid signal for n cycles while outputting the n operation results together.
The non-pipelined unit 2 of the circuit can receive one and the same instruction over a plurality of cycles and complete the operation of that instruction over a plurality of cycles.
The buffer control unit 3 of the circuit has an m-bit shift register and controls the non-pipelined unit 2 to execute the n operations sequentially.
In one embodiment of the present invention, the multi-cycle issue spans 4 cycles (n = 4) and the operation period of the non-pipelined unit is 19 cycles (m = 19). The processing circuit comprises an input buffer unit 1, a non-pipelined unit 2, a buffer control unit 3 and an output buffer unit 4. The specific processing method is as follows:
1) the input buffer unit 1 buffers 4 cycles of externally input instruction data in 4 levels, outputs the instruction data to the non-pipelined unit 2 in 4 separate transfers according to the operation count from the buffer control unit 3, and outputs a first instruction operation enable signal to the buffer control unit 3;
2) the non-pipelined unit 2 performs the operation on the instruction data supplied by the input buffer unit 1, with an operation period of 19 cycles, outputs the operation result to the output buffer unit 4, and outputs an operation-result valid signal to the buffer control unit 3;
3) the buffer control unit 3 derives the operation count and the 4 operation-completion signals from the operation-result valid signal of the non-pipelined unit 2, generates the instruction operation enable signal from the operation count and the first instruction operation enable signal from the input buffer unit 1, and outputs the instruction operation enable signal to the non-pipelined unit 2;
4) the output buffer unit 4 buffers the operation results of the non-pipelined unit 2 in 4 levels according to the 4 operation-completion signals and, in the last 4 beats at the end of the 4th operation, outputs an operation-result valid signal that stays valid for 4 cycles while outputting the 4 operation results together.
The non-pipelined unit 2 of this circuit can receive one and the same instruction over 4 cycles and complete the operation of that instruction over 4 cycles.
The buffer control unit 3 of this circuit has a 19-bit shift register and controls the non-pipelined unit 2 to execute the 4 operations sequentially.
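To make the numbers in this embodiment concrete, the sketch below steps a 19-bit one-hot shift register through four sequential operations and records the cycle on which each one completes. The function name `completion_cycles`, the one-hot encoding, and the assumption that each operation starts on the cycle immediately after the previous one completes are illustrative; the embodiment does not give exact cycle timing.

```python
from typing import List


def completion_cycles(n: int = 4, m: int = 19) -> List[int]:
    """Cycle numbers (1-based) at which each of the n operations completes.

    Assumes each operation starts on the cycle after the previous one
    completes, which the source does not specify.
    """
    mask = (1 << m) - 1
    shift_reg = 1                 # one-hot bit loaded for the first operation
    done_at: List[int] = []
    cycle = 0
    while len(done_at) < n:
        cycle += 1
        if shift_reg >> (m - 1):  # the bit has reached the last position
            done_at.append(cycle)
            shift_reg = 1         # reload for the next operation
        else:
            shift_reg = (shift_reg << 1) & mask
    return done_at
```

Under these assumptions `completion_cycles()` returns `[19, 38, 57, 76]`, so the four results would all be available after 4 × 19 = 76 compute cycles, at which point the 4-cycle valid window of the output buffer unit could begin.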
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (4)

1. A processing circuit for multi-cycle same-instruction execution by a non-pipelined unit, characterized in that the processing circuit comprises an input buffer unit, a non-pipelined unit, a buffer control unit and an output buffer unit, wherein:
the input buffer unit is connected to the non-pipelined unit and to the buffer control unit; it buffers n cycles of externally input instruction data in n levels, outputs the instruction data to the non-pipelined unit in n separate transfers according to the operation count from the buffer control unit, and outputs a first instruction operation enable signal to the buffer control unit;
the non-pipelined unit is connected to the input buffer unit, the buffer control unit and the output buffer unit; it performs the operation on the instruction data supplied by the input buffer unit, with an operation period of m cycles, outputs the operation result to the output buffer unit, and outputs an operation-result valid signal to the buffer control unit;
the buffer control unit is connected to the input buffer unit, the non-pipelined unit and the output buffer unit; it derives the operation count from the operation-result valid signal of the non-pipelined unit, generates the instruction operation enable signal from the operation count and the first instruction operation enable signal of the input buffer unit, and outputs the instruction operation enable signal to the non-pipelined unit;
the output buffer unit is connected to the non-pipelined unit and to the buffer control unit; it buffers the operation results of the non-pipelined unit in n levels according to the n operation-completion signals from the buffer control unit and, in the last n beats at the end of the n-th operation, outputs an operation-result valid signal that stays valid for n cycles while outputting the n operation results together.
2. The processing circuit for multi-cycle same-instruction execution by a non-pipelined unit of claim 1, characterized in that: the non-pipelined unit can receive one and the same instruction over a plurality of cycles and complete the operation of that instruction over a plurality of cycles.
3. The processing circuit for multi-cycle same-instruction execution by a non-pipelined unit of claim 1, characterized in that: the buffer control unit has an m-bit shift register and controls the non-pipelined unit to execute the n operations sequentially.
4. A method implemented with the processing circuit for multi-cycle same-instruction execution by a non-pipelined unit of claim 1, characterized in that the method comprises the following steps:
1) the input buffer unit buffers n cycles of externally input instruction data in n levels, then outputs the instruction data to the non-pipelined unit in n separate transfers according to the operation count from the buffer control unit, and outputs a first instruction operation enable signal to the buffer control unit;
2) the non-pipelined unit performs the operation on the instruction data supplied by the input buffer unit, with an operation period of m cycles, outputs the operation result to the output buffer unit, and outputs an operation-result valid signal to the buffer control unit;
3) the buffer control unit derives the operation count and the n operation-completion signals from the operation-result valid signal of the non-pipelined unit, generates the instruction operation enable signal from the operation count and the first instruction operation enable signal from the input buffer unit, and outputs the instruction operation enable signal to the non-pipelined unit;
4) the output buffer unit buffers the operation results of the non-pipelined unit in n levels according to the n operation-completion signals from the buffer control unit and, in the last n beats at the end of the n-th operation, outputs an operation-result valid signal for n cycles while outputting the n operation results together.
CN202011397778.2A 2020-12-05 2020-12-05 Processing circuit and method for multi-cycle same-instruction execution of non-pipeline unit Active CN112579172B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011397778.2A CN112579172B (en) 2020-12-05 2020-12-05 Processing circuit and method for multi-cycle same-instruction execution of non-pipeline unit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011397778.2A CN112579172B (en) 2020-12-05 2020-12-05 Processing circuit and method for multi-cycle same-instruction execution of non-pipeline unit

Publications (2)

Publication Number Publication Date
CN112579172A (en) 2021-03-30
CN112579172B (en) 2022-09-23

Family

ID=75127843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011397778.2A Active CN112579172B (en) 2020-12-05 2020-12-05 Processing circuit and method for multi-cycle same-instruction execution of non-pipeline unit

Country Status (1)

Country Link
CN (1) CN112579172B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112579173A (en) * 2020-12-05 2021-03-30 西安翔腾微电子科技有限公司 Multi-warp multi-cycle dual-emission instruction state recording circuit and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1841313A (en) * 2005-03-31 2006-10-04 国际商业机器公司 Processor and method for handling multi-cycle non-pipelined instruction sequencing
US20140164734A1 (en) * 2012-12-06 2014-06-12 International Business Machines Corporation Concurrent multiple instruction issue of non-pipelined instructions using non-pipelined operation resources in another processing core
CN111399912A (en) * 2020-03-26 2020-07-10 超验信息科技(长沙)有限公司 Instruction scheduling method, system and medium for multi-cycle instruction

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1841313A (en) * 2005-03-31 2006-10-04 国际商业机器公司 Processor and method for handling multi-cycle non-pipelined instruction sequencing
US20060224864A1 (en) * 2005-03-31 2006-10-05 Dement Jonathan J System and method for handling multi-cycle non-pipelined instruction sequencing
US20140164734A1 (en) * 2012-12-06 2014-06-12 International Business Machines Corporation Concurrent multiple instruction issue of non-pipelined instructions using non-pipelined operation resources in another processing core
CN111399912A (en) * 2020-03-26 2020-07-10 超验信息科技(长沙)有限公司 Instruction scheduling method, system and medium for multi-cycle instruction

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A.KONDRATYEV ET AL.: "Realistic performance-constrained pipelining in high-level synthesis", 《2011 DESIGN, AUTOMATION & TEST IN EUROPE》 *
沈春江: "面向OpenGL图形流水线的纹理载入单元设计与验证", 《中国优秀硕士学位论文全文数据库-信息科技辑》 *
王晓未: "嵌入式GPU中统一染色器核的研究与设计", 《中国优秀硕士学位论文全文数据库-信息科技辑》 *

Also Published As

Publication number Publication date
CN112579172B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
CN109598338B (en) Convolutional neural network accelerator based on FPGA (field programmable Gate array) for calculation optimization
CN105912501B (en) A kind of SM4-128 Encryption Algorithm realization method and systems based on extensive coarseness reconfigurable processor
CN1809810A (en) Instruction controlled data processing device
US9170816B2 (en) Enhancing processing efficiency in large instruction width processors
US20240004666A1 (en) Floating-point supportive pipeline for emulated shared memory architectures
CN112099762B (en) Synergistic processing system and method for rapidly realizing SM2 cryptographic algorithm
CN112487750A (en) Convolution acceleration computing system and method based on memory computing
CN112579172B (en) Processing circuit and method for multi-cycle same-instruction execution of non-pipeline unit
EP2352082B1 (en) Data processing device for performing a plurality of calculation processes in parallel
CN112667289A (en) CNN reasoning acceleration system, acceleration method and medium
CN110928832A (en) Asynchronous pipeline processor circuit, device and data processing method
US6898693B1 (en) Hardware loops
US6748523B1 (en) Hardware loops
US6766444B1 (en) Hardware loops
US20020078333A1 (en) Resource efficient hardware loops
US11531638B2 (en) Reconfigurable circuit array using instructions including a fetch configuration data portion and a transfer configuration data portion
CN108196881B (en) Fixed-point operation acceleration unit based on configurable technology
JP2014215624A (en) Arithmetic processing device
CN115048334A (en) Programmable array processor control apparatus
CN113986354A (en) RISC-V instruction set based six-stage pipeline CPU
CN109343826B (en) Reconfigurable processor operation unit for deep learning
US6920547B2 (en) Register adjustment based on adjustment values determined at multiple stages within a pipeline of a processor
CN112905528A (en) Intelligent household chip based on Internet of things
Hu et al. 3D waveform oscilloscope implemented on coupled FPGA-GPU embedded system
US20160162290A1 (en) Processor with Polymorphic Instruction Set Architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant