CN102779026B - Multi-emission method of instructions in high-performance DSP (digital signal processor) - Google Patents


Info

Publication number
CN102779026B
CN102779026B (application CN201210222667.7A)
Authority
CN
China
Prior art keywords
instruction
buffer
fetch
issue
issue register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210222667.7A
Other languages
Chinese (zh)
Other versions
CN102779026A (en)
Inventor
Yang Xiaogang (杨晓刚)
Zhang Qingwen (张庆文)
Huang Songren (黄嵩人)
Qu Lingxiang (屈凌翔)
Current Assignee
CETC 58 Research Institute
Original Assignee
CETC 58 Research Institute
Priority date
Filing date
Publication date
Application filed by CETC 58 Research Institute filed Critical CETC 58 Research Institute
Priority to CN201210222667.7A priority Critical patent/CN102779026B/en
Publication of CN102779026A publication Critical patent/CN102779026A/en
Application granted granted Critical
Publication of CN102779026B publication Critical patent/CN102779026B/en
Legal status: Active (granted)


Landscapes

  • Advance Control (AREA)

Abstract

The invention discloses an instruction multi-issue method for a high-performance DSP (digital signal processor). The multi-issue mechanism comprises the organization of the instruction cache, instruction alignment and prefetch, and an instruction multi-issue unit. The organization of the instruction cache supports fetching multiple instructions at once; instruction alignment and prefetch quickly locates the required instructions from the fetch address sent by the CPU (central processing unit); and the instruction multi-issue unit dispatches each instruction to the corresponding pipeline according to its type. The disclosed method improves the efficiency of instruction fetch and issue, thereby improving the overall performance of the DSP.

Description

Instruction multi-issue method in a high-performance DSP processor
Technical field
The present invention relates to the design of DSP processor cores, and specifically to an instruction multi-issue method in a high-performance DSP processor.
Background technology
A digital signal processor (DSP) is a microprocessor designed specifically for digital signal processing; it can execute the algorithms of various digital signal processing tasks in real time. Because DSPs offer fast response and high-speed computation, they are widely used in consumer electronics, communications, aerospace, national defense, and other fields. The rapid development of DSP microprocessors has had a tremendous influence on both national security and daily life.
The research and development of high-performance DSP processors is a field in which countries around the world compete in scientific and technological strength, and China now attaches great importance to microprocessor R&D. Many universities and research institutions have launched microprocessor research and design efforts and achieved notable results, with processors such as Loongson ("Godson"), Ark, and YinHe FeiTeng appearing one after another. However, because China entered this field late, domestically developed processors cannot yet compete with foreign chips either in performance or in practical applications, and some core key technologies remain under foreign control. Developing high-performance processors with independent Chinese intellectual property rights is therefore of great significance to China's economic development and national security.
Advanced semiconductor processes make it possible to integrate a greater number of transistors on a silicon die of the same area, providing a strong guarantee for realizing processor chips of higher complexity. At the same time, advanced design techniques have further promoted the development of high-performance processors. Instruction multi-issue, for example, allows a microprocessor to issue and execute multiple instructions in the same clock cycle, further increasing the parallelism of instruction execution and reducing the time each pipeline stage spends on each path. Instruction multi-issue has greatly advanced processor performance.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing an instruction multi-issue method for a high-performance DSP processor that improves overall DSP performance.
According to the technical scheme provided by the invention, an instruction multi-issue method in a high-performance DSP processor comprises the following steps:
A. The CPU fetch unit sends a fetch address, and the required instructions are taken from the instruction cache; on an instruction-cache miss, the instructions are fetched from the external memory unit and the corresponding cache line is updated;
B. The fetched instructions are aligned and placed into the prefetch buffer: the prefetch buffer aligns the instructions according to the offset bits of the CPU fetch address, placing the aligned instructions of the fetch unit's first fetch into the low four prefetch buffers and the aligned instructions of its second fetch into the high four prefetch buffers; the prefetch buffer finally places the aligned instructions into the fetch buffer to wait for issue into the corresponding pipelines;
C. The fetch buffer is a stack structure: after the instructions in the low four fetch buffers have been issued through the issue registers, the fetch buffer presses the instructions in the high four buffers down into the low four in preparation for the next issue, and the vacated high slots hold instructions from the prefetch buffer, which wait to be issued;
D. Instructions are issued through two issue registers, issue register 0 and issue register 1, which together realize instruction multi-issue: issue register 0 always takes the instruction in the low slots of the fetch buffer, issue register 1 takes the following instruction, and according to the type and size of the instructions taken, each instruction is dispatched into the appropriate pipeline.
The instruction cache is organized as a two-way set-associative structure with a line size of 256 bits, supporting the fetch of multiple instructions at once; for non-cacheable addresses, a dedicated 256-bit line buffer is added as a single-line cache line.
After instructions are taken from the instruction cache into the prefetch buffer, they are aligned according to the offset bits of the CPU fetch address, and the aligned instructions are placed into the fetch buffer to wait for issue into the corresponding pipelines. The fetch buffer is a stack structure: each fetch takes the instructions in its low slots; after the instructions in the low slots have been sent, the instructions in the high slots are pressed down into the low slots, and the vacated high slots hold instructions from the instruction cache.
Issue register 0 and issue register 1 cooperate; they are responsible for fetching from the fetch buffer and for dispatching instructions into the different pipelines according to instruction size and type. Issue register 0 issues its instruction into the corresponding pipeline according to the instruction's type. Issue register 1 takes the following instruction from the fetch buffer, at a position determined by the size of the instruction in issue register 0, and judges its type: if the type is the same as that of issue register 0's instruction, the instruction is handed to issue register 0 to wait for issue, avoiding a pipeline stall; if the type differs, the instruction is sent directly into the other pipeline. As soon as the instruction in an issue register has been sent, the register is immediately refilled with an instruction from the fetch buffer.
The advantages of the invention are: 1. The external memory address space is divided into a cacheable part and a non-cacheable part, and for the non-cacheable part a line buffer is added as a single-line cache, greatly improving fetch efficiency for that part of the address space. 2. Two specially designed issue registers judge instruction type and size; in particular, issue register 1 determines where to take the next instruction from according to the type and size of the instruction in issue register 0, and decides which pipeline to send its instruction to according to that instruction's type. Sending instructions through two issue registers greatly improves issue efficiency and thus the performance of the DSP processor.
Brief description of the drawings
Fig. 1 is the organization diagram of the instruction cache (Cache).
Fig. 2 is the structure diagram of the line buffer (Line Buffer).
Fig. 3 is a schematic diagram of instructions being stored from the prefetch buffer into the fetch buffer after alignment.
Fig. 4 is a schematic diagram of instruction multi-issue.
Detailed description of the embodiments
The invention is described further below with reference to the drawings and embodiments.
The instruction multi-issue method in a high-performance DSP processor of the present invention proceeds as follows:
A. The CPU fetch unit sends a fetch address, and the required instructions are taken from the instruction cache; on an instruction-cache miss, the instructions are fetched from the external memory unit and the corresponding cache line is updated; the CPU fetch unit fetches 64 bits of instructions at a time;
B. The fetched instructions are aligned and placed into the prefetch buffer. The prefetch buffer is 128 bits in total and consists of eight 16-bit buffers: prefetch buffer 0, prefetch buffer 1, ..., prefetch buffer 7. The prefetch buffer aligns the instructions according to the offset bits of the CPU fetch address: the aligned instructions of the fetch unit's first fetch are placed into the low four buffers (prefetch buffer 0 through prefetch buffer 3), and the aligned instructions of its second fetch are placed into the high four buffers (prefetch buffer 4 through prefetch buffer 7). Finally, the prefetch buffer delivers the instructions into the fetch buffer, where they wait to be issued into the corresponding pipelines;
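As a behavioral sketch of step B (the function name and the list-of-16-bit-units representation are illustrative, not from the patent), the two 64-bit fetches filling the eight-slot prefetch buffer can be modeled as:

```python
def fill_prefetch(prefetch, aligned_units, which_fetch):
    """Place one aligned 64-bit fetch (four 16-bit units) into the
    eight-slot prefetch buffer: the first fetch fills slots 0-3,
    the second fills slots 4-7."""
    base = 0 if which_fetch == 0 else 4
    prefetch[base:base + 4] = aligned_units
    return prefetch

# Model each 16-bit unit as a label.
buf = [None] * 8
fill_prefetch(buf, ["i0", "i1", "i2", "i3"], 0)  # first fetch -> low half
fill_prefetch(buf, ["i4", "i5", "i6", "i7"], 1)  # second fetch -> high half
```

After both fetches, the whole 128-bit prefetch buffer is full and ready to be handed to the fetch buffer.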
C. The fetch buffer is a 128-bit stack structure consisting of eight 16-bit buffers: fetch buffer 0, fetch buffer 1, ..., fetch buffer 7. After the instructions in the low four buffers (fetch buffer 0 through fetch buffer 3) have been issued through the issue registers, the instructions in the high four buffers (fetch buffer 4 through fetch buffer 7) are pressed down into fetch buffers 0-3 in preparation for the next issue, and the vacated space holds instructions from the prefetch buffer, which wait to be issued;
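The press-down behavior of the fetch-buffer stack in step C can be sketched as follows. This is a simplified model that assumes the whole low half is consumed in one step; the class and slot labels are illustrative assumptions:

```python
class FetchBufferStack:
    """Eight 16-bit slots: issue drains the low four, the high four
    press down into the low four, and the vacated high slots refill
    from the prefetch buffer."""
    def __init__(self):
        self.slots = [None] * 8

    def issue_low_half(self, refill_from_prefetch):
        issued = self.slots[0:4]                 # handed to the issue registers
        self.slots[0:4] = self.slots[4:8]        # high four press down
        self.slots[4:8] = refill_from_prefetch   # vacated high slots refill
        return issued

fb = FetchBufferStack()
fb.slots = ["a0", "a1", "a2", "a3", "b0", "b1", "b2", "b3"]
sent = fb.issue_low_half(["c0", "c1", "c2", "c3"])
```

After the call, `sent` holds the four issued units and the buffer holds the pressed-down high half followed by the refill.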
D. Instructions are issued through two issue registers, issue register 0 and issue register 1, which together realize instruction multi-issue: issue register 0 always takes the instruction in the low slots of the fetch buffer, issue register 1 takes the following instruction, and according to the type and size of the instructions taken, each instruction is dispatched into the appropriate pipeline.
The organization of the DSP processor's instruction cache — its line size and set associativity — supports fetching multiple instructions at once. The instruction cache adopts a two-way set-associative structure with a 256-bit line size. For non-cacheable addresses, a dedicated 256-bit line buffer (Line Buffer) is added as a single-line cache line (Cache Line). In this design, a 256-bit line buffer is inserted between the instruction cache and the CPU fetch unit. For cacheable addresses, the line buffer simply acts as an instruction-stream buffer; for non-cacheable addresses, it acts as a single-line cache. Most application programs contain a substantial fraction of loop instructions and subroutine segments; the CPU accesses these address ranges frequently and rarely accesses addresses outside them. Adding a line buffer for the non-cacheable part of the address space therefore greatly improves fetch efficiency for those addresses and effectively reduces the number of transactions between the CPU and the off-chip memory unit.
Instruction prefetch and alignment: instructions taken out of the instruction cache are placed into the prefetch buffer and then aligned according to the CPU fetch address. In the instruction cache, each cache-line (Cache Line) bank consists of four 32-bit storage units: pmem0, pmem1, pmem2, and pmem3. Each fetch from the instruction cache returns, from high to low, the instructions in pmem3, pmem2, pmem1, and pmem0. After the instructions are placed into the prefetch buffer, they are further aligned according to the offset bits of the CPU fetch address, and the aligned instructions are finally placed into the fetch buffer to wait for issue into the corresponding pipelines.
The DSP processor has four pipelines in total: the IP pipeline, the LS pipeline, the MAC pipeline, and the SIMD pipeline. In the absence of data hazards, four instructions can be executed per clock cycle. Instructions are issued through two issue registers: issue register 0 (issue_stream_inst0) and issue register 1 (issue_stream_inst1). issue_stream_inst0 always takes the instruction in fetch buffer 0 (for a 16-bit instruction) or in fetch buffer 0 and fetch buffer 1 (for a 32-bit instruction). The instruction taken by issue_stream_inst1 is not necessarily of a different type from that taken by issue_stream_inst0; its position is determined by the type, size, and other properties of the instruction in issue_stream_inst0, and the fetched instructions are then dispatched to the pipelines. The fetch buffer is a stack structure: each fetch takes instructions from its low four slots; after those instructions are sent, the instructions in the high four slots press down into the low slots, and the vacated high slots hold instructions from the instruction cache.
As shown in Fig. 1, the instruction cache adopts a two-way set-associative structure. The fetch address is divided into tag bits, index bits, and offset bits, used mainly for quickly locating instructions and determining hits. The instruction cache is divided into a tag part and a data part, and each cache line carries a valid bit indicating whether the data in that line is valid. When the tag bits of the fetch address match the tag bits in the instruction cache and the valid bit is set, the instruction cache hits; when the tags do not match, or the data in the current cache line is invalid, the instruction cache misses.
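The hit/miss check just described can be sketched behaviorally as below. The field widths are assumptions for illustration only: the patent fixes the 256-bit (32-byte) line size, which implies a 5-bit byte offset, but does not state the cache capacity, so the index width is invented here:

```python
OFFSET_BITS = 5   # 256-bit (32-byte) line -> 5 byte-offset bits
INDEX_BITS = 6    # assumed number of set-index bits; not given in the text

def split_address(addr):
    """Split a fetch address into (tag, index, byte offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def lookup(cache, addr):
    """Two-way set-associative hit check: a way hits when its tag
    matches and its valid bit is set; otherwise the access misses."""
    tag, index, _ = split_address(addr)
    for way in (0, 1):
        line = cache[way][index]
        if line["valid"] and line["tag"] == tag:
            return way   # hit in this way
    return None          # miss

# Two ways, 2**INDEX_BITS sets each, all lines initially invalid.
cache = [[{"valid": False, "tag": 0} for _ in range(1 << INDEX_BITS)]
         for _ in range(2)]
tag, index, _ = split_address(0x1234)
cache[1][index] = {"valid": True, "tag": tag}  # install one line in way 1
```

With the line installed, a fetch of 0x1234 hits in way 1, while any address mapping to an invalid or tag-mismatched line misses.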
On an instruction-cache hit, the cache's output comes from the way that hit. Each cache-line (Cache Line) bank consists of four 32-bit storage units: pmem0, pmem1, pmem2, and pmem3. The high 16 bits of these four units form way 1 of the instruction cache, and the low 16 bits form way 0. Within the same clock cycle, the tag comparison of the two ways and the location of the 64 bits of instructions in the cache bank execute in parallel; once the tag comparison completes, the instruction cache sends out the instructions of the hitting way. For example, if the hit logic's tag comparison indicates a hit in way 1, the 64 bits of instructions taken by the CPU fetch unit are the high 16 bits of pmem3, the high 16 bits of pmem2, the high 16 bits of pmem1, and the high 16 bits of pmem0.
According to the fetch address, the instruction cache locates the first 16-bit instruction in the cache bank and then, using it as a base, takes the following three 16-bit instructions. One special case exists in this design: when the CPU fetch request falls at the end of a line, the offset within the four cache-bank units has already reached the last column, so the CPU cannot take 64 bits of instructions in one access; the fetch must instead be performed in two accesses across lines.
As shown in Fig. 2, the line buffer (Line Buffer) is 256 bits, the same size as a cache line; its bank can hold eight 32-bit instructions, and it supports the critical-doubleword-first technique. So that the CPU can quickly locate the required instructions when fetching, the line buffer is divided into four 64-bit banks, each further divided into four 16-bit rows; each CPU fetch takes one row from each of the four banks. As shown in Fig. 2, the four banks are organized by decoding bits 4 to 1 of the CPU fetch address: bits 3 and 4 determine which bank is selected (the column offset), and bits 1 and 2 determine which row is selected (the row offset).
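The bank/row decode can be sketched as below. The bit numbering (bit 0 as the least-significant bit, so "bits 1-2" means the second and third bits) is an assumption consistent with 16-bit instruction granularity; the patent text does not state its numbering convention explicitly:

```python
def line_buffer_select(addr):
    """Decode fetch-address bits 4..1 into (bank, row): bits 4-3
    choose one of the four 64-bit banks (column offset), and bits
    2-1 choose one of the four 16-bit rows inside it (row offset).
    Bit numbering here is an illustrative assumption."""
    bank = (addr >> 3) & 0b11  # bits 4-3: which bank
    row = (addr >> 1) & 0b11   # bits 2-1: which row within the bank
    return bank, row
```

For example, an address whose bits 4..1 are 1101 would select bank 3, row 1 under this convention.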
After the instructions are taken out of the instruction cache, the next operation aligns them according to the offset bits of the CPU fetch address. As shown in Fig. 3, among the four storage units in the cache bank, the highest 16 bits of instruction come from the pmem3 storage unit and the lowest 16 bits from the pmem0 storage unit. Because there are four storage units in total, two offset bits in the CPU fetch address suffice to realize the alignment. The alignment proceeds as follows, where the equals sign means the prefetch buffer on the left receives the instruction of the storage unit on the right:
when the offset bits of the fetch address are "00", prefetch buffer3 = pmem3, prefetch buffer2 = pmem2, prefetch buffer1 = pmem1, prefetch buffer0 = pmem0;
when the offset bits are "01", prefetch buffer3 = pmem0, prefetch buffer2 = pmem3, prefetch buffer1 = pmem2, prefetch buffer0 = pmem1;
when the offset bits are "10", prefetch buffer3 = pmem1, prefetch buffer2 = pmem0, prefetch buffer1 = pmem3, prefetch buffer0 = pmem2;
when the offset bits are "11", prefetch buffer3 = pmem2, prefetch buffer2 = pmem1, prefetch buffer1 = pmem0, prefetch buffer0 = pmem3.
The high four prefetch buffers (prefetch buffer7, prefetch buffer6, prefetch buffer5, prefetch buffer4) receive the next 64 bits of instructions using the same alignment as the low four.
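The four alignment cases above are exactly a rotation by the two offset bits; a compact behavioral sketch (the function name is illustrative):

```python
def align_units(pmem, offset):
    """Rotate the four fetched units so that the unit addressed by
    the offset bits lands in prefetch buffer 0:
    buffer[i] = pmem[(i + offset) % 4], which reproduces all four
    cases enumerated in the text."""
    return [pmem[(i + offset) % 4] for i in range(4)]

pmem = ["pmem0", "pmem1", "pmem2", "pmem3"]  # lowest to highest unit
```

For instance, offset "01" yields buffer0 = pmem1, buffer1 = pmem2, buffer2 = pmem3, buffer3 = pmem0, matching the second case in the text.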
After alignment completes, the aligned instructions are placed into the fetch buffer. At issue time, instructions are issued in order from low to high: the instructions in the low four fetch buffers (fetch buffer0, fetch buffer1, fetch buffer2, and fetch buffer3) are taken first. The fetch buffer is in fact a stack structure: after instructions are taken out, the instructions in the high four buffers (fetch buffer4, fetch buffer5, fetch buffer6, and fetch buffer7) press down in preparation for the next issue, and the vacated high slots hold instructions from the prefetch buffer, which wait to be issued.
Issue register 0 (issue_stream_inst0) and issue register 1 (issue_stream_inst1) are the two issue registers of the instruction issue stage; they are responsible for fetching instructions from the fetch buffer, as shown in Fig. 4. issue_stream_inst0 always takes the instruction in fetch buffer0 (for a 16-bit instruction) or in fetch buffer0 and fetch buffer1 (for a 32-bit instruction). The instruction taken by issue_stream_inst1 is not necessarily of the same type as that taken by issue_stream_inst0; its position is determined by the type, size, and other properties of the instruction in issue_stream_inst0. There are three cases in total:
1. If the first instruction is a 16-bit integer arithmetic (IP) instruction, issue_stream_inst1 takes the instruction in fetch buffer1 and fetch buffer2.
2. If the first instruction is a 32-bit IP instruction, issue_stream_inst1 takes the instruction in fetch buffer2 and fetch buffer3.
3. If the first instruction is a load/store (LS) instruction, issue_stream_inst1 takes the instruction in fetch buffer0 and fetch buffer1.
If the instruction taken by issue register 1 (issue_stream_inst1) is a load/store (LS) instruction, it is sent directly to the LS pipeline; if it is a single-instruction-multiple-data (SIMD) instruction, it is sent directly to the SIMD pipeline; and if it is a multiply-accumulate (MAC) instruction, it is sent directly to the MAC pipeline. The more common case, however, is that the instruction just issued and the instruction currently taken are of the same type, i.e., the two instructions would be dispatched into the same pipeline. Because each instruction type has only one set of execution units, to avoid a pipeline stall issue_stream_inst1 hands its instruction to issue_stream_inst0 to wait for issue, while issue_stream_inst1 takes the following instruction and issues it into a pipeline.
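The three slot-selection cases and the same-type handoff rule above can be sketched together as follows. The type and size encodings are illustrative assumptions; the patent describes behavior, not an encoding:

```python
def inst1_slots(first_type, first_size_bits):
    """Which fetch-buffer slots issue_stream_inst1 reads, per the
    three cases enumerated in the text."""
    if first_type == "IP" and first_size_bits == 16:
        return (1, 2)
    if first_type == "IP" and first_size_bits == 32:
        return (2, 3)
    if first_type == "LS":
        return (0, 1)
    raise ValueError("combination not covered by the text")

def route(inst0_type, inst1_type):
    """Dispatch decision for issue_stream_inst1's instruction: a
    different type goes straight to its own pipeline; the same type
    is handed back to issue_stream_inst0, since each type has only
    one set of execution units and issuing both would stall."""
    if inst1_type != inst0_type:
        return f"issue to {inst1_type} pipeline"
    return "hand to issue_stream_inst0 and wait"
```

For example, an LS instruction following an IP instruction issues straight into the LS pipeline, while a second MAC instruction behind a first MAC instruction waits in issue register 0.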

Claims (4)

1. An instruction multi-issue method in a high-performance DSP processor, characterized by the following steps:
A. The CPU fetch unit sends a fetch address, and the required instructions are taken from the instruction cache; on an instruction-cache miss, the instructions are fetched from the external memory unit and the corresponding cache line is updated;
B. The fetched instructions are placed into the prefetch buffer: the prefetch buffer aligns the instructions according to the offset bits of the CPU fetch address, placing the aligned instructions of the fetch unit's first fetch into the low four prefetch buffers and the aligned instructions of its second fetch into the high four prefetch buffers; the prefetch buffer finally places the aligned instructions into the fetch buffer to wait for issue into the corresponding pipelines;
C. The fetch buffer is a stack structure: after the instructions in the low four fetch buffers have been issued through the issue registers, the fetch buffer presses the instructions in the high four buffers down into the low four in preparation for the next issue, and the vacated high slots hold instructions from the prefetch buffer, which wait to be issued;
D. Instructions are issued through two issue registers, issue register 0 and issue register 1, which together realize instruction multi-issue: issue register 0 always takes the instruction in the low slots of the fetch buffer, issue register 1 takes the following instruction, and according to the type and size of the instructions taken, each instruction is dispatched into the appropriate pipeline.
2. The instruction multi-issue method in a high-performance DSP processor as claimed in claim 1, characterized in that the instruction cache is organized as a two-way set-associative structure with a line size of 256 bits, supporting the fetch of multiple instructions at once; for non-cacheable addresses, a dedicated 256-bit line buffer is added as a single-line cache line.
3. The instruction multi-issue method in a high-performance DSP processor as claimed in claim 1, characterized in that after instructions are taken from the instruction cache into the prefetch buffer, they are aligned according to the offset bits of the CPU fetch address, and the aligned instructions are placed into the fetch buffer to wait for issue into the corresponding pipelines; the fetch buffer is a stack structure, and each fetch takes the instructions in its low slots; after the instructions in the low slots have been sent, the instructions in the high slots are pressed down into the low slots, and the vacated high slots hold instructions from the instruction cache.
4. The instruction multi-issue method in a high-performance DSP processor as claimed in claim 1, characterized in that issue register 0 and issue register 1 cooperate, are responsible for fetching from the fetch buffer, and dispatch instructions into the different pipelines according to instruction size and type; issue register 0 issues its instruction into the corresponding pipeline according to the instruction's type; issue register 1 takes the following instruction from the fetch buffer, at a position determined by the size of the instruction in issue register 0, and judges its type: if the type is the same as that of issue register 0's instruction, the instruction is handed to issue register 0 to wait for issue, avoiding a pipeline stall; if the type differs, the instruction is sent directly into the other pipeline; as soon as the instruction in an issue register has been sent, the register is immediately refilled with an instruction from the fetch buffer.
CN201210222667.7A 2012-06-29 2012-06-29 Multi-emission method of instructions in high-performance DSP (digital signal processor) Active CN102779026B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210222667.7A CN102779026B (en) 2012-06-29 2012-06-29 Multi-emission method of instructions in high-performance DSP (digital signal processor)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210222667.7A CN102779026B (en) 2012-06-29 2012-06-29 Multi-emission method of instructions in high-performance DSP (digital signal processor)

Publications (2)

Publication Number Publication Date
CN102779026A CN102779026A (en) 2012-11-14
CN102779026B true CN102779026B (en) 2014-08-27

Family

ID=47123948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210222667.7A Active CN102779026B (en) 2012-06-29 2012-06-29 Multi-emission method of instructions in high-performance DSP (digital signal processor)

Country Status (1)

Country Link
CN (1) CN102779026B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105988774A (en) * 2015-02-20 2016-10-05 上海芯豪微电子有限公司 Multi-issue processor system and method
CN105242904B (en) * 2015-09-21 2018-05-18 中国科学院自动化研究所 For processor instruction buffering and the device and its operating method of circular buffering
CN105094752B (en) * 2015-09-21 2018-09-11 中国科学院自动化研究所 Instruction buffer be aligned buffer unit and its operating method
CN111694767B (en) * 2019-05-16 2021-03-19 时擎智能科技(上海)有限公司 Accumulation buffer memory device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6157988A (en) * 1997-08-01 2000-12-05 Micron Technology, Inc. Method and apparatus for high performance branching in pipelined microsystems

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6157988A (en) * 1997-08-01 2000-12-05 Micron Technology, Inc. Method and apparatus for high performance branching in pipelined microsystems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Xiaoyong, Zhang Shengbing, Huang Songren. "Data-dependency control in a multi-issue DSP." Microcomputer Applications, 2011, Vol. 27, No. 11. *

Also Published As

Publication number Publication date
CN102779026A (en) 2012-11-14

Similar Documents

Publication Publication Date Title
US8904153B2 (en) Vector loads with multiple vector elements from a same cache line in a scattered load operation
US9104532B2 (en) Sequential location accesses in an active memory device
TWI742132B (en) Processors, methods, systems, and instructions to load multiple data elements to destination storage locations other than packed data registers
US11989555B2 (en) Instructions for remote atomic operations
US10275247B2 (en) Apparatuses and methods to accelerate vector multiplication of vector elements having matching indices
CN102750133B (en) 32-Bit triple-emission digital signal processor supporting SIMD
US20120060016A1 (en) Vector Loads from Scattered Memory Locations
WO2019152069A1 (en) Instruction architecture for a vector computational unit
US7568070B2 (en) Instruction cache having fixed number of variable length instructions
CN1890631A (en) Transitioning from instruction cache to trace cache on label boundaries
CN102779026B (en) Multi-emission method of instructions in high-performance DSP (digital signal processor)
CN102662634A (en) Memory access and execution device for non-blocking transmission and execution
US20200225956A1 (en) Operation cache
KR20190082079A (en) Spatial and temporal merging of remote atomic operations
CN107851017B (en) Apparatus and method for transmitting multiple data structures
CN112579175B (en) Branch prediction method, branch prediction device and processor core
CN111538679A (en) Processor data prefetching design based on embedded DMA
US20220206793A1 (en) Methods, systems, and apparatuses for a scalable reservation station implementing a single unified speculation state propagation and execution wakeup matrix circuit in a processor
US11915000B2 (en) Apparatuses, methods, and systems to precisely monitor memory store accesses
GB2515148A (en) Converting conditional short forward branches to computationally equivalent predicated instructions
US20080091924A1 (en) Vector processor and system for vector processing
CN106445472B (en) A kind of character manipulation accelerated method, device, chip, processor
US20180089141A1 (en) Data processing device
US6823430B2 (en) Directoryless L0 cache for stall reduction
CN103336681A (en) Instruction fetching method for pipeline organization processor using lengthened instruction sets

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant