CN114816769A - Vector processor processing method and system - Google Patents

Info

Publication number
CN114816769A
Authority
CN
China
Prior art keywords
vector
cpu
register
instruction
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210643486.5A
Other languages
Chinese (zh)
Inventor
尚德龙
周玉梅
张磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Nanjing Intelligent Technology Research Institute
Original Assignee
Zhongke Nanjing Intelligent Technology Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-06-08
Filing date
2022-06-08
Publication date
2022-07-29
Application filed by Zhongke Nanjing Intelligent Technology Research Institute
Priority to CN202210643486.5A
Publication of CN114816769A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention relates to a vector processor processing method and system. The vector processor comprises a CPU, a vector coprocessor and a random access memory. In the method, the vector coprocessor acquires address data from a register in the CPU and sends that address data to the random access memory. The invention reduces power consumption and area without affecting programmability.

Description

Vector processor processing method and system
Technical Field
The present invention relates to the field of programming, and in particular, to a vector processor processing method and system.
Background
Vector processors commonly employ a register file scheme. Because the register file has a large capacity and is usually implemented in SRAM, random access to the registers causes conflicts and a large performance penalty. Many vector processors and GPUs add a register file cache and an operand buffer to reduce conflicts, at some cost in power consumption and area. In addition, the size of a single register in the vector register file is fixed, so supporting operations of various lengths requires a large amount of hardware logic and makes the vector processor harder to program.
Disclosure of Invention
The invention aims to provide a vector processor processing method and a vector processor processing system, which reduce power consumption and area on the premise of not influencing programmability.
In order to achieve the purpose, the invention provides the following scheme:
a vector processor processing method, the vector processor comprising: a CPU, a vector coprocessor and a random access memory (RAM); the processing method comprises the following steps:
acquiring address data in a register in a CPU by using the vector coprocessor;
and sending address data in a register in the CPU to a random access memory by using the vector coprocessor.
Optionally, before the acquiring of address data in a register in the CPU by using the vector coprocessor, the method further includes:
launching a data instruction into an instruction queue in the vector coprocessor.
Optionally, the obtaining, by the vector coprocessor, address data in a register in the CPU specifically includes:
the instruction queue decodes according to the content of the data instruction;
sending the register ID in the decoded content to a CPU;
the CPU determines the value of the register according to the register ID and sends the value of the register to the vector coprocessor.
Optionally, the sending, by the vector coprocessor, address data in a register in the CPU to a random access memory specifically includes:
determining a microinstruction according to the numerical value of the register and the decoded content;
driving a memory access unit (MAU) and an arithmetic unit (ALU) according to the microinstruction;
the arithmetic unit selects a vector arithmetic mode and prepares to receive address data sent by the memory access unit;
the memory access unit reads address data from the cache unit and the random access memory and sends the address data to the arithmetic unit, and allocates the address at which the arithmetic unit writes back to the cache unit;
the arithmetic unit performs the operation on the received data and sends the result to the cache unit;
after writing back to the cache unit, the arithmetic unit sends instruction-completion information to the memory access unit.
Optionally, after the arithmetic unit writes back to the cache unit and sends the end information to the memory access unit, the method further includes:
the memory access unit sends the information of the completion of the instruction to the CPU, and the instruction execution is finished;
the content in the cache unit is automatically written back to the random access memory.
Optionally, the cache unit is allocated for writing, and within the same instruction a read operation has higher priority than a write operation; an address allocated for a write that has not yet been written back can be neither written nor read.
A vector processor processing system, applied to the vector processor processing method, the vector processor comprising: a CPU, a vector coprocessor and a random access memory; the processing system comprises:
the data acquisition module is used for acquiring address data in a register in the CPU by using the vector coprocessor;
and the data sending module is used for sending the address data in the register in the CPU to a random access memory by using the vector coprocessor.
Optionally, the system further comprises:
and the instruction sending module is used for transmitting the data instruction to an instruction queue in the vector coprocessor.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a vector processor processing method and system, wherein the vector processor comprises: a CPU, a vector coprocessor and a random access memory; address data in a register in the CPU is acquired by using the vector coprocessor; and the address data is sent to the random access memory by using the vector coprocessor. The vector register file is removed, a general-purpose CPU serves as the control unit, and the vector coprocessor reuses the registers in the CPU, which reduces control complexity. Compared with other vector offload processors, fewer instructions are needed to perform the same function. Power consumption and area are therefore reduced without affecting programmability.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart illustrating a processing method of a vector processor according to the present invention;
FIG. 2 is a schematic diagram of a vector processor architecture;
FIG. 3 is a schematic overall flow chart of a vector processor processing method according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a vector processor processing method and system, which reduce power consumption and area on the premise of not influencing programmability.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in FIG. 2, the vector processor includes: a CPU, a vector coprocessor and a random access memory. The CPU issues vector instructions to the vector coprocessor, which interacts with the RAM (which may be DRAM or SRAM) via a dedicated, independent bus. FIG. 1 is a schematic flow chart of the vector processor processing method according to the present invention; as shown in FIG. 1, the method includes:
S101, acquiring address data in a register in a CPU by using the vector coprocessor;
before S101, the method further includes:
launching a data instruction into an instruction queue in the vector coprocessor.
S101 specifically comprises the following steps:
the instruction queue decodes according to the content of the data instruction;
sending the register ID in the decoded content to a CPU;
the CPU determines the value of the register according to the register ID and sends the address data of the register to the vector coprocessor.
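The register lookup in S101 can be illustrated with a minimal Python sketch. It is not the patented implementation: the textual instruction format `op rd, rs1, rs2`, the `Cpu` class, and the helper names `decode` and `reg2addr` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decoded:
    op: str         # operation name, e.g. "vmul"
    reg_ids: tuple  # CPU register IDs named by the instruction


class Cpu:
    """Stand-in for the general-purpose CPU; only its registers matter here."""

    def __init__(self, regs):
        self.regs = dict(regs)

    def read_register(self, reg_id):
        # The CPU resolves a register ID to the value it currently holds.
        return self.regs[reg_id]


def decode(instruction):
    """Split a textual instruction (hypothetical format: 'op rd, rs1, rs2')."""
    op, operands = instruction.split(maxsplit=1)
    reg_ids = tuple(int(tok.strip().lstrip("x")) for tok in operands.split(","))
    return Decoded(op, reg_ids)


def reg2addr(cpu, decoded):
    """Send each decoded register ID to the CPU and collect the returned values."""
    return tuple(cpu.read_register(r) for r in decoded.reg_ids)


cpu = Cpu({1: 0x100, 2: 0x200, 3: 0x300})
decoded = decode("vmul x3, x1, x2")
addresses = reg2addr(cpu, decoded)
```

In this sketch the coprocessor holds no register state of its own: every address it uses comes back from the CPU, which mirrors the removal of the vector register file.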
S102, sending the address data of the register in the CPU to a random access memory by using the vector coprocessor.
S102 specifically comprises the following steps:
determining a microinstruction according to the address data of the register and the decoded content;
driving the memory access unit and the arithmetic unit according to the microinstruction;
the arithmetic unit selects a vector arithmetic mode and prepares to receive address data sent by the memory access unit;
the memory access unit reads address data from a cache unit (Cache) and the random access memory and sends the address data to the arithmetic unit, and allocates the address at which the arithmetic unit writes back to the cache unit;
the arithmetic unit performs the operation on the received data and sends the result to the cache unit;
after writing back to the cache unit, the arithmetic unit sends instruction-completion information to the memory access unit.
After the arithmetic unit writes back to the cache unit and sends the end information to the memory access unit, the method further includes:
the memory access unit sends the information of the completion of the instruction to the CPU, and the instruction execution is finished;
the content in the cache unit is automatically written back to the random access memory.
To resolve read-after-write (RAW), write-after-write (WAW) and write-after-read (WAR) data hazards, the cache unit is allocated for writing, and within the same instruction a read operation has higher priority than a write operation; an address allocated for a write that has not yet been written back can be neither written nor read. The ALU operation result can only be written back to the cache unit, and the cache unit initiates a Clean operation to write the result back to the RAM.
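One way to read this hazard rule is as a per-address lock that is set when a write is allocated and cleared on write-back. The Python sketch below follows that reading; the class `GuardedCache`, the function `issue`, and the choice to stall by returning `False` are illustrative assumptions, since the source does not describe the mechanism at this level.

```python
class GuardedCache:
    """Cache whose addresses are locked between write allocation and write-back."""

    def __init__(self):
        self.data = {}
        self.pending = set()  # addresses allocated for a write, not yet written back

    def allocate_write(self, addr):
        if addr in self.pending:
            return False  # WAW: an earlier write to this address is outstanding
        self.pending.add(addr)
        return True

    def can_read(self, addr):
        # RAW: a later instruction must not read a result that has not arrived.
        return addr not in self.pending

    def write_back(self, addr, value):
        self.data[addr] = value
        self.pending.discard(addr)  # the address is readable and writable again


def issue(cache, reads, write):
    """Issue one instruction; its reads are checked before its write is allocated."""
    if not all(cache.can_read(a) for a in reads):
        return False
    return cache.allocate_write(write)


cache = GuardedCache()
first = issue(cache, reads=[0x100, 0x200], write=0x100)  # reads and writes 0x100
raw_blocked = issue(cache, reads=[0x100], write=0x300)   # reads a pending address
waw_blocked = issue(cache, reads=[0x200], write=0x100)   # writes a pending address
cache.write_back(0x100, 42)
after_write_back = issue(cache, reads=[0x100], write=0x300)
```

Because the reads of an instruction are checked before its own write is allocated, an instruction that reads and writes the same address does not block itself, which is one plausible interpretation of reads having priority over writes within the same instruction.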
As shown in FIG. 3, the dashed lines in the middle are control lines and the solid lines are data paths. The CPU may be a central processing unit of any architecture. INST Queue is the instruction queue of the vector coprocessor, and Decoder is its decoding unit. Reg2Addr sends the decoded register ID to the CPU to obtain the value of the corresponding register; that value and the decoded content together form a microinstruction (Uop), which is sent to the microinstruction queue (Uop Queue). The Uop Queue issues microinstructions to the memory access unit (MAU) and the arithmetic unit (ALU).
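How a Uop is formed and then split between the MAU and the ALU can be sketched in Python as follows. The `Uop` fields, the tuple form of the decoded instruction, and the dictionary-shaped commands are assumptions made for illustration, not details taken from the patent.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    op: str      # decoded operation
    dst: int     # write address (value of a CPU register)
    srcs: tuple  # read addresses (values of CPU registers)


def make_uop(decoded, reg_values):
    """Combine decoded content with the register values returned by the CPU."""
    op, rd, *rs = decoded
    return Uop(op, reg_values[rd], tuple(reg_values[r] for r in rs))


def dispatch(uop_queue):
    """Pop one Uop and split it into the parts that drive the MAU and the ALU."""
    uop = uop_queue.popleft()
    mau_cmd = {"read": uop.srcs, "write": uop.dst}  # addresses go to the MAU
    alu_cmd = {"mode": uop.op}                      # operation mode goes to the ALU
    return mau_cmd, alu_cmd


reg_values = {1: 0x100, 2: 0x200, 3: 0x300}
uop_queue = deque([make_uop(("vmul", 3, 1, 2), reg_values)])
mau_cmd, alu_cmd = dispatch(uop_queue)
```

The split reflects the description: the MAU only needs addresses, while the ALU only needs to know which vector operation mode to select.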
Take a vector multiply instruction as an example:
the instruction is issued into the instruction queue (INST Queue);
the instruction queue decodes (Decode) the instruction content and determines that two read operands and one write operand are needed; Reg2Addr sends the corresponding register IDs to the CPU, and the remaining information is sent to the microinstruction queue (Uop Queue);
the Uop Queue obtains the operation content from the previous step and the specific read and write address information returned by the CPU, and then drives the MAU and the ALU respectively;
the ALU selects the vector multiplication mode, prepares to receive address data sent by the MAU, performs the operation once the data arrives, and sends the result to the Cache;
the MAU reads address data from the Cache or the RAM according to the address information of the two read requests and sends it to the ALU, and allocates the address at which the ALU writes back to the Cache;
after writing back to the Cache, the ALU sends end information to the MAU, the MAU informs the CPU that the instruction is complete, and instruction execution ends;
after instruction execution ends, the content in the Cache is automatically written back to the RAM.
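The vector multiply walk-through above can be traced with a toy Python model. This is a sketch under stated assumptions: RAM and Cache are modeled as word-addressed dictionaries, the vector length `n` is passed explicitly (the source does not say how the length is conveyed), and the class `VectorUnit` and its method names are hypothetical.

```python
class VectorUnit:
    """Toy MAU + ALU + Cache in front of a word-addressed RAM (a dict)."""

    def __init__(self, ram):
        self.ram = dict(ram)
        self.cache = {}
        self.log = []

    def mau_read(self, addr, n):
        # MAU: serve each word from the Cache if present, otherwise from RAM.
        return [self.cache.get(addr + i, self.ram.get(addr + i, 0)) for i in range(n)]

    def vmul(self, dst, src_a, src_b, n):
        a = self.mau_read(src_a, n)
        b = self.mau_read(src_b, n)
        product = [x * y for x, y in zip(a, b)]  # ALU in vector-multiply mode
        for i, value in enumerate(product):      # ALU result goes to the Cache only
            self.cache[dst + i] = value
        self.log += ["ALU->MAU: done", "MAU->CPU: done"]
        self.ram.update(self.cache)              # automatic write-back to RAM
        self.cache.clear()


unit = VectorUnit({0x100: 1, 0x101: 2, 0x102: 3, 0x200: 4, 0x201: 5, 0x202: 6})
unit.vmul(dst=0x300, src_a=0x100, src_b=0x200, n=3)
result = [unit.ram[0x300 + i] for i in range(3)]
```

Here `result` holds the element-wise products of the two source vectors, and RAM is updated only through the Cache write-back, never directly by the ALU.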
A vector processor processing system, applied to the vector processor processing method, the vector processor comprising: a CPU, a vector coprocessor and a random access memory; the processing system comprises:
the data acquisition module is used for acquiring address data in a register in the CPU by using the vector coprocessor;
and the data sending module is used for sending the address data in the register in the CPU to a random access memory by using the vector coprocessor.
The vector processor processing system provided by the invention further comprises:
an instruction sending module, used for issuing a data instruction into an instruction queue in the vector coprocessor.
The invention adopts a simplified instruction set: only operation instructions are required, data-movement instructions are optional, and jump and branch instructions are not needed. The instruction set supports only register-register operations.
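A register-register-only instruction set needs little more than an opcode and a few register IDs per instruction. The Python sketch below shows one possible packing. The bit layout (three 5-bit register fields below the opcode), the opcode numbers, and the mnemonics `vadd` and `vmul` are entirely hypothetical; the source does not specify an encoding.

```python
OPCODES = {"vadd": 0x01, "vmul": 0x02}  # hypothetical opcode numbers
NAMES = {code: name for name, code in OPCODES.items()}


def encode(op, rd, rs1, rs2):
    """Pack an opcode and three register IDs into one word (assumed layout)."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 32:
            raise ValueError("register ID must fit in 5 bits")
    return (OPCODES[op] << 15) | (rd << 10) | (rs1 << 5) | rs2


def decode(word):
    """Inverse of encode: recover the opcode name and the register IDs."""
    return (NAMES[word >> 15], (word >> 10) & 0x1F, (word >> 5) & 0x1F, word & 0x1F)


word = encode("vmul", 3, 1, 2)
fields = decode(word)
```

With no immediates, memory operands, or branch targets to encode, every instruction in this sketch has the same shape, which keeps the decoder trivial.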
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principle and the embodiment of the present invention are explained by applying specific examples, and the above description of the embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. A vector processor processing method, wherein the vector processor comprises: a CPU, a vector coprocessor and a random access memory; the processing method comprises the following steps:
acquiring address data in a register in a CPU by using the vector coprocessor;
and sending address data in a register in the CPU to a random access memory by using the vector coprocessor.
2. The vector processor processing method according to claim 1, wherein before said obtaining address data in a register in a CPU by the vector coprocessor, the method further comprises:
launching a data instruction into an instruction queue in the vector coprocessor.
3. The method as claimed in claim 2, wherein said obtaining address data in a register of a CPU by the vector coprocessor comprises:
the instruction queue decodes according to the content of the data instruction;
sending the register ID in the decoded content to a CPU;
the CPU determines the value of the register according to the register ID and sends the value of the register to the vector coprocessor.
4. The method according to claim 3, wherein the sending, by the vector coprocessor, address data in a register in the CPU to a random access memory, specifically comprises:
determining a microinstruction according to the numerical value of the register and the decoded content;
driving the memory access unit and the arithmetic unit according to the microinstruction;
the arithmetic unit selects a vector arithmetic mode and prepares to receive address data sent by the memory access unit;
the memory access unit reads address data from the cache unit and the random access memory and sends the address data to the arithmetic unit, and allocates the address at which the arithmetic unit writes back to the cache unit;
the arithmetic unit performs the operation on the received data and sends the result to the cache unit;
after writing back to the cache unit, the arithmetic unit sends instruction-completion information to the memory access unit.
5. The vector processor processing method of claim 4, wherein after said arithmetic unit writes back to the cache unit and sends the end information to the memory access unit, the method further comprises:
the memory access unit sends the information of the completion of the instruction to the CPU, and the instruction execution is finished;
the content in the cache unit is automatically written back to the random access memory.
6. The vector processor processing method of claim 5, wherein the cache unit is allocated for writing, and within the same instruction a read operation has a higher priority than a write operation; an address allocated for a write that has not yet been written back can be neither written nor read.
7. A vector processor processing system applied to the vector processor processing method of any one of claims 1 to 6, wherein the vector processor comprises: a CPU, a vector coprocessor and a random access memory; the processing system comprises:
the data acquisition module is used for acquiring address data in a register in the CPU by using the vector coprocessor;
and the data sending module is used for sending the address data in the register in the CPU to a random access memory by using the vector coprocessor.
8. The vector processor processing system of claim 7, further comprising:
and the instruction sending module is used for transmitting a data instruction to an instruction queue in the vector coprocessor.
CN202210643486.5A 2022-06-08 2022-06-08 Vector processor processing method and system Pending CN114816769A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210643486.5A CN114816769A (en) 2022-06-08 2022-06-08 Vector processor processing method and system

Publications (1)

Publication Number Publication Date
CN114816769A true CN114816769A (en) 2022-07-29

Family

ID=82521247

Country Status (1)

Country Link
CN (1) CN114816769A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116501388A (en) * 2023-06-26 2023-07-28 深流微智能科技(深圳)有限公司 Instruction processing method, device, storage medium and equipment
CN116501388B (en) * 2023-06-26 2023-10-03 深流微智能科技(深圳)有限公司 Instruction processing method, device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination