CN114816769A

CN114816769A - Vector processor processing method and system

Info

Publication number: CN114816769A
Application number: CN202210643486.5A
Authority: CN
Inventors: 尚德龙; 周玉梅; 张磊
Original assignee: Zhongke Nanjing Intelligent Technology Research Institute
Current assignee: Zhongke Nanjing Intelligent Technology Research Institute
Priority date: 2022-06-08
Filing date: 2022-06-08
Publication date: 2022-07-29

Abstract

The invention relates to a vector processor processing method and a vector processor processing system. In the method, the vector processor comprises: a CPU, a vector coprocessor and a random access memory; acquiring address data in a register in a CPU by using the vector coprocessor; and sending address data in a register in the CPU to a random access memory by using the vector coprocessor. The invention reduces the power consumption and area without affecting the programmability.

Description

Vector processor processing method and system

Technical Field

The present invention relates to the field of programming, and in particular, to a vector processor processing method and system.

Background

Vector processors commonly employ a register file scheme. Because the register file has a large capacity and is usually implemented in SRAM, random access to the registers causes conflicts and a large performance penalty. Many vector processors and GPUs use a Register File Cache and an Operand Buffer to reduce conflicts, at some cost in power consumption and area. In addition, the size of a single register in the vector register file is fixed, so a large amount of hardware logic is needed to support operations of various lengths, which also makes the vector processor harder to program.

Disclosure of Invention

The invention aims to provide a vector processor processing method and a vector processor processing system, which reduce power consumption and area on the premise of not influencing programmability.

In order to achieve the purpose, the invention provides the following scheme:

a vector processor processing method, the vector processor comprising: a CPU, a vector coprocessor and a random access memory (RAM); the processing method comprises the following steps:

acquiring address data in a register in a CPU by using the vector coprocessor;

and sending address data in a register in the CPU to a random access memory by using the vector coprocessor.

Optionally, before the obtaining, by the vector coprocessor, of address data in a register in the CPU, the method further includes:

issuing a data instruction into an instruction queue in the vector coprocessor.

Optionally, the obtaining, by the vector coprocessor, address data in a register in the CPU specifically includes:

the instruction queue decodes according to the content of the data instruction;

sending the register ID in the decoded content to a CPU;

the CPU determines the value of the register according to the register ID and sends the value of the register to the vector coprocessor.

Optionally, the sending, by the vector coprocessor, address data in a register in the CPU to a random access memory specifically includes:

determining a microinstruction according to the numerical value of the register and the decoded content;

driving a memory access unit (MAU) and an arithmetic unit (ALU) according to the microinstruction;

the arithmetic unit selects a vector operation mode and prepares to receive the address data sent by the memory access unit;

the memory access unit reads the address data from the cache unit and the random access memory and sends it to the arithmetic unit, and allocates the address at which the arithmetic unit writes back to the cache unit;

the arithmetic unit performs the operation on the received data and sends the result to the cache unit;

after the arithmetic unit has written back to the cache unit, the arithmetic unit sends instruction-completion information to the memory access unit.

Optionally, after the arithmetic unit has written back to the cache unit and sent end information to the memory access unit, the method further includes:

the memory access unit sends the information of the completion of the instruction to the CPU, and the instruction execution is finished;

the content in the cache unit is automatically written back to the random access memory.

Optionally, the cache unit is allocated on write, and the read operation of an instruction has a higher priority than its write operation; a location that has not yet been written back can be neither written nor read.

A vector processor processing system, applied to the vector processor processing method, the vector processor comprising: a CPU, a vector coprocessor and a random access memory; the processing system comprises:

the data acquisition module is used for acquiring address data in a register in the CPU by using the vector coprocessor;

and the data sending module is used for sending the address data in the register in the CPU to a random access memory by using the vector coprocessor.

Optionally, the method further comprises:

and the instruction sending module is used for transmitting the data instruction to an instruction queue in the vector coprocessor.

According to the specific embodiment provided by the invention, the invention discloses the following technical effects:

The invention provides a vector processor processing method and system, wherein the vector processor comprises a CPU, a vector coprocessor and a random access memory; address data in a register in the CPU is acquired by the vector coprocessor, and the address data is sent to the random access memory by the vector coprocessor. The vector register file is removed, a general-purpose CPU serves as the control unit, and the vector coprocessor reuses the registers in the CPU, which reduces control complexity. Fewer instructions are needed than with other vector offload processors to perform the same function. Power consumption and area are therefore reduced without affecting programmability.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.

FIG. 1 is a flow chart illustrating a processing method of a vector processor according to the present invention;

FIG. 2 is a schematic diagram of a vector processor architecture;

FIG. 3 is a schematic overall flow chart of a vector processor processing method according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention aims to provide a vector processor processing method and system, which reduce power consumption and area on the premise of not influencing programmability.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

As shown in fig. 2, the vector processor includes a CPU, a vector coprocessor and a random access memory. The CPU issues vector instructions to the vector coprocessor (Vector Processor), which interacts with the RAM (which may be DRAM or SRAM) via a dedicated, independent bus. Fig. 1 is a schematic flow chart of the processing method of the vector processor according to the present invention; as shown in fig. 1, the processing method includes:

S101, acquiring address data in a register in a CPU by using the vector coprocessor;

Before S101, the method further includes:

issuing a data instruction into an instruction queue in the vector coprocessor.

S101 specifically comprises the following steps:

the instruction queue decodes according to the content of the data instruction;

sending the register ID in the decoded content to a CPU;

the CPU determines the value of the register according to the register ID and sends the address data of the register to the vector coprocessor.

S102, sending the address data of the register in the CPU to the random access memory by using the vector coprocessor.

S102 specifically comprises the following steps:

determining a microinstruction according to the address data of the register and the decoded content;

driving the access unit and the arithmetic unit according to the micro instruction;

the memory access unit reads the address data from the cache unit (Cache) and the random access memory and sends it to the arithmetic unit, and allocates the address at which the arithmetic unit writes back to the cache unit;

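After the arithmetic unit has written back to the cache unit and sent end information to the memory access unit, the method further includes:

the memory access unit sends instruction-completion information to the CPU, and instruction execution ends;

the content in the cache unit is automatically written back to the random access memory.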

To resolve read-after-write (RAW), write-after-write (WAW) and write-after-read (WAR) data hazards, the cache unit is allocated on write, and the read operation of an instruction has a higher priority than its write operation; a location that has been allocated but not yet written back can be neither written nor read. The ALU result can only be written back to the cache unit, which then initiates a Clean operation to write the data back to the RAM.
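The rule above can be illustrated with a minimal Python sketch. It is one possible reading of the text, not the patented circuit: the class `HazardCache`, the `pending` set and the function `issue` are hypothetical names introduced only for illustration, and a stall is modelled simply as returning `None`.

```python
class HazardCache:
    """Toy cache that blocks access to locations allocated but not yet written back."""

    def __init__(self):
        self.data = {}        # address -> value for completed writes
        self.pending = set()  # addresses allocated for a write that has not completed

    def allocate(self, addr):
        """Allocate a destination; refuse if an earlier write is outstanding (WAW)."""
        if addr in self.pending:
            return False
        self.pending.add(addr)
        return True

    def read(self, addr):
        """Return (ok, value); a read of a pending address must wait (RAW)."""
        if addr in self.pending:
            return False, None
        return True, self.data.get(addr)

    def write_back(self, addr, value):
        """Deliver the result; the address becomes readable and writable again."""
        self.data[addr] = value
        self.pending.discard(addr)


def issue(cache, src_addrs, dst_addr):
    """Serve an instruction's reads before allocating its own write (WAR)."""
    values = []
    for addr in src_addrs:
        ok, value = cache.read(addr)
        if not ok:
            return None       # stall: a source is still pending
        values.append(value)
    if not cache.allocate(dst_addr):
        return None           # stall: the destination is still pending
    return values


cache = HazardCache()
cache.write_back(0, 5)            # address 0 initially holds 5
first = issue(cache, [0], 0)      # in-place update: old value is read, then 0 is allocated
second = issue(cache, [0], 1)     # stalls because address 0 is not yet written back
cache.write_back(0, 25)           # the first instruction completes
third = issue(cache, [0], 1)      # now proceeds and sees the new value
```

In this sketch, serving reads before allocating the destination is what lets an instruction use the same address as both source and destination, while any later instruction touching that address waits until the write-back.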

As shown in fig. 3, the dashed line in the middle is a control line, and the solid lines are data paths. The CPU may be a central processing unit of any architecture. INST Queue is the instruction queue of the vector coprocessor, and Decoder is its decoding unit. Reg2Addr sends the decoded register IDs to the CPU to obtain the values of the corresponding registers; these values and the decoded content together form a micro instruction (Uop), which is sent to the micro-instruction queue (Uop Queue). The Uop Queue issues micro instructions to the memory access unit (MAU) and the arithmetic unit (ALU).
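As a rough illustration of the Reg2Addr step, the following Python sketch models the CPU register file as a dictionary and combines decoded content with the returned register values into a micro instruction. The field names of `DecodedInst` and `Uop`, the function name `reg2addr`, the operation name `vmul` and the register contents are all assumptions made for the example; the source does not specify any of them.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodedInst:
    """Hypothetical decoder output: operation plus CPU register IDs."""
    op: str
    src_regs: tuple   # register IDs holding the read-operand addresses
    dst_reg: int      # register ID holding the write-operand address


@dataclass(frozen=True)
class Uop:
    """Hypothetical micro instruction: operation plus resolved addresses."""
    op: str
    src_addrs: tuple
    dst_addr: int


def reg2addr(cpu_regs, inst):
    """Look up each register ID in the CPU register file and build a Uop."""
    src_addrs = tuple(cpu_regs[r] for r in inst.src_regs)
    return Uop(inst.op, src_addrs, cpu_regs[inst.dst_reg])


# CPU register file modelled as register ID -> value (here, an address).
cpu_regs = {1: 0x100, 2: 0x200, 3: 0x300}

inst_queue = deque([DecodedInst("vmul", (1, 2), 3)])
uop_queue = deque()

# Drain the instruction queue into the micro-instruction queue.
while inst_queue:
    uop_queue.append(reg2addr(cpu_regs, inst_queue.popleft()))
```

The point of the sketch is that the coprocessor holds no vector register file of its own: operand locations are whatever addresses the CPU's ordinary registers contain at issue time.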

Take a vector multiply instruction as an example:

transmitting the instruction into an instruction Queue (INST Queue);

the instruction queue decodes (Decode) the instruction content and determines that two read operands and one write operand are needed; Reg2Addr sends the corresponding register IDs to the CPU, and the remaining information is sent to the micro-instruction queue (Uop Queue);

the Uop Queue obtains the operation content from the previous step and the specific read and write address information returned by the CPU, and then drives the MAU and the ALU respectively;

the ALU selects the vector multiplication mode and prepares to receive the address data sent by the MAU; once the data has been received, it performs the operation and sends the result to the Cache;

the MAU reads the address data from the Cache or the RAM according to the address information of the two read requests and sends it to the ALU, and allocates the address at which the ALU writes back to the Cache;

after the ALU has written back to the Cache, it sends end information to the MAU; the MAU notifies the CPU that the instruction has completed, and instruction execution ends;

and after the instruction execution is finished, the content in the Cache is automatically written back to the RAM.
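The vector multiply walk-through above can be traced with a small Python model. This is a sketch under stated assumptions, not the patented hardware: the class `Machine`, the vector length `VLEN = 4`, the flat-list RAM and the message strings in `log` are all hypothetical, and the operand addresses are taken as already resolved by the CPU.

```python
VLEN = 4  # assumed vector length in elements (not specified in the source)


class Machine:
    def __init__(self, ram):
        self.ram = ram     # RAM modelled as a flat list of numbers
        self.cache = {}    # address -> value; holds results before write-back
        self.log = []      # ordered record of completion messages

    def mau_read(self, addr):
        """MAU: read VLEN elements, preferring the Cache over the RAM."""
        return [self.cache.get(a, self.ram[a]) for a in range(addr, addr + VLEN)]

    def alu_vmul(self, a, b, dst):
        """ALU: element-wise multiply; the result goes to the Cache only."""
        for i, (x, y) in enumerate(zip(a, b)):
            self.cache[dst + i] = x * y
        self.log.append("ALU->MAU: end")

    def clean(self):
        """Cache: write all held results back to the RAM."""
        for addr, value in self.cache.items():
            self.ram[addr] = value
        self.cache.clear()

    def run_vmul(self, src0, src1, dst):
        a = self.mau_read(src0)
        b = self.mau_read(src1)
        self.alu_vmul(a, b, dst)
        self.log.append("MAU->CPU: instruction complete")
        self.clean()       # automatic write-back after the instruction ends


ram = [1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0]
m = Machine(ram)
m.run_vmul(src0=0, src1=4, dst=8)
print(m.ram[8:12])  # prints [10, 40, 90, 160]
```

The model runs the steps sequentially for readability; in the described design the MAU and the ALU are driven from the Uop Queue and the hazard rule on the Cache governs their ordering.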

The vector processor processing system provided by the invention further comprises:

an instruction issue module, used to issue a data instruction into the instruction queue in the vector coprocessor.

The invention adopts a reduced instruction set that supports only arithmetic instructions, with data-movement instructions optional, and requires no jump or branch instructions. The instruction set supports only register-register operations.
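To make the register-register restriction concrete, here is a minimal Python sketch of encoding and decoding such an instruction. The bit layout (8-bit opcode, three 5-bit register IDs), the opcode values and the mnemonics `vadd` and `vmul` are purely hypothetical; the source does not define an instruction format.

```python
# Hypothetical opcode table: only arithmetic operations, no jumps or branches.
OPCODES = {0x01: "vadd", 0x02: "vmul"}


def encode(opcode, rd, rs1, rs2):
    """Pack an opcode and three CPU register IDs into one instruction word."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 32:
            raise ValueError("register ID must fit in 5 bits")
    return (rs2 << 18) | (rs1 << 13) | (rd << 8) | (opcode & 0xFF)


def decode(word):
    """Unpack an instruction word; every operand is named by register ID."""
    opcode = word & 0xFF
    if opcode not in OPCODES:
        raise ValueError(f"unsupported opcode {opcode:#x}")
    return {
        "op": OPCODES[opcode],
        "rd": (word >> 8) & 0x1F,
        "rs1": (word >> 13) & 0x1F,
        "rs2": (word >> 18) & 0x1F,
    }


word = encode(0x02, rd=3, rs1=1, rs2=2)
fields = decode(word)
```

Because every operand is a register ID, the instruction word carries no addresses or immediates; the addresses are supplied by the CPU registers when the instruction is resolved.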

The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.

The principle and the embodiment of the present invention are explained by applying specific examples, and the above description of the embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims

1. A vector processor processing method, wherein the vector processor comprises: a CPU, a vector coprocessor and a random access memory; the processing method comprises the following steps:

acquiring address data in a register in a CPU by using the vector coprocessor;

2. The vector processor processing method according to claim 1, wherein said obtaining address data in a register in a CPU by the vector coprocessor further comprises:

3. The method as claimed in claim 2, wherein said obtaining address data in a register of a CPU by the vector coprocessor comprises:

the instruction queue decodes according to the content of the data instruction;

sending the register ID in the decoded content to a CPU;

4. The method according to claim 3, wherein the sending, by the vector coprocessor, address data in a register in the CPU to a random access memory, specifically comprises:

5. The vector processor processing method of claim 4, wherein after said arithmetic unit writes back to the cache unit, it sends an end message to the memory access unit, and then further comprising:

6. The vector processor processing method of claim 5, wherein the cache unit is allocated on write, the read operation of an instruction has a higher priority than its write operation, and a location that has not yet been written back can be neither written nor read.

7. A vector processor processing system applied to the vector processor processing method of any one of claims 1 to 6, wherein the vector processor comprises: a CPU, a vector coprocessor and a random access memory; the processing system comprises:

8. The vector processor processing system of claim 7, further comprising:

and the instruction sending module is used for transmitting a data instruction to an instruction queue in the vector coprocessor.