CN117312182A

CN117312182A - Vector data scatter method and apparatus based on scratchpad memory, and computer device

Info

Publication number: CN117312182A
Application number: CN202311613753.5A
Authority: CN
Inventors: 方建滨; 张鹏; 黄春; 唐滔; 彭林; 崔英博; 姜浩; 沈洁; 范小康; 于恒彪; 苏醒; 易昕
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2023-11-29
Filing date: 2023-11-29
Publication date: 2023-12-29
Anticipated expiration: 2043-11-29
Also published as: CN117312182B

Abstract

The application relates to a vector data scatter method and apparatus based on scratchpad memory, and to a computer device. The method comprises the following steps: the execution unit executes a write instruction so that the vector register moves target data to the scratchpad memory, yielding a first base address; the index of the target data in the off-chip memory is read and written into the scratchpad memory, yielding a first index address; the address of the target data in the off-chip memory is calculated, yielding the off-chip memory address; the first base address and the first index address, arranged in sequence, are extracted from the scratchpad memory, yielding a scatter parameter packet of the target data; and the execution unit executes a scatter instruction to write the scatter parameter packet into the off-chip memory according to the off-chip memory address. The method improves the efficiency of scattering large-scale vector data from the vector register to the off-chip memory, allows vector computation and vector data scatter operations to proceed concurrently, and improves the execution efficiency of the vector processor.

Description

Vector data scatter method and apparatus based on scratchpad memory, and computer device

Technical Field

The present disclosure relates to the field of high-performance memory access technologies, and in particular to a method and an apparatus for scattering vector data based on scratchpad memory, and to a computer device.

Background

To meet the growing demand for computing power within the power constraints of modern processors, vector registers are widely used in computer architectures as an important component of high-speed data processing. A vector register supports vector operation instructions and can operate on multiple data elements in parallel, which improves data processing efficiency. Vector data, however, is generally distributed across different memory locations, so each element in a vector register must be stored to a different memory address. The execution unit has to access memory repeatedly to calculate the storage address of each element and then write the vector data to those addresses. In conventional vector data scatter techniques, vector computation therefore cannot proceed concurrently with the scatter operation, and the utilization of the vector processor is low.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a method, an apparatus, and a computer device for scattering vector data based on scratchpad memory, which can improve the efficiency of vector data processing.

A vector data scatter method based on scratchpad memory is applied to a computer system, and the computer system comprises: an execution unit, a vector register, a scratchpad memory, and an off-chip memory.

The method comprises the following steps:

The execution unit executes a write instruction so that the vector register moves target data to the scratchpad memory, and a first base address is obtained.

The index of the target data in the off-chip memory is read and written into the scratchpad memory to obtain a first index address.

The address of the target data in the off-chip memory is calculated to obtain the off-chip memory address. The off-chip memory address includes: a base address of the target data and an offset index of the target data.

The first base address and the first index address, arranged in sequence, are extracted from the scratchpad memory to obtain a scatter parameter packet of the target data.

According to the off-chip memory address, the execution unit executes a scatter instruction to write the scatter parameter packet into the off-chip memory.
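The five steps above can be modelled in software. The following is a minimal sketch, not the patented hardware implementation: Python lists and a dictionary stand in for the vector register, the scratchpad memory, and the off-chip memory, and the names `scatter_via_scratchpad`, `dst_base`, and `idx_offchip` are hypothetical names chosen for illustration.

```python
from typing import Dict, List


def scatter_via_scratchpad(vreg: List[int], idx_offchip: List[int],
                           dst_base: int, off_chip: Dict[int, int]) -> Dict[int, int]:
    """Software model of the five steps; all containers are stand-ins for hardware."""
    if len(vreg) != len(idx_offchip):
        raise ValueError("one index is required per vector element")
    scratchpad: List[int] = []

    # Step 1: copy the vector register into contiguous scratchpad slots.
    first_base = len(scratchpad)
    scratchpad.extend(vreg)

    # Step 2: stage the per-element indexes in the scratchpad, after the data.
    first_index_addr = len(scratchpad)
    scratchpad.extend(idx_offchip)

    # Step 3: off-chip address = base address + offset index.
    addresses = [dst_base + i for i in idx_offchip]

    # Step 4: pair data and index, in order, into a scatter parameter packet.
    packet = [(scratchpad[first_base + k], scratchpad[first_index_addr + k])
              for k in range(len(vreg))]

    # Step 5: write each element to its computed off-chip address.
    for (value, _index), addr in zip(packet, addresses):
        off_chip[addr] = value
    return off_chip


memory = scatter_via_scratchpad([10, 20, 30, 40], [3, 0, 2, 1],
                                dst_base=100, off_chip={})
```

With the example inputs, the element paired with index 3 lands at address 103, the element paired with index 0 at address 100, and so on.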

In one embodiment, the execution unit performs instruction operations according to the operating instructions received by the computer processor. The instruction operations include: write, read, store, and load.

In one embodiment, the method further comprises: reading, through the execution unit, a plurality of target data that are discretely distributed in the off-chip memory, extracting the indexes of the target data, and writing the indexes into the scratchpad memory in the order of the first base address to obtain the first index address.

In one embodiment, the method further comprises: adding the base address of the target data, one by one, to each index of the target data elements in the off-chip memory to obtain the off-chip memory address of the target data.

In one embodiment, the method further comprises: matching the offset index of the target data in the off-chip storage address with the first index address of the scatter parameter packet, and writing the first data base address of the scatter parameter packet into the off-chip memory through the execution unit.

In one embodiment, the scratchpad memory transfers the scatter parameter packets to the off-chip memory in 1024-bit data units.

A vector data scatter apparatus based on scratchpad memory, the apparatus comprising:

The first base address acquisition module, which is used for moving target data from the vector register to the scratchpad memory through the execution unit executing a write instruction, to obtain a first base address.

The first index address acquisition module, which is used for reading the index of the target data in the off-chip memory and writing the index into the scratchpad memory to obtain a first index address.

The source address acquisition module, which is used for calculating the address of the target data in the off-chip memory to obtain the off-chip memory address. The off-chip memory address includes: a base address of the target data and an offset index of the target data.

The parameter packet acquisition module, which is used for extracting the first base address and the first index address, arranged in sequence, from the scratchpad memory to obtain a scatter parameter packet of the target data.

The scatter module, which is used for writing the scatter parameter packet into the off-chip memory, according to the off-chip memory address, through the execution unit executing a scatter instruction.

A computer device comprising a memory storing a computer program and a processor which, when executing the computer program, performs the steps of the method described above.

In the above vector data scatter method, apparatus, and computer device based on scratchpad memory, the index of the target data is written into the scratchpad memory and kept there, which avoids the overhead of recalculating the index for every scatter operation and improves efficiency. The storage location of the target data in the off-chip memory is then obtained from the base address and offset index of the target data, providing address information for the subsequent scatter operation. The base address and index address of the target data are extracted and packed into a scatter parameter packet, providing parameter information for the subsequent scatter operation. Finally, the data is scattered into the off-chip memory according to the calculated off-chip storage address. The low latency and high bandwidth of the scratchpad memory are exploited by using it as an intermediary for data exchange between the vector register and the off-chip memory, so that discrete vector data is scattered efficiently into the off-chip memory and accessed through the scatter parameter packet. Compared with conventional vector data scatter methods, the method allows data computation to proceed while the execution unit executes scatter instructions, and offers higher efficiency and better flexibility.

Drawings

FIG. 1 is an application scenario diagram of a vector data scatter method based on scratchpad memory in one embodiment;

FIG. 2 is a flow diagram of a vector data scatter method based on scratchpad memory in one embodiment;

FIG. 3 is a diagram of vector data types in one embodiment, wherein FIG. 3 (a) is single word vector data and FIG. 3 (b) is half word vector data;

FIG. 4 is a block diagram of a vector data scatter apparatus based on scratchpad memory in one embodiment;

FIG. 5 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The vector data scatter method based on scratchpad memory can be applied to the computer system shown in FIG. 1. The system comprises a vector processor, an off-chip memory, a data bus and the like, wherein the vector processor comprises a vector control register, a vector execution unit, a vector register file, a scratchpad memory and the like.

In one embodiment, as shown in FIG. 2, a vector data scatter method based on scratchpad memory is provided. The method is described as applied to the system in FIG. 1 and includes the following steps:

Step 202, the execution unit executes the write instruction so that the vector register moves target data to the scratchpad memory, and a first base address is obtained.

The execution unit executes instruction operation according to the operation instruction received by the computer processor. The instruction operations include: vector write, vector read, vector store, vector load, etc.

Specifically, the execution unit executes the received computer scatter instruction, moves the target data element in the vector register file to a continuous position in the scratch pad memory, and obtains a first base address of the target data element in the scratch pad memory.

Step 204, the index of the target data in the off-chip memory is read, and the index is written into the scratchpad memory to obtain a first index address.

Meanwhile, the execution unit executes a scatter instruction on the off-chip memory: it reads the plurality of target data that are discretely distributed in the off-chip memory, extracts their indexes, and transfers the indexes (idx) of the target data elements from the off-chip memory to the scratchpad memory. The indexes are written into the scratchpad memory sequentially, in the order of the first base address, which yields the base address of the indexes of the target data elements in the scratchpad memory, namely the first index address.

In step 206, the address of the target data in the off-chip memory is calculated to obtain the off-chip memory address.

Specifically, the offset index of each target data element relative to the base address in the off-chip memory is calculated. The off-chip storage base address (src) of the target data is then added, one by one, to each offset index of the target data elements, which yields the off-chip storage address of each target data element. The off-chip storage address comprises: a base address of the target data and an offset index of the target data.
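The one-by-one addition of the base address (src) and the offset indexes can be sketched as follows. This is an illustrative sketch only: the function name `off_chip_addresses` and the optional `elem_bytes` scaling factor are assumptions, since the text does not state whether the offset index is expressed in bytes or in elements. With the default `elem_bytes=1`, the function performs the plain addition described above.

```python
from typing import List


def off_chip_addresses(src: int, idx: List[int], elem_bytes: int = 1) -> List[int]:
    """Add the base address to each offset index, one element at a time.

    elem_bytes is a hypothetical scaling factor; 1 means idx is already
    expressed in address units, which matches the plain addition in the text.
    """
    return [src + i * elem_bytes for i in idx]


addrs = off_chip_addresses(0x1000, [4, 0, 9])
scaled = off_chip_addresses(0x1000, [1, 2], elem_bytes=8)
```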

The target data includes: a target data element, a base address of target data, and an index of target data.

Step 208, the first base address and the first index address, arranged in sequence, are extracted from the scratchpad memory to obtain the scatter parameter packet of the target data.

The scatter parameter packet comprises a first base address and a first index address of the target data element in the scratch pad memory, namely a scratch pad memory address, and further comprises an off-chip memory address and the target data element.
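One possible in-memory layout for such a packet is sketched below. The class name `ScatterParameterPacket` and its field names are hypothetical; the text lists the contents of the packet but does not specify a concrete encoding.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ScatterParameterPacket:
    """Illustrative container for the four items the packet is said to hold."""
    first_base: int                   # start of the data elements in the scratchpad
    first_index_addr: int             # start of the staged indexes in the scratchpad
    off_chip_addrs: Tuple[int, ...]   # one destination address per element
    elements: Tuple[int, ...]         # the target data elements themselves

    def __post_init__(self) -> None:
        if len(self.off_chip_addrs) != len(self.elements):
            raise ValueError("each element needs exactly one off-chip address")

    def writes(self) -> List[Tuple[int, int]]:
        """Return (address, value) pairs in packet order."""
        return list(zip(self.off_chip_addrs, self.elements))


packet = ScatterParameterPacket(first_base=0, first_index_addr=16,
                                off_chip_addrs=(0x1004, 0x1000),
                                elements=(7, 9))
```

Calling `packet.writes()` yields the address and value pairs that a scatter step would consume in order.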

Step 210, according to the off-chip memory address, the scatter parameter packet is written into the off-chip memory by the execution unit executing the scatter instruction.

The computer system is a 1024-bit vector processor, and the type of vector data which can be processed is 16 single words or 32 half words, wherein the single words occupy 64-bit width, and the half words occupy 32-bit width, so that one vector register has 1024-bit width and can store 16 single words or 32 half words.
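The lane arithmetic can be checked with a short sketch. The helper `lanes` and the constant `REGISTER_BITS` are illustrative names, and the little-endian packing is an assumption made only so the example is concrete.

```python
import struct

REGISTER_BITS = 1024  # width of one vector register, as stated in the text


def lanes(elem_bits: int) -> int:
    """Number of elements of the given width that fill one vector register."""
    if REGISTER_BITS % elem_bits != 0:
        raise ValueError("element width must divide the register width")
    return REGISTER_BITS // elem_bits


# Pack one register's worth of each data type into raw bytes.
single_words = struct.pack("<16Q", *range(16))  # 16 elements x 64 bits
half_words = struct.pack("<32I", *range(32))    # 32 elements x 32 bits
```

Both byte strings are 128 bytes long, that is, 1024 bits, which matches the register width for either data type.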

Specifically, the target data in the scatter parameter packet is arranged contiguously and in order. The execution unit executes the scatter instruction, invokes the DMA (Direct Memory Access) component, and moves the scatter parameter packet from the scratchpad memory to the off-chip memory in 1024-bit data units. Further, after receiving a placement instruction from the computer system, the execution unit invokes the DMA component to perform the placement operation: it matches positions according to the offset index of the off-chip storage address and the first index address of the scatter parameter packet, takes the sequentially stored scatter parameter packets out of the scratchpad memory, and writes the first data base address of the scatter parameter packet into the corresponding position of the off-chip memory.
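Transfer in 1024-bit data units can be illustrated by splitting a byte payload into 128-byte chunks. This sketch models only the chunking, not a real DMA engine; the generator name `dma_units` is hypothetical, and zero-padding the final partial unit is an assumption that the text does not confirm.

```python
from typing import Iterator

UNIT_BYTES = 1024 // 8  # one 1024-bit data unit is 128 bytes


def dma_units(payload: bytes, unit_bytes: int = UNIT_BYTES) -> Iterator[bytes]:
    """Yield the payload in fixed-size units, zero-padding the last one."""
    for start in range(0, len(payload), unit_bytes):
        chunk = payload[start:start + unit_bytes]
        # Padding the tail is an assumption made so every unit is full width.
        yield chunk.ljust(unit_bytes, b"\x00")


payload = bytes(range(200))      # 200 bytes of example packet data
units = list(dma_units(payload))
```

A 200-byte payload occupies two units: one full unit and one unit carrying the remaining 72 bytes plus padding.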

In one embodiment, the execution unit reads a plurality of target data that are discretely distributed in the off-chip memory, extracts the indexes of the target data, and writes the indexes into the scratchpad memory in the order of the first base address to obtain the first index address.

It should be noted that, because the target data is discretely distributed in the off-chip memory, reading from the off-chip memory every time the target data needs to be accessed wastes a great deal of time. With the above technique, the indexes of all target data are read at one time and written into the scratchpad memory in a defined order, which greatly reduces the number of accesses to the off-chip memory and improves processing efficiency. In addition, because the indexes are written into the scratchpad memory in advance and arranged in order, the target data can be read directly from the scratchpad memory according to the indexes when the vector data is scattered, without accessing the off-chip memory again, which improves the efficiency of the vector data scatter.
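The reduction in off-chip accesses can be illustrated with a simple access counter. The class `OffChipMemory` and the rule that a block read counts as a single access are modelling assumptions, not measurements of any real hardware.

```python
from typing import List


class OffChipMemory:
    """Toy off-chip memory that counts how many times it is accessed."""

    def __init__(self, cells: List[int]) -> None:
        self.cells = list(cells)
        self.accesses = 0

    def read(self, addr: int) -> int:
        self.accesses += 1
        return self.cells[addr]

    def read_block(self, start: int, count: int) -> List[int]:
        # Assumption: one burst transfer is counted as a single access.
        self.accesses += 1
        return self.cells[start:start + count]


indices = [3, 0, 2, 1]

# Per-element approach: one off-chip access for every index.
naive = OffChipMemory(indices)
naive_idx = [naive.read(k) for k in range(len(indices))]

# Staged approach: read all indexes once and keep them in the scratchpad.
staged = OffChipMemory(indices)
scratchpad_idx = staged.read_block(0, len(indices))
```

Both approaches obtain the same indexes, but under this counting model the staged read touches the off-chip memory once instead of once per element.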

In one embodiment, the base address of the target data and each index of the target data element in the off-chip memory are added one by one to obtain the off-chip memory address of the target data.

It should be noted that calculating the offset index of each target data element relative to the base address in the off-chip memory yields the address of each target data element in the off-chip memory, which facilitates subsequent data reading and processing. Meanwhile, the calculation of the off-chip storage address is handed to computer hardware, which greatly improves calculation efficiency and reduces calculation errors. The execution unit can thus compute on vector data while executing instruction operations, which significantly improves the efficiency and accuracy of the computer system in processing vector data.
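The overlap of computation with the scatter operation can be sketched structurally with a background worker. A Python thread only models the control flow, not real hardware parallelism, and the functions `scatter` and `compute` are hypothetical stand-ins for the DMA transfer and the vector computation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def scatter(packet: List[Tuple[int, int]], off_chip: Dict[int, int]) -> int:
    """Stand-in for the DMA transfer: write each (address, value) pair."""
    for addr, value in packet:
        off_chip[addr] = value
    return len(packet)


def compute(values: List[int]) -> int:
    """Stand-in for vector computation that does not depend on the scatter."""
    return sum(v * v for v in values)


off_chip: Dict[int, int] = {}
packet = [(0x1004, 7), (0x1000, 9)]

with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(scatter, packet, off_chip)  # scatter runs in the background
    result = compute([1, 2, 3])                       # computation proceeds meanwhile
    written = pending.result()                        # wait for the scatter to finish
```

The point of the design is that the computation does not wait for the scatter to finish; the two are joined only when the scattered data is needed.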

In one embodiment, the offset index of the off-chip memory address is matched with the first index address of the scatter parameter packet, and the first data base address of the scatter parameter packet is written into the off-chip memory by the execution unit.

It should be noted that the scatter parameter packet includes the first base address and the first index address of the target data element in the scratchpad memory, i.e. the scratchpad memory address, and further includes the off-chip memory address and the target data element. By matching the offset index of the off-chip storage address with the first index address of the scatter parameter packet, the corresponding scatter parameter packet can be found in the scratchpad memory, and the scatter parameters of the target data are thereby obtained. The first data base address of the scatter parameter packet is written into the off-chip memory through the execution unit, so that the position corresponding to the target data in the vector register holds the first data of the scatter parameter packet. The vector data is thus scattered, and the efficiency of vector data processing is greatly improved.

It should be noted that the computer system is a 1024-bit vector processor, and the types of vector data that can be processed are 16 single words or 32 half words, as shown in FIG. 3, where FIG. 3 (a) is single word vector data and FIG. 3 (b) is half word vector data. A single word occupies 64 bits and a half word occupies 32 bits, so one 1024-bit vector register can store 16 single words or 32 half words. Compared with smaller data units, larger data units reduce the number of memory accesses and therefore the memory bandwidth consumed. Larger data units also reduce the number of register spills and register saves when data is moved, which reduces the instruction count and execution time and allows the computer system to process large-scale vector data more quickly and efficiently.

It should be understood that, although the steps in the flowcharts of FIGS. 1-2 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in FIGS. 1-2 may include multiple sub-steps or stages that are not necessarily performed at the same moment and may be performed at different times. The sub-steps or stages need not be performed sequentially either; they may be performed in turn or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.

In one embodiment, as shown in FIG. 4, there is provided a vector data scatter apparatus based on scratchpad memory, including: a first base address acquisition module 402, a first index address acquisition module 404, a source address acquisition module 406, a parameter packet acquisition module 408, and a scatter module 410, wherein:

The first base address acquisition module 402 is configured to move the target data from the vector register to the scratchpad memory through the execution unit executing the write instruction, and to obtain a first base address.

The first index address obtaining module 404 is configured to read an index of the target data in the off-chip memory, and write the index into the scratch pad memory to obtain a first index address.

The source address obtaining module 406 is configured to calculate an address of the target data in the off-chip memory, and obtain an off-chip memory address. The off-chip memory address includes: a base address of the target data and an offset index of the target data.

The parameter packet obtaining module 408 is configured to extract the first base address and the first index address from the scratch pad memory, so as to obtain a scattered parameter packet of the target data.

The scatter module 410 is configured to write the scatter parameter packet into the off-chip memory according to the off-chip memory address by executing the scatter instruction by the execution unit.

For specific limitations on the vector data scatter apparatus based on scratchpad memory, reference may be made to the limitations on the vector data scatter method based on scratchpad memory given above, which are not repeated here. Each module in the above vector data scatter apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a method for vector data dispersion based on scratch pad storage. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

It will be appreciated by those skilled in the art that the structures shown in fig. 4-5 are block diagrams of only some of the structures associated with the present application and are not intended to limit the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided comprising a memory storing a computer program and a processor that, when executing the computer program, performs the steps of the method embodiments described above.

In one embodiment, the processor when executing the computer program further performs the steps of: reading, through the execution unit, a plurality of target data that are discretely distributed in the off-chip memory, extracting the indexes of the target data, and writing the indexes into the scratchpad memory in the order of the first base address to obtain a first index address.

In one embodiment, the processor when executing the computer program further performs the steps of: and carrying out one-by-one addition operation on the base address of the target data and each index of the target data element in the off-chip memory to obtain the off-chip memory address of the target data.

In one embodiment, the processor when executing the computer program further performs the steps of: and matching the offset index of the off-chip storage address with the first index address of the parameter packet, and writing the first data base address of the scattered parameter packet into the off-chip memory through the execution unit.

Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-transitory computer readable storage medium which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this description.

The foregoing examples represent only a few embodiments of the present application. Although they are described in specific detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and these would fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A vector data scatter method based on scratchpad memory, characterized in that the method is applied to a computer system, the computer system comprising: an execution unit, a vector register, a scratchpad memory, and an off-chip memory;

the method comprises the following steps:

the execution unit executes a write instruction so that the vector register moves target data to the scratchpad memory, obtaining a first base address;

reading an index of the target data in the off-chip memory, and writing the index into the scratchpad memory to obtain a first index address;

calculating the address of the target data in the off-chip memory to obtain an off-chip memory address; the off-chip memory address includes: a base address of the target data and an offset index of the target data;

extracting the first base address and the first index address, arranged in sequence, from the scratchpad memory to obtain a scatter parameter packet of the target data;

and writing the scatter parameter packet into the off-chip memory, according to the off-chip memory address, by the execution unit executing a scatter instruction.

2. The method of claim 1, wherein the execution unit performs instruction operations in accordance with the operation instructions received by the computer processor; the instruction operations include: write, read, store, and load.

3. The method of claim 2, wherein reading an index of the target data in off-chip memory, writing the index to a scratch pad memory, obtaining a first index address, comprises:

reading, through the execution unit, a plurality of target data that are discretely distributed in the off-chip memory, extracting the indexes of the target data, and writing the indexes sequentially into the scratchpad memory according to the first base address to obtain the first index address.

4. A method according to claim 3, wherein calculating the address of the target data in the off-chip memory to obtain the off-chip memory address comprises:

adding the base address of the target data, one by one, to each index of the target data elements in the off-chip memory to obtain the off-chip memory address of the target data.

5. The method of claim 4, wherein writing the scatter parameter packet to the off-chip memory by the execution unit according to the off-chip memory address comprises:

matching the offset index of the target data in the off-chip storage address with the first index address of the scatter parameter packet, and writing the first data base address of the scatter parameter packet into the off-chip memory through the execution unit.

6. The method of any one of claims 1 to 5, wherein the scratchpad memory transfers the scatter parameter packets to the off-chip memory in 1024-bit data units.

7. A vector data scatter apparatus based on scratchpad memory, characterized in that the apparatus comprises:

the first base address acquisition module, which is used for moving target data from the vector register to the scratchpad memory through the execution unit executing a write instruction, so as to obtain a first base address;

the first index address acquisition module, which is used for reading the index of the target data in the off-chip memory and writing the index into the scratchpad memory to obtain a first index address;

the source address acquisition module, which is used for calculating the address of the target data in the off-chip memory to obtain an off-chip memory address; the off-chip memory address includes: a base address of the target data and an offset index of the target data;

the parameter packet acquisition module, which is used for extracting the first base address and the first index address, arranged in sequence, from the scratchpad memory to obtain a scatter parameter packet of the target data;

and the scatter module, which is used for writing the scatter parameter packet into the off-chip memory, according to the off-chip memory address, through the execution unit executing a scatter instruction.

8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.