CN117312330B - Vector data aggregation method and device based on scratch pad storage, and computer equipment - Google Patents

Info

Publication number
CN117312330B
CN117312330B CN202311614079.2A CN202311614079A CN117312330B CN 117312330 B CN117312330 B CN 117312330B CN 202311614079 A CN202311614079 A CN 202311614079A CN 117312330 B CN117312330 B CN 117312330B
Authority
CN
China
Prior art keywords
target data
address
index
memory
chip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311614079.2A
Other languages
Chinese (zh)
Other versions
CN117312330A (en
Inventor
方建滨
张鹏
黄春
唐滔
彭林
崔英博
姜浩
沈洁
范小康
于恒彪
苏醒
易昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202311614079.2A
Publication of CN117312330A
Application granted
Publication of CN117312330B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2237 Vectors, bitmaps or matrices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/781 On-chip cache; Off-chip memory
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to a vector data aggregation method and device based on scratch pad storage, and to computer equipment. The method comprises: reading the index of target data in the off-chip memory, and writing the index into the scratch pad memory through an execution unit to obtain a first index base address; calculating the address of the target data in the off-chip memory to obtain an off-chip memory address, which includes a base address of the target data and an offset index of the target data; acquiring the off-chip memory address through the execution unit, and writing it into the scratch pad memory to obtain an aggregation parameter packet; and storing the aggregation parameter packet in sequence according to the offset index and the first index base address, and transferring it to the vector register. With this method, vector computation and vector data aggregation can be performed concurrently, which improves the efficiency of aggregating vector data from on-chip memory into vector registers and the performance of the application program.

Description

Vector data aggregation method and device based on scratch pad storage, and computer equipment
Technical Field
The present disclosure relates to the field of high performance memory access technologies, and in particular, to a method and an apparatus for aggregating vector data based on scratch pad storage, and a computer device.
Background
To meet the growing demand for computing power within the power consumption constraints of modern processors, current multi-core and many-core processors use vector unit (Vector Units) technology and scratch pad memory (Scratch-pad Memory, SPM) technology. On the one hand, a vector unit uses a single vector instruction to operate on a group of data elements. Before the vector unit can compute, the required data must be loaded from off-chip memory into a vector register close to the computing unit, and an application program that makes full use of the vector unit can improve its performance several-fold. On the other hand, a scratch pad memory is a quickly accessible memory bank located on the processor chip. Practical applications often need to load data elements scattered across different memory locations into one vector register, that is, to perform a vector data aggregation (Vector Data Gather) operation. In conventional computer systems, vector data aggregation typically uses a loop that traverses the data and processes the elements one by one. Because of the large amount of computation, this approach is inefficient and struggles to meet the processing requirements of large-scale data. For example, the non-zero elements of a sparse matrix are scattered, so a conventional vector data aggregation method must access the data in memory one element at a time and often has to read elements from different memory locations into the same vector register; for aggregating and accessing large volumes of data, it offers low efficiency and little parallelism.
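For illustration only, the conventional element-by-element approach described above can be sketched as follows. The Python model is an assumption made for readability: `memory` stands in for off-chip memory as a flat list, and the names `gather_scalar`, `base`, and `indices` are hypothetical and do not appear in the original text.

```python
def gather_scalar(memory, base, indices):
    """Conventional gather: one memory access per loop iteration."""
    lanes = []
    for index in indices:
        # Each element is fetched individually from a scattered location.
        lanes.append(memory[base + index])
    return lanes


# Hypothetical data: 32 memory cells holding the values 100..131.
memory = list(range(100, 132))
result = gather_scalar(memory, base=4, indices=[9, 0, 17, 3])
```

Each iteration issues an independent access to a different location, which is the serial behaviour that the scratch pad based method aims to avoid.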
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, an apparatus, and a computer device for aggregating vector data based on scratch pad storage, which can improve the efficiency of vector data aggregation processing.
The vector data aggregation method based on scratch pad storage is applied to a computer system, and the computer system comprises: an execution unit, a vector register, a scratch pad memory, and an off-chip memory.
The method comprises the following steps:
Reading the index of target data in the off-chip memory, and writing the index into the scratch pad memory through the execution unit to obtain a first index base address.
Calculating the address of the target data in the off-chip memory to obtain an off-chip memory address. The off-chip memory address includes: a base address of the target data and an offset index of the target data.
Acquiring the off-chip memory address through the execution unit, and writing the off-chip memory address into the scratch pad memory to obtain an aggregation parameter packet.
Storing the aggregation parameter packet in sequence according to the offset index and the first index base address, and transferring the aggregation parameter packet to the vector register.
In one embodiment, the target data includes: target data element, index of target data, and base address of target data.
In one embodiment, the method further comprises: the application program reads a plurality of target data discretely distributed in the off-chip memory; the execution unit receives an aggregation instruction of the computer system and performs an extraction operation, extracting the indexes of the target data from the off-chip memory and writing them into the scratch pad memory in sequence to obtain a first index base address.
In one embodiment, the method further comprises: adding the base address of the target data, one by one, to each index of the target data elements in the off-chip memory to obtain the off-chip memory address of the target data.
In one embodiment, the method further comprises: acquiring the off-chip memory address from the off-chip memory through the execution unit, and writing the offset index into the scratch pad memory to obtain the scratch pad memory address of the target data. The target data elements are stored in sequence according to the scratch pad memory address to obtain an aggregation parameter packet. The aggregation parameter packet includes: the scratch pad memory address, the off-chip memory address, and the target data element.
In one embodiment, the scratch pad memory transfers the aggregation parameter packet to the vector register in 1024-bit data units.
A vector data aggregation device based on scratch pad storage, the device comprising:
A first index base address acquisition module, used for reading the index of target data in the off-chip memory, and writing the index into the scratch pad memory through the execution unit to obtain a first index base address.
A source address acquisition module, used for calculating the address of the target data element in the off-chip memory to obtain an off-chip memory address. The off-chip memory address includes: a base address of the target data and an offset index of the target data.
An aggregation parameter packet acquisition module, used for acquiring the off-chip memory address through the execution unit and writing the off-chip memory address into the scratch pad memory to obtain an aggregation parameter packet.
An aggregation module, used for storing the aggregation parameter packet in sequence according to the offset index and the first index base address and transferring the aggregation parameter packet to the vector register.
A computer device comprising a memory storing a computer program and a processor which, when executing the computer program, performs the following steps:
Reading the index of target data in the off-chip memory, and writing the index into the scratch pad memory through an execution unit to obtain a first index base address.
Calculating the address of the target data in the off-chip memory to obtain an off-chip memory address. The off-chip memory address includes: a base address of the target data and an offset index of the target data.
Acquiring the off-chip memory address through the execution unit, and writing the off-chip memory address into the scratch pad memory to obtain an aggregation parameter packet.
Storing the aggregation parameter packet in sequence according to the offset index and the first index base address, and transferring the aggregation parameter packet to the vector register.
In the above vector data aggregation method, device, and computer equipment based on scratch pad storage, the scratch pad memory stores the index of the target data and the first index base address, which improves data access efficiency. When a given target data needs to be accessed, the corresponding base address is looked up in the scratch pad memory through the index of the target data, and the actual address of the target data in the off-chip memory is then obtained by the source address calculation, so that the target data is accessed directly. For large-batch data access, this effectively improves access and processing efficiency and parallelism.
Drawings
FIG. 1 is an application scenario diagram of a method of vector data aggregation based on scratch pad storage in one embodiment;
FIG. 2 is a flow diagram of a method of vector data aggregation based on scratch pad storage in one embodiment;
FIG. 3 is a diagram of vector data types in one embodiment, wherein FIG. 3 (a) is single word vector data and FIG. 3 (b) is half word vector data;
FIG. 4 is a block diagram of a vector data aggregation device based on scratch pad storage in one embodiment;
FIG. 5 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The vector data aggregation method based on scratch pad storage can be applied to the computer system shown in FIG. 1. The system comprises a vector processor, an off-chip memory, a data bus, and so on; the vector processor comprises a vector control register, a vector execution unit, a vector register file, a scratch pad memory, and so on.
In one embodiment, as shown in FIG. 2, a vector data aggregation method based on scratch pad storage is provided. Taking its application to the system in FIG. 1 as an example, the method includes the following steps:
Step 202: read the index of target data in the off-chip memory, and write the index into the scratch pad memory through the execution unit to obtain a first index base address.
The execution unit performs instruction operations according to the operation instructions received by the computer processor. The instruction operations include write, read, store, load, transfer, and so on. The target data includes a data base address and the index address corresponding to that data base address.
Specifically, the execution unit executes the received aggregation instruction and calls the DMA (Direct Memory Access) component to read the indexes of a plurality of target data discretely distributed in the off-chip memory, extracts them to the scratch pad memory, and writes them into the scratch pad memory in sequence to obtain a plurality of contiguously arranged first index addresses.
Step 204: calculate the address of the target data in the off-chip memory to obtain the off-chip memory address.
Specifically, the offset index of each target data relative to its base address in the off-chip memory is calculated to obtain the off-chip memory addresses of all target data elements: the off-chip base address (src) of the target data is added, one by one, to each index of the target data elements in the off-chip memory to obtain the off-chip memory address of each target data element. The off-chip memory address includes: a base address of the target data and an offset index of the target data.
Step 206: acquire the off-chip memory address through the execution unit, and write the off-chip memory address into the scratch pad memory to obtain the aggregation parameter packet.
Specifically, after the execution unit receives the extraction instruction of the computer system, the DMA (Direct Memory Access) component performs the extraction operation, obtains the off-chip memory address from the off-chip memory, and writes the offset index into the scratch pad memory to obtain the scratch pad memory address of the target data. The target data elements are then stored contiguously, in sequence, according to the scratch pad memory addresses to obtain an aggregation parameter packet. The aggregation parameter packet includes: the scratch pad memory address, the off-chip memory address, and the target data element.
Step 208: store the aggregation parameter packet in sequence according to the offset index and the first index base address, and transfer the aggregation parameter packet to the vector register.
It should be noted that the execution unit executes the received aggregation instruction, loads the aggregation parameter packet from the scratch pad memory, and performs the transfer operation, moving the aggregation parameter packet from the scratch pad memory to the destination vector register (dst_vr).
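Steps 202 to 208 can be sketched in software as follows. This is a simplified model under stated assumptions, not the hardware implementation: off-chip memory and the scratch pad memory are modelled as Python lists, the DMA component is modelled as plain list copies, the indexes are assumed to sit contiguously at `index_addr`, and every name other than `src` and `dst_vr` (for example `gather_via_spm`, `AggregationPacket`, `first_index_base`, `data_base`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AggregationPacket:
    """Illustrative stand-in for the aggregation parameter packet."""
    spm_addresses: List[int]
    off_chip_addresses: List[int]
    elements: List[int]


def gather_via_spm(off_chip, src, index_addr, count, spm,
                   first_index_base, data_base):
    # Step 202: copy the indexes into consecutive scratch pad slots.
    for k in range(count):
        spm[first_index_base + k] = off_chip[index_addr + k]

    # Step 204: off-chip address = base address (src) + offset index.
    addresses = [src + spm[first_index_base + k] for k in range(count)]

    # Step 206: fetch each element and store it contiguously in the scratch pad.
    spm_addresses = []
    for k, address in enumerate(addresses):
        spm[data_base + k] = off_chip[address]
        spm_addresses.append(data_base + k)
    packet = AggregationPacket(spm_addresses, addresses,
                               [spm[a] for a in spm_addresses])

    # Step 208: move the contiguous elements to the destination vector register.
    dst_vr = list(packet.elements)
    return packet, dst_vr


# Hypothetical layout: indexes at cells 0..3, data region starting at src = 16.
off_chip = [0] * 64
off_chip[0:4] = [5, 1, 9, 2]
for j in range(16):
    off_chip[16 + j] = 100 + j
spm = [0] * 16
packet, dst_vr = gather_via_spm(off_chip, src=16, index_addr=0, count=4,
                                spm=spm, first_index_base=0, data_base=8)
```

In this model the scattered reads happen only in step 206; step 208 reads a contiguous scratch pad region, which is what allows the final transfer to be a single wide move.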
In the above vector data aggregation method based on scratch pad storage, the scratch pad memory stores the index of the target data and the first index base address, which improves data access efficiency. When a given target data needs to be accessed, the corresponding base address is looked up in the scratch pad memory through the index of the target data, and the actual address of the target data in the off-chip memory is then obtained by the source address calculation, so that the target data is accessed directly. For large-batch data access, this effectively improves access and processing efficiency and parallelism.
In one embodiment, the target data includes: target data element, index of target data, and base address of target data.
In one embodiment, an application program reads a plurality of target data discretely distributed in an off-chip memory; the execution unit receives an aggregation instruction of the computer system and performs an extraction operation, extracting the indexes of the target data from the off-chip memory and writing them into the scratch pad memory in sequence to obtain a first index base address.
It should be noted that the first index base address is obtained by extracting the indexes of the target data from the off-chip memory and writing them into the scratch pad memory in sequence. In the subsequent vector data aggregation operation, the actual address of the target data can be calculated directly from the index and the first index base address, so that the data in the off-chip memory can be accessed directly. This avoids frequent accesses to the off-chip memory, reduces the time and power consumed by data access, and improves data access efficiency. In addition, the extracted target data is stored in a vector register, which enables efficient vector data aggregation. The vector register can hold multiple data items at once and supports parallel operation, which improves computing efficiency. Because data access is more efficient, computation latency and power consumption can also be further reduced.
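The index staging described above might look like the following sketch. It assumes a simplified model in which off-chip memory is a Python dictionary keyed by address and the scratch pad memory is a list; `stage_indexes`, `index_locations`, and `first_index_base` are hypothetical names introduced only for this example.

```python
def stage_indexes(off_chip, index_locations, spm, first_index_base):
    """Copy scattered indexes into consecutive scratch pad slots.

    Returns the scratch pad address of the first staged index, which plays
    the role of the first index base address in this sketch.
    """
    for k, location in enumerate(index_locations):
        spm[first_index_base + k] = off_chip[location]
    return first_index_base


# Hypothetical off-chip cells holding three indexes at scattered addresses.
off_chip = {40: 7, 12: 3, 95: 11}
spm = [None] * 8
base = stage_indexes(off_chip, [40, 12, 95], spm, first_index_base=2)
staged = spm[base:base + 3]
```

After staging, index `k` is found at `base + k`, so later steps need only the base address and a position to locate any index.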
In one embodiment, the base address of the target data is added, one by one, to each index of the target data elements in the off-chip memory to obtain the off-chip memory address of the target data.
It should be noted that the off-chip memory address of each target data is obtained by calculating its offset index in the off-chip memory and adding that offset index to the base address of the target data. As a result, the off-chip memory does not have to be consulted for the address every time a target data element is accessed; the address is obtained by calculation and the element is then accessed directly. This reduces the number of accesses to the off-chip memory, which increases access speed and reduces power consumption. Calculating the off-chip memory addresses of the target data in this way, and storing the target data in the vector register, enables efficient vector data aggregation.
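The address calculation amounts to one addition per element. The sketch below assumes the index is already expressed in the same unit as the address (the text does not say whether an index is scaled by the element size, so no scaling is applied here); `off_chip_addresses` is a hypothetical helper name.

```python
def off_chip_addresses(src, indices):
    """Add the base address to every offset index, one by one."""
    return [src + index for index in indices]


# Hypothetical base address and offset indexes.
addresses = off_chip_addresses(0x1000, [8, 0, 24, 16])
```

If the indexes counted elements rather than address units, each index would first be multiplied by the element size; that variant is an assumption and is not described in the source.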
In one embodiment, the execution unit obtains the off-chip memory address from the off-chip memory and writes the offset index into the scratch pad memory to obtain the scratch pad memory address of the target data. The target data elements are stored in sequence according to the scratch pad memory address to obtain an aggregation parameter packet. The aggregation parameter packet includes: the scratch pad memory address, the off-chip memory address, and the target data element.
It should be noted that extracting the off-chip memory address of the target data from the off-chip memory reduces the number of memory accesses. Once the scratch pad memory address of the target data is obtained, the target data elements are stored in the scratch pad memory in sequence. By the principle of locality, adjacent data is placed in adjacent storage units, which improves memory access efficiency and reduces memory access latency and power consumption. Storing the scratch pad memory address, the off-chip memory address, and the target data element together in the aggregation parameter packet makes retrieving, fetching, and aggregating target data elements across memories smoother and more efficient, which enables efficient vector data aggregation. The vector register can hold multiple data items at once and supports parallel operation, which improves computing efficiency. Because data access is more efficient, computation latency and power consumption can be further reduced, improving the performance of the whole computing system.
In one embodiment, the scratch pad memory transfers the aggregation parameter packet to the vector register in 1024-bit data units.
It should be noted that, as shown in FIG. 3, the vector data may be 16 single-word vector data or 32 half-word vector data: FIG. 3 (a) shows 16 single-word vector data, each 64 bits wide, and FIG. 3 (b) shows 32 half-word vector data, each 32 bits wide. The vector register is 1024 bits wide and can hold 16 single-word or 32 half-word vector data, so the scratch pad memory transfers the aggregation parameter packet to the vector register in 1024-bit data units. Because access to the vector register is much faster than access to the off-chip memory, transferring the aggregation parameter packet directly into the vector register can greatly reduce data access latency and memory bandwidth consumption, improving the efficiency and performance of the aggregation operation. In addition, the vector register supports vectorized instructions, and the vector execution unit can execute multiple identical or similar operations in parallel, which further improves the efficiency of the aggregation operation.
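The lane counts above follow directly from the register width: 1024 / 64 = 16 single-word lanes and 1024 / 32 = 32 half-word lanes. A minimal sketch of this arithmetic, and of splitting a longer element sequence into 1024-bit units, is given below; the function names `lanes_per_register` and `split_into_units` are hypothetical.

```python
VECTOR_REGISTER_BITS = 1024  # register width stated in the text


def lanes_per_register(element_bits):
    """Number of elements of the given width that fit in one register."""
    if element_bits <= 0 or VECTOR_REGISTER_BITS % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return VECTOR_REGISTER_BITS // element_bits


def split_into_units(elements, element_bits):
    """Group elements into consecutive 1024-bit transfer units."""
    lanes = lanes_per_register(element_bits)
    return [elements[i:i + lanes] for i in range(0, len(elements), lanes)]


single_word_lanes = lanes_per_register(64)
half_word_lanes = lanes_per_register(32)
units = split_into_units(list(range(40)), element_bits=64)
```

In this sketch the last unit may be only partly filled; how a partial unit is handled in hardware is not described in the source.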
It should be understood that, although the steps in the flowcharts of FIGS. 1-2 are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in FIGS. 1-2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, but they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 4, a vector data aggregation device based on scratch pad storage is provided, including: a first index base address acquisition module 402, a source address acquisition module 404, an aggregation parameter packet acquisition module 406, and an aggregation module 408, wherein:
The first index base address acquisition module 402 is configured to read the index of target data in the off-chip memory, and write the index into the scratch pad memory through the execution unit to obtain a first index base address.
The source address acquisition module 404 is configured to calculate the address of the target data element in the off-chip memory to obtain an off-chip memory address. The off-chip memory address includes: a base address of the target data and an offset index of the target data.
The aggregation parameter packet acquisition module 406 is configured to acquire the off-chip memory address through the execution unit, and write the off-chip memory address into the scratch pad memory to obtain an aggregation parameter packet.
The aggregation module 408 is configured to store the aggregation parameter packet in sequence according to the offset index and the first index base address, and transfer the aggregation parameter packet to the vector register.
For specific limitations on the vector data aggregation device based on scratch pad storage, reference may be made to the limitations on the vector data aggregation method based on scratch pad storage above, which are not repeated here. The modules in the vector data aggregation device based on scratch pad storage can be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded, in hardware form, in a processor in the computer device or be independent of it, or may be stored, in software form, in a memory in the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a vector data aggregation method based on scratch pad storage. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, a key, trackball, or touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, or mouse.
It will be appreciated by those skilled in the art that the structures shown in FIGS. 4-5 are block diagrams of only some of the structures associated with the present application and do not limit the computer device to which the present application may be applied; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory storing a computer program and a processor that, when executing the computer program, performs the following steps:
Reading the index of target data in the off-chip memory, and writing the index into the scratch pad memory through an execution unit to obtain a first index base address.
Calculating the address of the target data element in the off-chip memory to obtain an off-chip memory address. The off-chip memory address includes: a base address of the target data and an offset index of the target data.
Acquiring the off-chip memory address through the execution unit, and writing the off-chip memory address into the scratch pad memory to obtain an aggregation parameter packet.
Storing the aggregation parameter packet in sequence according to the offset index and the first index base address, and transferring the aggregation parameter packet to the vector register.
In one embodiment, the processor, when executing the computer program, further performs the following steps: the application program reads a plurality of target data discretely distributed in the off-chip memory; the execution unit receives an aggregation instruction of the computer system and performs an extraction operation, extracting the indexes of the target data from the off-chip memory and writing them into the scratch pad memory in sequence to obtain a first index base address.
In one embodiment, the processor, when executing the computer program, further performs the following steps: adding the base address of the target data, one by one, to each index of the target data elements in the off-chip memory to obtain the off-chip memory address of the target data.
In one embodiment, the processor, when executing the computer program, further performs the following steps: acquiring the off-chip memory address from the off-chip memory through the execution unit, and writing the offset index into the scratch pad memory to obtain the scratch pad memory address of the target data. The target data elements are stored in sequence according to the scratch pad memory address to obtain an aggregation parameter packet. The aggregation parameter packet includes: the scratch pad memory address, the off-chip memory address, and the target data element.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the program, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this description.
The above examples represent only a few embodiments of the present application. They are described specifically and in detail, but are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (6)

1. A vector data aggregation method based on scratch pad storage, characterized in that the method is applied to a computer system, the computer system comprising: an execution unit, a vector register, a scratch pad memory, and an off-chip memory;
the method comprises the following steps:
reading an index of target data of the off-chip memory, and writing the index into the scratch pad memory through the execution unit to obtain a first index base address;
calculating the address of the target data in the off-chip memory to obtain an off-chip memory address; the off-chip memory address includes: a base address of the target data and an offset index of the target data;
adding the base address of the target data and each index of the target data element in the off-chip memory one by one to obtain the off-chip memory address of the target data;
acquiring the off-chip memory address through the execution unit, and writing the off-chip memory address into the scratch pad memory to obtain an aggregation parameter packet;
acquiring the off-chip memory address from the off-chip memory through the execution unit, and writing the offset index into the scratch pad memory to obtain the scratch pad memory address of the target data;
sequentially storing the target data elements according to the scratch pad memory address to obtain an aggregation parameter packet; the aggregation parameter packet includes: the scratch pad memory address, the off-chip memory address, and the target data element;
and storing the aggregation parameter packet in sequence according to the offset index and the first index base address, and transferring the aggregation parameter packet to the vector register.
2. The method of claim 1, wherein the target data comprises: a target data element, an index of the target data, and a base address of the target data.
3. The method of claim 2, wherein reading an index of target data in the off-chip memory and writing the index into the scratchpad memory through the execution unit to obtain a first index base address comprises:
an application program reads a plurality of target data that are discretely distributed in the off-chip memory; the execution unit receives an aggregation instruction of the computer system and executes an extraction operation, in which the indices of the target data are extracted from the off-chip memory and written into the scratchpad memory in sequence to obtain the first index base address.
4. The method according to any one of claims 1 to 3, wherein the scratchpad memory transfers the aggregation parameter packets to the vector register in data units of 1024 bits.
5. A vector data aggregation device based on scratchpad memory, characterized in that it comprises:
a first index base address acquisition module, used for reading an index of target data in the off-chip memory, and writing the index into the scratchpad memory through the execution unit to obtain a first index base address;
a source address acquisition module, used for calculating the address of the target data in the off-chip memory to obtain an off-chip storage address; the off-chip storage address includes: a base address of the target data and an offset index of the target data; and for adding the base address of the target data to each index of the target data elements in the off-chip memory, one by one, to obtain the off-chip storage address of the target data;
an aggregation parameter packet acquisition module, used for acquiring the off-chip storage address through the execution unit and writing the off-chip storage address into the scratchpad memory to obtain an aggregation parameter packet;
acquiring the off-chip storage address from the off-chip memory through the execution unit, and writing the offset index into the scratchpad memory to obtain the scratchpad storage address of the target data;
sequentially storing the target data elements according to the scratchpad storage addresses to obtain the aggregation parameter packet; the aggregation parameter packet includes: the scratchpad storage address, the off-chip storage address, and the target data element;
and an aggregation module, used for storing the aggregation parameter packets in sequence according to the offset index and the first index base address, and transferring the aggregation parameter packets to a vector register.
6. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 4.
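
As an informal illustration only, and not part of the claims, the gather flow recited in claims 1 and 4 can be sketched in software. The sketch below models the off-chip memory as a Python sequence whose positions stand in for addresses and the scratchpad ("note") memory as ordinary lists. The names `gather`, `PacketEntry`, `LANES`, the 64-bit element width, and the choice of scratchpad address 0 for the first index base address are all assumptions made for the example; the patent does not specify them.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

VECTOR_BITS = 1024                    # transfer unit named in claim 4
ELEMENT_BITS = 64                     # assumption: 64-bit data elements
LANES = VECTOR_BITS // ELEMENT_BITS   # elements moved per 1024-bit transfer


@dataclass(frozen=True)
class PacketEntry:
    """One entry of the (hypothetical) aggregation parameter packet."""
    scratchpad_addr: int   # slot of the element in the scratchpad data region
    off_chip_addr: int     # base address + offset index
    element: float         # value fetched from off-chip memory


def gather(off_chip: Sequence[float], base: int,
           indices: Sequence[int]) -> Tuple[List[PacketEntry], List[List[float]]]:
    # Step 1: write the indices into a scratchpad index region.
    scratchpad_index = list(indices)
    first_index_base = 0  # assumption: the index region starts at address 0

    # Steps 2-3: form each off-chip address as base + offset index, fetch the
    # element, and store it at consecutive scratchpad slots.
    packet: List[PacketEntry] = []
    for slot in range(len(scratchpad_index)):
        offset = scratchpad_index[first_index_base + slot]
        addr = base + offset
        packet.append(PacketEntry(slot, addr, off_chip[addr]))

    # Step 4: move the now-contiguous elements to vector registers in
    # 1024-bit units (LANES elements per register; the last may be partial).
    dense = [entry.element for entry in packet]
    registers = [dense[i:i + LANES] for i in range(0, len(dense), LANES)]
    return packet, registers


if __name__ == "__main__":
    memory = [float(10 * i) for i in range(64)]
    packet, registers = gather(memory, base=4, indices=[7, 0, 3, 20])
    print(registers)
```

The point of the sketch is the ordering: scattered elements are first made contiguous in on-chip storage, so that the final move to the vector register is a dense, fixed-width transfer rather than a series of irregular off-chip accesses.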
CN202311614079.2A 2023-11-29 2023-11-29 Vector data aggregation method and device based on note storage and computer equipment Active CN117312330B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311614079.2A CN117312330B (en) 2023-11-29 2023-11-29 Vector data aggregation method and device based on note storage and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311614079.2A CN117312330B (en) 2023-11-29 2023-11-29 Vector data aggregation method and device based on note storage and computer equipment

Publications (2)

Publication Number Publication Date
CN117312330A CN117312330A (en) 2023-12-29
CN117312330B true CN117312330B (en) 2024-02-09

Family

ID=89281542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311614079.2A Active CN117312330B (en) 2023-11-29 2023-11-29 Vector data aggregation method and device based on note storage and computer equipment

Country Status (1)

Country Link
CN (1) CN117312330B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117312182B (en) * 2023-11-29 2024-02-20 中国人民解放军国防科技大学 Vector data dispersion method and device based on note storage and computer equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1528062A (en) * 1975-06-19 1978-10-11 Honeywell Inf Systems Data processing systems
US4426684A (en) * 1979-07-09 1984-01-17 Etablissement Public De Diffusion Dit "Telediffusion De France" Scratch pad memory for cassette of magnetic tape recording
US5966734A (en) * 1996-10-18 1999-10-12 Samsung Electronics Co., Ltd. Resizable and relocatable memory scratch pad as a cache slice
CN101739358A (en) * 2009-12-21 2010-06-16 东南大学 Method for dynamically allocating on-chip heterogeneous memory resources by utilizing virtual memory mechanism
KR20110007360A (en) * 2009-07-16 2011-01-24 삼성전자주식회사 Apparatus and method for scratch pad memory management
CN102375800A (en) * 2010-08-11 2012-03-14 普莱姆森斯有限公司 Multiprocessor system-on-a-chip for machine vision algorithms
CN103150265A (en) * 2013-02-04 2013-06-12 山东大学 Fine-grained data allocation method for embedded on-chip heterogeneous memory
CN108292232A (en) * 2015-12-21 2018-07-17 英特尔公司 Instructions and logic for load-indices-and-scatter operations
CN110941448A (en) * 2018-09-24 2020-03-31 英特尔公司 Apparatus and method for tile gather and tile scatter
CN112130848A (en) * 2020-09-24 2020-12-25 中国科学院计算技术研究所 Bandwidth-aware loop tiling optimization technique for scratchpad memory
CN115630013A (en) * 2022-10-31 2023-01-20 上海交通大学 Scratch pad type cache architecture construction method and system based on spatial reconfigurable array
CN115729845A (en) * 2021-08-30 2023-03-03 华为技术有限公司 Data storage device and data processing method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7363459B2 (en) * 2002-10-31 2008-04-22 Hewlett-Packard Development Company, L.P. System and method of optimizing memory usage with data lifetimes
JP3659252B2 (en) * 2003-03-28 2005-06-15 セイコーエプソン株式会社 Vector data address reference method and vector processor
CN105608050B (en) * 2015-12-31 2019-02-01 华为技术有限公司 Date storage method and system
US10402425B2 (en) * 2016-03-18 2019-09-03 Oracle International Corporation Tuple encoding aware direct memory access engine for scratchpad enabled multi-core processors
KR102620843B1 (en) * 2021-11-22 2024-01-03 리벨리온 주식회사 Reconfigurable on-chip memory bank, Reconfigurable on-chip memory, System on Chip mounted same and Method for using Reconfigurable on-chip memory

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Aristotle: A performance impact indicator for the OpenCL kernels using local memory;Fang, JB (Fang, Jianbin) 等;《Web of Science》;全文 *
CRISPR高通量筛选数据的整合分析技术研究;崔英博;《信息科技》;20180901;全文 *
Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures;peng zhang,jianbin fang等;《IEEE》;全文 *
peng zhang,jianbin fang等.Evaluating Multiple Streams on Heterogeneous Platforms.《Web of Science》.2016,全文. *
peng zhang,jianbin fang等.Optimizing Direct Convolutions on ARM Multi-Cores.《ACM》.2023,全文. *
多智能体合作对抗环境下策略优化技术研究及系统实现;姜浩;《信息科技》;20230215;全文 *
面向异构众核平台的多任务流编程与性能优化技术研究;张鹏;《信息科技》;全文 *

Also Published As

Publication number Publication date
CN117312330A (en) 2023-12-29

Similar Documents

Publication Publication Date Title
US10140251B2 (en) Processor and method for executing matrix multiplication operation on processor
US8984043B2 (en) Multiplying and adding matrices
CN103336758B (en) Sparse matrix storage method using compressed sparse row with local information, and SpMV implementation method based thereon
US11586577B2 (en) Autonomous memory architecture
CN117312330B (en) Vector data aggregation method and device based on note storage and computer equipment
US8676874B2 (en) Data structure for tiling and packetizing a sparse matrix
US8762655B2 (en) Optimizing output vector data generation using a formatted matrix data structure
US9612750B2 (en) Autonomous memory subsystem architecture
KR102287677B1 (en) Data accessing method, apparatus, device, and storage medium
CN107315716B (en) Device and method for executing vector outer product operation
US9058301B2 (en) Efficient transfer of matrices for matrix based operations
US20220342934A1 (en) System for graph node sampling and method implemented by computer
TWI696949B (en) Direct memory access method, device, dedicated computing chip and heterogeneous computing system
CN113900710B (en) Expansion memory assembly
CN111158757B (en) Parallel access device and method and chip
US10915470B2 (en) Memory system
US8826252B2 (en) Using vector atomic memory operation to handle data of different lengths
CN117312182B (en) Vector data dispersion method and device based on note storage and computer equipment
WO2022007597A1 (en) Matrix operation method and accelerator
US11467973B1 (en) Fine-grained access memory controller
CN110096307B (en) Communication processor
CN116303135B (en) Task data loading method and device and computer equipment
Sun et al. Optimizing sparse matrix-vector multiplication on GPUs via index compression
KR20180018269A (en) Computing apparatus and method for processing operations thereof
KR20220079987A (en) Near-memory data reduction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant