CN117008810A - Data processing method and device and related equipment

Data processing method and device and related equipment

Info

Publication number
CN117008810A
Authority
CN
China
Prior art keywords
data
memory
accelerator
address
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210451716.8A
Other languages
Chinese (zh)
Inventor
熊鹰
徐栋
卢霄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210451716.8A priority Critical patent/CN117008810A/en
Priority to PCT/CN2023/091041 priority patent/WO2023208087A1/en
Publication of CN117008810A publication Critical patent/CN117008810A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F 12/0646 Configuration or reconfiguration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0877 Cache access modes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0629 Configuration or reconfiguration of storage systems

Abstract

The application provides a data processing method, an apparatus, and related devices. The method is applied to an accelerator that includes a first memory and comprises the following steps: the accelerator obtains a data processing instruction, where the data processing instruction includes an address of data in a second memory and the data read-write efficiency of the first memory is higher than that of the second memory; the accelerator reads the data from the first memory according to the data processing instruction and processes the data, where the first memory is used for storing data to be cached from the second memory, and the data to be cached includes data historically accessed by the accelerator. The accelerator can thus interact directly with the first memory, which has higher read-write efficiency, when reading and writing data, improving the data read-write efficiency of the accelerator and solving the problem that the read-write efficiency of the accelerator is limited by the memory bandwidth bottleneck.

Description

Data processing method and device and related equipment
Technical Field
The present application relates to the field of computers, and in particular, to a data processing method, apparatus and related devices.
Background
With the continuous development of science and technology, accelerators (also referred to as acceleration processing units), such as application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs), play increasingly important roles in data processing tasks such as matrix operations, image processing, and machine learning (ML). In these fields, accelerators place high demands on real-time reading and writing of data in memory. However, when the accelerator's access frequency to data in the memory is unbalanced with the available bandwidth that the memory provides to the accelerator, the accelerator must additionally wait a certain period of time whenever it performs a read-write operation on the data in the memory, which degrades the processing performance of the accelerator.
Disclosure of Invention
The present application provides a data processing method, a data processing apparatus, and related devices, which are used to solve the problem that the processing performance of an accelerator is reduced when the accelerator's access frequency to data in a memory is unbalanced with the available bandwidth that the memory provides to the accelerator.
In a first aspect, a data processing method is provided. The method is applicable to an accelerator that includes a first memory, and includes the following steps: the accelerator acquires a data processing instruction, where the data processing instruction includes an address of data in a second memory, and the data read-write efficiency of the first memory is higher than that of the second memory; the accelerator then reads the data from the first memory according to the data processing instruction and processes the data, where the first memory is used for storing data to be cached from the second memory, and the data to be cached includes data historically accessed by the accelerator.
By implementing the method described in the first aspect, the accelerator stores the data to be cached from the second memory in the first memory, where the read-write efficiency of the first memory is higher than that of the second memory, so that the accelerator can interact directly with the faster first memory when processing the data processing instruction, thereby improving the data read-write efficiency of the accelerator.
In one possible implementation, the accelerator is implemented by application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) technology, and the first memory includes one or more of static random access memory (SRAM), storage class memory (SCM), registers, and content-addressable memory (CAM); the present application is not limited in this regard.
In a specific implementation, the computing device where the accelerator is located may include a central processing unit (CPU) and the accelerator, where the accelerator may be a system-on-chip implemented by FPGA, ASIC, or other technology, and may specifically be a processing unit in the computing device that assists the CPU in processing special types of computing tasks, such as graphics processing, vector computing, and machine learning. The accelerator may be a graphics processing unit (GPU), a data processing unit (DPU), a neural-network processing unit (NPU), or the like. Alternatively, the accelerator may itself be a CPU; in other words, the computing device may include a plurality of processors such as CPU1 and CPU2, where CPU1 is the main processor and CPU2 is the accelerator. It should be understood that the foregoing examples are for illustration, and the present application is not limited thereto.
In a specific implementation, the first memory and the second memory may both be original internal memories of the accelerator; alternatively, the first memory is an original internal memory of the accelerator and the second memory is a memory outside the accelerator, for example a memory of the computing device where the accelerator is located. It should be understood that, according to the method provided by the application, a suitable memory can be selected as the first memory in combination with the specific application scenario: if a plurality of memories with different read-write efficiencies are provided inside the accelerator, the memory with higher read-write efficiency can be used as the first memory and the memory with lower read-write efficiency as the second memory; if the accelerator does not contain a plurality of memories, or its internal memories have the same read-write efficiency, the memory inside the accelerator can be used as the first memory and a memory outside the accelerator with lower read-write efficiency but larger storage capacity can be used as the second memory.
According to this implementation, the original memory hardware in the accelerator is reused and the data read-write efficiency of the accelerator is improved through an algorithm, without deploying additional cache hardware, which reduces the implementation cost of the scheme; this is particularly practical for accelerators that are small in hardware specification and size and are implemented by ASIC or FPGA technology.
In one possible implementation, the data to be cached includes data having an access frequency above a first threshold.
In a specific implementation, the accelerator may store the data in the second memory, and then store the data to be cached from the second memory in the first memory. The accelerator may update the data to be cached to the first memory in real time, or may update it in a delayed manner or according to a certain algorithm; the present application is not limited in this regard. The data to be cached includes data historically accessed by the accelerator, so that when the accelerator accesses the data again, it can interact directly with the first memory, which has a faster read-write speed, thereby improving the data read-write efficiency of the accelerator.
Alternatively, the data to be cached may include data having a historical access frequency above a first threshold. That is, the data stored in the first memory is the more frequently accessed data stored in the second memory; because the first memory has higher read-write efficiency, the accelerator can interact directly with the first memory when processing frequently accessed data, thereby improving its read-write efficiency. The magnitude of the first threshold may be determined according to the specific application scenario, and the present application does not specifically limit it.
Alternatively, when the data in the first memory reaches a storage threshold, the data in the first memory whose access frequency is not higher than a second threshold may be deleted; data that the accelerator accesses in the second memory then continues to be stored in the first memory, and each time the first memory again reaches the storage threshold, the data whose access frequency is not higher than the second threshold is deleted, so that the data retained in the first memory is the more frequently accessed data. The first threshold and the second threshold may be the same value, or the second threshold may be greater than the first threshold; the present application is not particularly limited.
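Purely as an illustration of the eviction policy just described, the following C sketch models the first memory as a table of entries and deletes every entry whose access frequency is not above the second threshold once the storage threshold is reached. All names, types, and threshold values here are hypothetical assumptions, not definitions taken from the patent.

```c
#include <stddef.h>
#include <stdint.h>

#define STORAGE_THRESHOLD 1024  /* entry count at which eviction starts (assumed) */
#define SECOND_THRESHOLD  4     /* access-frequency floor for keeping an entry (assumed) */

struct cache_entry {
    uint64_t second_addr;   /* address of the data in the second memory */
    uint64_t access_count;  /* historical access frequency */
    int      valid;
};

static struct cache_entry first_memory[STORAGE_THRESHOLD];
static size_t used;

/* Once the first memory reaches its storage threshold, delete every entry
 * whose access frequency is not higher than the second threshold, so that
 * only the more frequently accessed data is retained. */
void evict_cold_entries(void)
{
    if (used < STORAGE_THRESHOLD)
        return;
    for (size_t i = 0; i < STORAGE_THRESHOLD; i++) {
        if (first_memory[i].valid &&
            first_memory[i].access_count <= SECOND_THRESHOLD) {
            first_memory[i].valid = 0;
            used--;
        }
    }
}
```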
Alternatively, the data to be cached may be the data most recently accessed by the accelerator, where "most recently" refers to data accessed by the accelerator within a time range that may be determined according to the storage capacity of the first memory. Specifically, when the data amount in the first memory reaches the storage threshold, the access times of the data in the first memory are sorted and the data whose last access time is furthest from the current time is deleted, and so on, ensuring that the data retained in the first memory is the most recently accessed.
Alternatively, the data recently accessed by the accelerator may be data accessed within a preset time range: if the current time is T, the preset time range may be from time T-t to time T, where the size of t may be determined according to the specific application scenario; the present application is not specifically limited.
Optionally, the data to be cached may also include prefetched data. Briefly, the storage controller may determine, through a prefetching algorithm, data that is likely to be accessed by the accelerator, extract that data from the second memory in advance, and store it in the first memory with its faster read-write efficiency, so that when the accelerator requests to read the data it can interact directly with the first memory, improving the accelerator's read-write efficiency.
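The patent leaves the prefetching algorithm open. Offered only as a hedged sketch, a minimal sequential "next-address" prefetcher in C could look as follows, where read_from_second_memory() and store_in_first_memory() are hypothetical stand-ins for the real data paths:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the real memory operations. */
extern uint64_t read_from_second_memory(uint64_t second_addr);
extern void     store_in_first_memory(uint64_t second_addr, uint64_t value);

/* Assumed next-address policy: after the accelerator reads address A,
 * speculatively copy the data at A + stride from the second memory into the
 * first memory, so that a later sequential read hits the faster memory. */
void prefetch_after_access(uint64_t accessed_addr, uint64_t stride)
{
    uint64_t next = accessed_addr + stride;
    uint64_t data = read_from_second_memory(next);
    store_in_first_memory(next, data);
}
```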
It should be understood that the data to be cached may further include more types of data, and specifically may be determined according to an application scenario of the accelerator, which is not specifically limited by the present application.
In the above implementations, data that was recently accessed or prefetched is highly likely to be accessed again by the accelerator and is stored in the first memory, so that the accelerator can interact directly with the faster first memory when accessing that data again, improving its data read-write efficiency.
In a possible implementation, the accelerator configures the first memory to obtain configuration information of the first memory, where the configuration information includes one or more of a cache switch state, a cache address range, and a cache capacity: the cache switch state indicates whether the accelerator uses the first memory, the cache address range indicates that the accelerator stores data whose storage address falls within that range in the first memory, and the cache capacity indicates the capacity of the first memory.
In a specific implementation, configuring the cache switch to on means the first memory is used for data storage, and configuring it to off means it is not. Configuring the address range to a target address range means that the data stored in the first memory is a cached copy of the data stored within the target address range of the second memory. Configuring the cache depth to D means that the storage capacity available in the first memory for caching is D.
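As a rough illustration of this configuration information, the following C sketch groups the cache switch, address range, and cache depth into one structure and checks whether a given second-memory address is eligible for caching. The structure and field names are assumptions for illustration, not the patent's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical grouping of the configuration information described above. */
struct cache_config {
    bool     cache_switch;  /* on: use the first memory; off: bypass it */
    uint64_t addr_lo;       /* start of the cached address range */
    uint64_t addr_hi;       /* end of the cached address range (inclusive) */
    uint64_t cache_depth;   /* capacity D reserved in the first memory */
};

/* Data is eligible for the first memory only if the cache switch is on and
 * its second-memory address falls inside the configured range. */
bool should_cache(const struct cache_config *cfg, uint64_t second_addr)
{
    return cfg->cache_switch &&
           second_addr >= cfg->addr_lo &&
           second_addr <= cfg->addr_hi;
}
```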
According to this implementation, by configuring the first memory, whether to enable the cache function of the memory controller can be chosen according to service requirements, and the address space and capacity of the cache can be set, so that the scheme of the application suits more application scenarios and is more flexible.
In one possible implementation, the accelerator includes an address memory for storing a correspondence between an address of the data in the first memory and an address of the data in the second memory.
Optionally, the data processing instruction may be an instruction generated by the accelerator, or an instruction sent to the accelerator by a CPU in the computing device where the accelerator is located. The data processing instruction may be a data writing instruction or a data reading instruction, and may further include other instructions for service processing after the data is read out; for example, after the accelerator reads the data from the first memory, it may update, delete, or merge the data, and may also perform computation on a plurality of read data items, such as matrix multiplication or convolution, as determined by the processing service of the accelerator.
In a specific implementation, when the accelerator acquires a data writing instruction, the writing address in the instruction is the second storage address of the data. If the address memory already stores the second storage address and its corresponding first storage address, that is, a historical version of the data has already been written to that first storage address of the first memory, then the historical version corresponding to the first storage address can be updated, and the data is also sent to the second memory with a request to update the data corresponding to the second storage address.
If the second storage address is not present in the address memory, that is, the data is being written to the first memory for the first time, the accelerator may determine a first storage address corresponding to the second storage address, store the data carried by the data writing instruction at that first storage address of the first memory, and record the correspondence between the first storage address and the second storage address in the address memory. When the memory controller writes data for the first time, the first storage address of the data can be determined according to the currently free addresses of the first memory.
Similarly, when the accelerator acquires a data reading instruction, the reading address in the instruction is the second storage address of the data. The accelerator can match the second storage address against the addresses in the address memory; if the address memory includes the second storage address, the corresponding first storage address can be acquired, and the data is then read from the first memory.
If the second storage address is not stored in the address memory, that is, the data is not stored in the first memory, the accelerator may read the data from the second memory according to the second storage address. At the same time, the accelerator may determine a first storage address for the data, store the data read from the second memory into the first memory so that it can subsequently be read directly from the faster first memory, and record the mapping between the first storage address and the second storage address in the address memory.
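To make the interplay between the address memory and the two data paths concrete, here is a hedged software model in C. The address memory is modeled as a small table, and all function names (read_first_memory, alloc_first_addr, and so on) are hypothetical placeholders for the hardware operations the patent describes, not its actual interfaces.

```c
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE 256

/* Hypothetical model of the address memory: each valid entry maps an
 * address in the second memory to the address of the cached copy in the
 * first memory. */
struct addr_entry {
    uint64_t second_addr;
    uint64_t first_addr;
    int      valid;
};

static struct addr_entry addr_memory[MAP_SIZE];

/* Hypothetical stand-ins for the real memory operations. */
extern uint64_t read_first_memory(uint64_t first_addr);
extern void     write_first_memory(uint64_t first_addr, uint64_t v);
extern uint64_t read_second_memory(uint64_t second_addr);
extern void     write_second_memory(uint64_t second_addr, uint64_t v);
extern uint64_t alloc_first_addr(void);   /* pick a free first-memory address */

static struct addr_entry *lookup(uint64_t second_addr)
{
    for (size_t i = 0; i < MAP_SIZE; i++)
        if (addr_memory[i].valid && addr_memory[i].second_addr == second_addr)
            return &addr_memory[i];
    return NULL;
}

/* Write path: update the cached copy if the mapping exists; otherwise
 * allocate a first-memory address and record the new mapping. The second
 * memory is updated in both cases. */
void handle_write(uint64_t second_addr, uint64_t data)
{
    struct addr_entry *e = lookup(second_addr);
    if (e == NULL) {
        for (size_t i = 0; i < MAP_SIZE; i++) {
            if (!addr_memory[i].valid) {
                e = &addr_memory[i];
                e->second_addr = second_addr;
                e->first_addr  = alloc_first_addr();
                e->valid = 1;
                break;
            }
        }
    }
    if (e != NULL)
        write_first_memory(e->first_addr, data);
    write_second_memory(second_addr, data);
}

/* Read path: a hit in the address memory reads the first memory; a miss
 * reads the second memory and then fills the first memory for later reads.
 * For brevity the fill reuses handle_write(), which also (redundantly)
 * rewrites the second memory. */
uint64_t handle_read(uint64_t second_addr)
{
    struct addr_entry *e = lookup(second_addr);
    if (e != NULL)
        return read_first_memory(e->first_addr);
    uint64_t data = read_second_memory(second_addr);
    handle_write(second_addr, data);
    return data;
}
```

In actual hardware, the lookup would be the parallel CAM match described later rather than a sequential loop; the loop here only emulates that behaviour.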
According to this implementation, the address memory records the correspondence between first storage addresses and second storage addresses, thereby addressing the data to be cached; when a data processing instruction carrying a second storage address is received, the data can be read from the first memory according to the addresses recorded in the address memory, so that the accelerator interacts directly with the faster first memory, improving its data read-write efficiency.
In a possible implementation, the accelerator includes a state information memory for storing state information of the data, the state information including a first state and a second state. The first state indicates that the data is in a modified state and is being written into the first memory or the second memory; at this time the data cannot be read from the first memory, otherwise a wrong version of the data may be read. The second state indicates that the data is not in a modified state; at this time the data can be read from the first memory.
In a specific implementation, when the accelerator processes a data writing instruction, if the state information of the data is the first state, that is, the data is currently being operated on, the accelerator may set the state information to the second state, then write the data to the first storage address, and then write the data into the second memory for data updating.
Optionally, if the state information of the data is the second state, that is, the data is not currently being operated on, the data in the data processing instruction may be written into the first memory and into the second memory for data updating.
In a specific implementation, when the accelerator processes a data reading instruction, if the address memory does not include the reading address, that is, the data is not in the first memory, the accelerator sets the state information of the data to the first state while it reads the data from the second memory. After the data is retrieved, if the state information is still the first state, the accelerator sets it to the second state and stores the data at the first storage address of the first memory. If the state information has changed to the second state, the latest version of the data has already been updated by another operation, and the retrieved copy is no longer stored in the first memory. It can be appreciated that if a new version of the data is written to the first memory and the second memory during the reading process, the state information ensures that the cached data is the latest version.
Alternatively, if the address memory includes the reading address but the state information is the first state, that is, a previous version of the data is being operated on, the latest version of the data may be read from the second memory. If the address memory includes the reading address and the state information is the second state, that is, the data is not being operated on, the latest version of the data can be read from the first memory.
In a specific implementation, the state information may be represented by binary characters, for example the first state by the character "1" and the second state by the character "0"; the first state and the second state may also be distinguished by other identifiers, and the present application is not specifically limited.
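The following C sketch, offered only as a minimal single-threaded model under the stated "1"/"0" encoding, shows how the first and second states could arbitrate between a write and an in-flight read-miss fill. Real hardware would need atomic state updates, which are glossed over here, and the helper functions are hypothetical.

```c
#include <stdint.h>

/* Assumed single-bit encoding: '1' = first state (data being modified or a
 * read-miss fill in flight), '0' = second state (data stable and readable). */
enum data_state { STATE_SECOND = 0, STATE_FIRST = 1 };

extern uint64_t read_second_memory(uint64_t second_addr);              /* hypothetical */
extern void     write_first_memory(uint64_t first_addr, uint64_t v);   /* hypothetical */
extern void     write_second_memory(uint64_t second_addr, uint64_t v); /* hypothetical */

/* Write path: if a read-miss fill is in flight (first state), flip the state
 * to the second state so the pending fill drops its stale copy; then write
 * both memories with the new version. */
void write_with_state(volatile enum data_state *state, uint64_t first_addr,
                      uint64_t second_addr, uint64_t data)
{
    if (*state == STATE_FIRST)
        *state = STATE_SECOND;   /* invalidate the in-flight stale fill */
    write_first_memory(first_addr, data);
    write_second_memory(second_addr, data);
}

/* Read-miss fill: mark the line as in flight, fetch from the second memory,
 * and cache the result only if no write changed the state meanwhile. */
uint64_t read_miss_fill(volatile enum data_state *state, uint64_t first_addr,
                        uint64_t second_addr)
{
    *state = STATE_FIRST;
    uint64_t data = read_second_memory(second_addr);
    if (*state == STATE_FIRST) {        /* no concurrent write occurred */
        *state = STATE_SECOND;
        write_first_memory(first_addr, data);
    }
    /* otherwise a newer version was written meanwhile: drop the stale copy */
    return data;
}
```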
In the above implementation, when the state information of the data is the first state, the data is being written into the first memory or the second memory and cannot be read from the first memory, otherwise a wrong version of the data may be read; when the state information is the second state, the data is not in a modified state and can be read from the first memory. Recording the state information of the data in the state information memory ensures that the user cannot read a wrong version of the data and that the data recorded in the first memory is the latest version.
In a second aspect, an apparatus for data processing is provided. The apparatus includes a first memory, an acquisition unit, and a processing unit. The acquisition unit is configured to acquire a data processing instruction, where the data processing instruction includes an address of data in a second memory, and the data read-write efficiency of the first memory is higher than that of the second memory. The processing unit is configured to read data from the first memory according to the data processing instruction and process the data, where the first memory is used for storing data to be cached from the second memory, and the data to be cached includes data historically accessed by the data processing apparatus.
By implementing the apparatus described in the second aspect, data to be cached from the second memory is stored in the first memory, whose read-write efficiency is higher than that of the second memory, so that the apparatus can interact directly with the faster first memory when processing the data processing instruction, improving its data read-write efficiency.
In a possible implementation, the data processing apparatus includes a state information memory for storing state information of the data, the state information including a first state for indicating that the data is in a modified state and a second state for indicating that the data is not in the modified state.
In a possible implementation, the data processing apparatus includes an address memory for storing a correspondence between an address of the data in the first memory and an address of the data in the second memory.
In one possible implementation, the data processing instructions include a data writing instruction, and the processing unit is configured to: when the state information of the data is the first state and the data processing instruction is a data writing instruction, set the state information of the data to the second state, read and update the data from the first memory according to the data processing instruction, and store the updated data in the first memory.
In one possible implementation, the data processing instructions include a data reading instruction, and the processing unit is configured to: determine that the address memory does not include the address in the data processing instruction after the data processing instruction is acquired by the data processing apparatus; set the state information of the data to the first state and read the data from the second memory; and, when the state information of the data is the first state, set the state information of the data to the second state and store the data at the first storage address of the first memory.
In a possible implementation, the apparatus further includes a configuration unit configured to configure the first memory to obtain configuration information of the first memory, where the configuration information includes one or more of a cache switch state, a cache address range, and a cache capacity: the cache switch state indicates whether the apparatus for data processing uses the first memory, the cache address range indicates that the apparatus stores data whose storage address falls within that range in the first memory, and the cache capacity indicates the capacity of the first memory.
In a possible implementation, the apparatus for data processing is implemented by ASIC or FPGA technology, and the first memory includes one or more of SRAM, registers, SCM, and CAM.
In one possible implementation, the data to be cached includes data having an access frequency above a first threshold.
In a third aspect, there is provided an accelerator comprising a processor and power supply circuitry for powering the processor, the processor being for carrying out the functions of the operational steps performed by the accelerator described in the first aspect.
In a fourth aspect, there is provided a computing device comprising a CPU for executing instructions to implement business functions of the computing device and an accelerator for implementing the functions of the operational steps performed by the accelerator as described in the first aspect.
Further combinations of the present application may be made to provide further implementations based on the implementations provided in the above aspects.
Drawings
FIG. 1 is a schematic diagram of a computing device provided by the present application;
FIG. 2 is a schematic diagram of another computing device provided by the present application;
FIG. 3 is a schematic view of an accelerator according to the present application;
FIG. 4 is a schematic diagram of a data processing apparatus according to the present application;
FIG. 5 is a schematic diagram of a data processing method according to the present application in an application scenario;
FIG. 6 is a flowchart illustrating steps for processing a data read command in a data processing method according to the present application;
FIG. 7 is a flowchart illustrating steps for processing a data write command in a data processing method according to the present application.
Detailed Description
The following describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings. It is apparent that the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
First, an application scenario according to the present application will be described.
In order to solve the problem that, when an accelerator reads and writes data in a memory, it must additionally wait a certain period of time, so that the processing performance of the accelerator is limited by the memory bandwidth, the present application provides a computing device.
The computing device provided by the present application is described in detail below with reference to the accompanying drawings. FIG. 1 is a schematic diagram of a computing device according to the present application, and as shown in FIG. 1, the computing device 100 may include a processor 110, an accelerator 120, a second memory 130, a communication interface 140, and a storage medium 150. Wherein a communication connection may be established between the processor 110, the accelerator 120, the second memory 130, the communication interface 140, and the storage medium 150 via a bus. The number of processors 110, accelerators 120, secondary memory 130, communication interface 140, and storage medium 150 may be one or more, and the application is not particularly limited.
The processor 110 and the accelerator 120 may each be a hardware chip or a combination of hardware chips. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 110 is configured to execute instructions within the storage medium 150 to implement the business functions of the computing device 100.
Specifically, the processor 110 may be a central processing unit (CPU), and the accelerator 120 (also referred to as an accelerated processing unit (APU)) may be a system-on-chip implemented by FPGA, ASIC, or similar technology; the accelerator 120 is a processing unit within the computing device 100 that assists the CPU in processing special types of computing tasks, such as graphics processing, vector computing, or machine learning. The accelerator 120 may be a graphics processing unit (GPU), a data processing unit (DPU), a neural-network processing unit (NPU), or the like.
Alternatively, the accelerator 120 may be a CPU, in other words, the computing device 100 may include a plurality of processors such as CPU1 and CPU2, where CPU1 is the processor 110 and CPU2 is the accelerator 120, and it should be understood that the foregoing examples are for illustration, and the present application is not limited thereto.
The storage medium 150 is a carrier storing data, such as a hard disk, a universal serial bus (USB) flash drive, a flash memory, a secure digital memory card (SD card), or a memory stick. The hard disk may be a hard disk drive (HDD), a solid state drive (SSD), or a mechanical hard disk, and the present application is not particularly limited. The storage medium 150 may include the second memory, which in a particular implementation may be a DDR memory.
The communication interface 140 is a wired interface (e.g., an Ethernet interface), an internal interface (e.g., a peripheral component interconnect express (PCIe) bus interface), or a wireless interface (e.g., a cellular network interface or a wireless local area network interface) for communicating with other servers or units.
Bus 160 is a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL) bus, a cache coherent interconnect for accelerators (CCIX) bus, or the like. The bus 160 may be divided into an address bus, a data bus, a control bus, and so on; for clarity of illustration, the various buses are all labeled as bus 160 in the figures.
The second memory 130 includes volatile memory or nonvolatile memory, or both. The volatile memory is a random access memory (RAM), such as dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), or direct rambus RAM (DR RAM).
Further, the accelerator 120 may include a first memory 121, where the data read-write efficiency of the first memory 121 is higher than that of the second memory 130. In a specific implementation, the first memory 121 may include static random access memory (SRAM), storage class memory (SCM), registers, content-addressable memory (CAM), and the like, which is not particularly limited by the present application.
In the embodiment of the present application, the data to be cached stored in the second memory 130 may be cached in the first memory 121, so that when the accelerator 120 needs to access the data in the second memory 130, the data may be read from the first memory 121 with higher read-write efficiency, thereby making up the difference in processing speed between the accelerator 120 and the low-performance memory, and improving the data read-write efficiency of the accelerator 120.
It should be noted that the second memory 130 in FIG. 1 may be an internal memory of the computing device 100 and the first memory 121 an internal memory of the accelerator 120; by reusing the existing hardware storage of the computing device 100 and the accelerator 120, the data read-write efficiency of the accelerator 120 is improved through an algorithm, no additional cache hardware needs to be deployed, and the implementation cost of the scheme is reduced.
It should be understood that FIG. 1 shows an exemplary partitioning manner of the present application. Alternatively, as shown in FIG. 2, the second memory 130 may be disposed inside the accelerator 120; in other words, the accelerator 120 itself includes at least two memories, one of which has higher read-write efficiency than the other, in which case the memory with lower read-write efficiency inside the accelerator 120 may be used as the second memory 130 and the memory with higher read-write efficiency as the first memory 121.
It should be noted that, when the second memory 130 in FIG. 1 is a memory external to the accelerator 120, the communication between the first memory 121 and the second memory 130 is off-chip communication, and the bus between them may be an off-chip bus; an off-chip bus generally refers to a common information channel between a processor and external devices, such as a PCIe bus, an EISA bus, a UB bus, a CXL bus, a CCIX bus, or a GenZ bus, which the present application does not specifically limit.
When the second memory 130 in FIG. 2 is a memory inside the accelerator 120, the communication between the first memory 121 and the second memory 130 is on-chip communication, and the bus between them may be an on-chip bus, such as an advanced extensible interface (AXI) bus or an advanced microcontroller bus architecture (AMBA) bus, which the present application does not specifically limit.
Similarly, in the scenario shown in fig. 2, the data to be cached stored in the second memory 130 may be cached in the first memory 121, so when the accelerator 120 needs to access the data in the second memory 130, the data may be read from the first memory 121 with higher read-write efficiency, thereby making up the difference of the processing speed between the accelerator 120 and the low-performance memory, and improving the data read-write efficiency of the accelerator 120.
It can be understood that the first memory 121 and the second memory 130 in FIG. 2 are both original memories inside the accelerator 120. By reusing the original hardware memories inside the accelerator 120, the data read-write efficiency of the accelerator 120 is improved through an algorithm without deploying additional cache hardware, which reduces the implementation cost of the scheme; this is especially practical for an accelerator 120 with small hardware specification and size.
Further, the accelerator 120 may be divided into a plurality of unit modules. FIG. 3 is a schematic structural diagram of an accelerator provided in the present application and shows an exemplary division manner: as shown in FIG. 3, the accelerator 120 may include a memory controller 122, the first memory 121, a state information memory 123, and an address memory 124, where communication connections are established among them through an internal bus; for the internal bus, refer to the description of the bus 160, which is not repeated herein.
Also, the accelerator 120 may further include a power supply circuit that powers the memory controller 122. The memory controller 122 may be implemented by hardware logic circuitry, for example by an application-specific integrated circuit (ASIC), to implement the various functions of the accelerator 120. The power supply circuit may be located in the same chip as the memory controller 122, or in another chip. The power supply circuit includes, but is not limited to, at least one of the following: a power supply subsystem, a power management chip, a power consumption management processor, or a power consumption management control circuit. Alternatively, the accelerator 120 is a stand-alone chip.
The first memory 121 is used for storing data, the state information memory 123 is used for storing state information of the data, and the address memory 124 is used for storing address information of the data. The first memory 121 may be a memory with higher read-write efficiency than the second memory 130, such as an SRAM. The memory space required for the state information is small, but the state information needs to be kept in sync with the state of the data, so the state information memory 123 may be a register. The address memory 124 may be a CAM. It should be understood that a CAM is a content-addressed memory: its working mechanism is to compare an input data item with all data items stored in the CAM and determine whether the input matches any stored item. The address memory 124 for storing address information of data can therefore be implemented using a CAM in the accelerator 120, so that when a user requests access to data, the CAM matches the address of the requested data against the address information stored in it, and a match indicates that the data is stored in the first memory 121. It should be understood that the above example is for illustration; the address memory 124 may also be implemented using other memories, and the present application is not specifically limited.
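As an illustration of this matching mechanism, a software emulation of the CAM lookup might look like the following C sketch. A real CAM performs all comparisons in parallel in hardware; the loop below is only a sequential stand-in, and the entry layout is an assumption for illustration.

```c
#include <stdint.h>

#define CAM_ENTRIES 128

/* Assumed model of the CAM used as the address memory 124: each entry pairs
 * a second-memory address (the key) with the first-memory address of the
 * cached copy (the value). */
struct cam_entry {
    uint64_t key;     /* address of the data in the second memory */
    uint64_t value;   /* address of the cached copy in the first memory */
    int      valid;
};

/* Returns 1 and outputs the first-memory address on a match (the data is in
 * the first memory 121), 0 otherwise. */
int cam_match(const struct cam_entry cam[CAM_ENTRIES],
              uint64_t second_addr, uint64_t *first_addr)
{
    for (int i = 0; i < CAM_ENTRIES; i++) {
        if (cam[i].valid && cam[i].key == second_addr) {
            *first_addr = cam[i].value;
            return 1;
        }
    }
    return 0;
}
```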
It can be understood that the first memory 121 in the present application is implemented using a memory already inside the accelerator 120, so no additional cache needs to be deployed; the cache function of accelerators such as FPGAs and ASICs is implemented at very low hardware cost, and the software side requires only the online programming capability of the FPGA or ASIC. The problem that the read-write efficiency of the accelerator is limited by the memory bandwidth bottleneck can thus be solved simply, efficiently, and at low cost.
In an embodiment of the present application, the memory controller 122 may obtain a data processing instruction that includes an address of the data in the second memory 130, read the data from the first memory according to that address, and then process the data according to the data processing instruction.
It should be noted that, the data processing instruction may be a data processing instruction generated by the accelerator 120 during the service processing process, or may be a data processing instruction sent by the processor 110 to the accelerator 120, which is not limited in the present application.
In a specific implementation, the storage controller 122 may store the data in the second memory 130, and then store the data to be cached from the second memory 130 in the first memory 121. The storage controller 122 may update the data to be cached to the first memory 121 in real time, or in a delayed manner or according to a certain algorithm; the present application is not limited in this regard. The data to be cached includes data historically accessed by the accelerator 120, so that when the accelerator 120 accesses the data again, it can interact directly with the first memory 121, which has a faster read-write speed, thereby improving the data read-write efficiency of the accelerator 120.
Alternatively, the data to be cached may include data having a historical access frequency above a first threshold. That is, the data stored in the first memory 121 is the more frequently accessed data stored in the second memory 130; because the first memory has higher read-write efficiency, the accelerator 120 can interact directly with the first memory 121 when processing frequently accessed data, thereby improving the read-write efficiency of the accelerator 120. The magnitude of the first threshold may be determined according to the specific application scenario, and the present application does not specifically limit it.
Alternatively, when the data in the first memory 121 reaches a storage threshold, the data in the first memory 121 whose access frequency is not higher than a second threshold may be deleted; data that the accelerator 120 accesses in the second memory 130 then continues to be stored in the first memory 121, and each time the first memory 121 again reaches the storage threshold, the data whose access frequency is not higher than the second threshold is deleted, so that the data retained in the first memory 121 is the more frequently accessed data. The first threshold and the second threshold may be the same value, or the second threshold may be greater than the first threshold; the present application is not particularly limited.
Alternatively, the data to be cached may be the data most recently accessed by the accelerator 120, where "most recently" refers to data accessed by the accelerator 120 within a time range that may be determined according to the storage capacity of the first memory 121. Specifically, when the data amount in the first memory 121 reaches the storage threshold, the access times of the data in the first memory 121 are sorted and the data whose last access time is furthest from the current time is deleted, and so on, ensuring that the data retained in the first memory 121 is the most recently accessed.
Alternatively, the data recently accessed by the accelerator 120 may be data accessed within a preset time range: if the current time is T, the preset time range may be from time T-t to time T, where the size of t may be determined according to the specific application scenario; the present application is not specifically limited.
It can be appreciated that the possibility that the recently accessed data in the second memory 130 is accessed again by the accelerator 120 is very high, and the recently accessed data is stored in the first memory 121, so that when the accelerator 120 accesses the data again, the data can be directly interacted with the first memory 121 with a faster reading and writing speed, thereby improving the data reading and writing efficiency of the accelerator 120.
Alternatively, the data to be cached may also include prefetched data. Briefly, the storage controller 122 may determine, through a prefetching algorithm, data that is likely to be accessed by the accelerator 120, extract that data from the second memory 130 in advance, and store it in the first memory 121 with its faster read-write efficiency, so that when the accelerator 120 requests to read the data it can interact directly with the first memory 121, improving the read-write efficiency of the accelerator 120.
It should be understood that the data to be cached may further include more types of data, and specifically may be determined according to an application scenario of the accelerator 120, which is not specifically limited by the present application.
In an embodiment, before the storage controller 122 stores data in the first memory 121, the user may configure the storage controller 122. The specific configuration contents may include configuring a cache switch, configuring an address range, configuring a cache depth, and so on. Configuring the cache switch to on means the first memory 121 is used for data storage, and configuring it to off means it is not. Configuring the address range to a target address range means that the data stored in the first memory 121 is a cached copy of the data stored within the target address range of the second memory 130. Configuring the cache depth to D means that the storage capacity of the first memory 121 used for caching is D. Of course, the user may also perform other configurations on the storage controller 122, which may be determined according to the actual service environment; the present application is not specifically limited.
For example, assume the capacity of the first memory is 2.5M, the configured cache depth is 2M, the cache switch is on, and the address range is add0 to add5. Data that the accelerator 120 requests to write to addresses add0 to add5 is then cached in the first memory 121; optionally, the data for add0 to add5 may subsequently be written back to the second memory 130, while write requests to other addresses are written directly to the second memory 130. If the accelerator 120 does not need to cache data, the cache switch may be turned off. It should be understood that the above example is for illustration, and the present application is not limited thereto.
It can be appreciated that by configuring the storage controller 122, whether to enable the cache function of the storage controller 122 can be chosen according to service requirements, and the address space and capacity of the cache can be set, so that the scheme of the application suits more application scenarios and is more flexible.
Further, the data processing instruction acquired by the storage controller 122 may be a data reading instruction or a data writing instruction. When the storage controller 122 acquires a data writing instruction, the instruction includes a second storage address of the data, that is, a storage address in the second memory 130. The storage controller 122 may determine whether the cache switch is on and, if so, whether the address carried by the instruction falls within the address range configured by the user; if it does, the storage controller 122 may store the data in the first memory 121, and otherwise stores it in the second memory 130.
Similarly, when the storage controller 122 obtains a data reading instruction, if it determines that the cache switch is on and the reading address is within the address range configured by the user, it reads the data from the first memory 121; otherwise it reads the data from the second memory 130, which is not repeated herein.
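Summarizing the dispatch logic of the two preceding paragraphs, a hedged C sketch of the routing decision could read as follows; the configuration values and the two write helpers are hypothetical placeholders, mirroring the earlier configuration sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user configuration (mirrors the earlier cache_config sketch). */
static bool     cache_switch = true;
static uint64_t addr_lo = 0x0000, addr_hi = 0xFFFF;

/* Hypothetical stand-ins for the cached and direct write paths. */
extern void write_first_memory_cached(uint64_t second_addr, uint64_t v);
extern void write_second_memory(uint64_t second_addr, uint64_t v);

/* Route a write: only while the cache switch is on, and only for addresses
 * inside the user-configured range, does the data go through the first
 * memory; everything else is written directly to the second memory. */
void route_write(uint64_t second_addr, uint64_t data)
{
    if (cache_switch && second_addr >= addr_lo && second_addr <= addr_hi)
        write_first_memory_cached(second_addr, data);
    else
        write_second_memory(second_addr, data);
}
```

The same check applies symmetrically to reads: a hit on the switch and the configured range directs the read to the first memory 121, and anything else goes to the second memory 130.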
In one embodiment, the memory controller 122 may store a correspondence between a first memory address and a second memory address of data in the address memory 124.
In a specific implementation, when the storage controller 122 obtains a data writing instruction, the writing address in the instruction is the second storage address of the data. If the address memory 124 already stores the second storage address and its corresponding first storage address, that is, a historical version of the data has already been written to that first storage address of the first memory 121, then the historical version corresponding to the first storage address can be updated, and the data is also sent to the second memory 130 with a request to update the data corresponding to the second storage address.
If the second storage address is not present in the address memory 124, that is, the data is being written to the first memory 121 for the first time, the memory controller 122 may determine a first storage address corresponding to the second storage address, store the data carried by the data writing instruction at that first storage address of the first memory 121, and record the correspondence between the first storage address and the second storage address in the address memory 124. When the memory controller 122 writes data for the first time, the first storage address of the data may be determined according to the currently free addresses of the first memory 121; the present application does not limit the address allocation policy of the memory controller 122.
Similarly, when the memory controller 122 obtains a data reading instruction, the reading address in the instruction is the second storage address of the data. The memory controller 122 may match the second storage address against the addresses in the address memory 124; if the address memory 124 includes the second storage address, the corresponding first storage address can be obtained, and the data is then read from the first memory 121.
It should be noted that, if the second storage address is not stored in the address memory 124, that is, the data is not stored in the first memory 121, the memory controller 122 may read the data from the second memory 130 according to the second storage address. At the same time, the memory controller 122 may determine a first storage address for the data, store the data read from the second memory 130 into the first memory 121 so that the accelerator 120 can subsequently read it directly from the faster first memory 121, and record the mapping between the first storage address and the second storage address in the address memory 124.
In one embodiment, the accelerator 120 includes a state information memory 123 for storing state information of the data. In a specific implementation, the memory controller 122 may first determine the state information of the data and write the data to the first storage address of the first memory 121 according to that state information. The state information includes a first state and a second state. The first state indicates that the data is in a modified state and is being written into the first memory 121 or the second memory 130; at this time the data cannot be read from the first memory 121, otherwise a wrong version of the data may be read. The second state indicates that the data is not in a modified state; at this time the data can be read from the first memory 121.
In particular, when the accelerator 120 processes a data writing instruction, if the state information of the data is the first state, that is, the data is currently being operated on, the accelerator may set the state information to the second state, then write the data to the first storage address, and then write the data into the second memory 130 for data updating.
Alternatively, if the state information of the data is the second state, that is, the data is not currently being operated on, the data carried in the data writing instruction may be written into the first memory 121 and then into the second memory 130 to complete the update.
Specifically, when the accelerator 120 processes a data reading instruction, if the address memory does not contain the read address, that is, the data is not in the first memory 121, the accelerator 120 sets the state information of the data to the first state before reading the data from the second memory 130. If the state information is still the first state after the read completes, the accelerator 120 sets it to the second state and stores the data at the first storage address of the first memory 121. If the state information has already changed to the second state, the storage controller 122 has meanwhile written the latest version of the data, so the data just read should not be stored into the first memory 121. It should be appreciated that if a new version of the data is written to the first memory 121 and the second memory 130 while the read is in flight, the state information ensures that only the latest version is retained.
Alternatively, if the address memory 124 contains the read address and the state information is the first state, i.e. a previous version of the data is being operated on, the latest version of the data may be read from the second memory 130. If the address memory 124 contains the read address and the state information is the second state, i.e. the data is not being operated on, the latest version of the data may be read from the first memory 121.
In a specific implementation, the state information may be represented by a binary flag, for example the first state by the character "1" and the second state by the character "0"; the two states may also be distinguished by other identifiers, which the present application does not specifically limit.
Alternatively, the accelerator 120 may be a CPU and the first memory a memory inside the CPU, such as an SRAM in the CPU. The storage controller 122 may then work together with the CPU's multi-level cache architecture, for example a four-level cache architecture, to provide additional cache levels for the CPU and reduce the implementation cost of multi-level caching while keeping hardware complexity low.
It should be noted that the data processing instruction may include not only the data reading instruction and the data writing instruction, but also other instructions for service processing after the data has been read out. For example, after the accelerator 120 reads data from the first memory, it may update, delete, or merge the data, or perform computation over multiple pieces of read data, such as matrix multiplication or convolution; the specifics depend on the processing service of the accelerator 120.
Fig. 4 is a schematic structural diagram of a data processing apparatus provided by the present application. The data processing apparatus 400 may be the accelerator 120 in fig. 3. As shown in fig. 4, the data processing apparatus 400 may include a configuration module 1221, an acquisition module 1225, and a processing module 1226, where the processing module 1226 may include a lookup and write data update module 1222, a read data return processing module 1223, and a read data update module 1224. It should be understood that the foregoing modules may respectively correspond to circuit modules in an ASIC or FPGA.
It should be noted that the functions of the acquisition module 1225, the configuration module 1221, and the processing module 1226 in fig. 4 may be implemented by the storage controller 122 in fig. 3. The data processing apparatus 400 shown in fig. 4 corresponds to the division used in the application scenario of fig. 1, in which the second memory 130 is disposed outside the accelerator; it should be understood that in the application scenario of fig. 2 the second memory 130 is disposed within the data processing apparatus 400, and the description is not repeated here.
The acquisition module 1225 is configured to acquire data processing instructions generated by the accelerator 120, where the data processing instructions may include data writing instructions and data reading instructions.
The configuration module 1221 is configured to receive configuration information input by a user. The configuration information may include information configuring a cache switch, information configuring an address range, and information configuring a cache depth. Configuring the cache switch on means that the first memory 121 is used for data storage; configuring it off means that the first memory 121 is not used. The information configuring the address range includes a target address range, i.e. the data stored in the first memory 121 is cache data for the data stored within the target address range of the second memory 130. The information configuring the cache depth includes a cache depth D, i.e. the storage capacity of the first memory 121 is D.
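As a rough illustration, the configuration information might be laid out as below; the struct and field names are hypothetical, not taken from the patent.

```c
#include <stdbool.h>
#include <stdint.h>

struct cache_config {
    bool     cache_switch;   /* true: use the first memory 121 for data storage */
    uint64_t range_base;     /* start of the target address range in the second memory 130 */
    uint64_t range_limit;    /* end of the target address range (exclusive) */
    uint32_t cache_depth;    /* cache depth D: storage capacity of the first memory 121 */
};

/* True when caching is switched on and the second-memory address falls
 * inside the configured target address range. */
bool addr_is_cacheable(const struct cache_config *cfg, uint64_t addr2) {
    return cfg->cache_switch
        && addr2 >= cfg->range_base
        && addr2 < cfg->range_limit;
}
```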
The lookup and write data update module 1222 is configured to obtain a data writing instruction generated by the accelerator 120 and process it. Specifically, the module may first query, according to the second storage address of the data carried in the instruction, whether the address memory 124 contains that address. If it does, the module queries whether the state information of the data in the state information memory 123 is the first state; if so, it modifies the state to the second state, reads and updates the data at the first storage address in the first memory 121, rewrites the updated data into the first memory 121, and then writes it into the second memory 130. If the state is the second state, the data is written directly into the first memory 121 and then into the second memory 130.
If the second storage address does not exist in the address memory 124, a first storage address corresponding to the second storage address is determined according to the current storage capacity of the first memory 121, the correspondence between the first storage address and the second storage address is stored in the address memory 124, and the data is then stored in the first memory 121 and the second memory 130.
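Putting the two cases together, a minimal sketch of the write path might look as follows. It reuses lookup_addr(), get_state() and set_state() from the earlier sketches; alloc_entry() (sketched later, under step 84) and the mem1_write()/mem2_write() accessors are assumed helpers, not patent APIs, and the new data simply overwrites the cached entry rather than performing the read-modify-update described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

int  lookup_addr(uint64_t addr2);
int  alloc_entry(uint64_t addr2);
bool get_state(unsigned idx);
void set_state(unsigned idx, bool modifying);
void mem1_write(int idx, const void *data, size_t len);        /* first memory 121 */
void mem2_write(uint64_t addr2, const void *data, size_t len); /* second memory 130 */

void handle_write(uint64_t addr2, const void *data, size_t len) {
    int idx = lookup_addr(addr2);      /* query the address memory 124 */
    if (idx < 0)
        idx = alloc_entry(addr2);      /* first write: allocate, record the mapping
                                        * (assumed to succeed in this sketch) */
    else if (get_state(idx))
        set_state(idx, false);         /* first state: cancel the in-flight stale read */
    mem1_write(idx, data, len);        /* update the cached copy */
    mem2_write(addr2, data, len);      /* keep the second memory up to date */
}
```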
The read data return processing module 1223 and the read data update module 1224 are configured to obtain a data read instruction generated by the accelerator 120, and process the data read instruction. The read data return processing module 1223 is mainly used for reading data, and the read data updating module 1224 is mainly used for updating data.
Specifically, the read data return processing module 1223 may first query, according to the second storage address of the data carried in the data reading instruction, whether the address memory 124 contains that address. If it does, the module queries whether the state information of the data in the state information memory 123 is the first state; if so, the data is being modified, so it may be read from the second memory 130. If the state is the second state, the data is not being modified, so it may be read from the first memory 121.
If the second storage address does not exist in the address memory 124, the read data update module 1224 may set the state information of the data in the state information memory 123 to the first state, determine a first storage address corresponding to the second storage address according to the storage capacity of the first memory 121, and store the correspondence between the two addresses in the address memory 124. The read data return processing module 1223 then reads the data from the second memory. Afterwards, the read data update module 1224 checks whether the state information in the state information memory 123 is still the first state: if so, it modifies the state to the second state and updates the data into the first memory; if the state is already the second state, the data is not updated into the first memory. It should be appreciated that a second state here indicates that the lookup and write data update module 1222 has updated the data in the first memory in the meantime, so no further update of the first memory is required.
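A corresponding sketch of the read path, spanning the read data return processing module 1223 and the read data update module 1224, is shown below; the same assumed helpers apply, with mem1_read()/mem2_read() as hypothetical read accessors.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

int  lookup_addr(uint64_t addr2);
int  alloc_entry(uint64_t addr2);
bool get_state(unsigned idx);
void set_state(unsigned idx, bool modifying);
void mem1_read(int idx, void *out, size_t len);
void mem1_write(int idx, const void *data, size_t len);
void mem2_read(uint64_t addr2, void *out, size_t len);

void handle_read(uint64_t addr2, void *out, size_t len) {
    int idx = lookup_addr(addr2);       /* step 71: query the address memory 124 */
    if (idx >= 0) {
        if (get_state(idx))             /* step 72: first state, being modified */
            mem2_read(addr2, out, len); /* step 73: fetch the latest version */
        else
            mem1_read(idx, out, len);   /* step 74: fast path via the first memory */
        return;
    }
    idx = alloc_entry(addr2);           /* step 75: record the address correspondence
                                         * (assumed to succeed in this sketch) */
    set_state(idx, true);               /* step 75: first state while the fetch is in flight */
    mem2_read(addr2, out, len);         /* step 76 */
    if (get_state(idx)) {               /* step 77: no write raced with the fetch */
        set_state(idx, false);          /* step 78 */
        mem1_write(idx, out, len);      /* step 79: install the fetched copy */
    }
    /* Otherwise a concurrent write already placed a newer version in the
     * first memory, so the fetched copy is deliberately dropped. */
}
```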
It should be noted that fig. 4 is an exemplary division. The storage controller 122 provided by the present application may be divided into more modules; for example, in fig. 4 the step of searching the address memory 124 for the second storage address carried in the data processing instruction is implemented by the lookup and write data update module 1222, while in a specific implementation this module may be further divided into a lookup module and a write data update module. The present application is not limited in this respect.
In summary, according to the accelerator provided by the application, the data to be cached in the second memory is stored in the first memory, wherein the read-write efficiency of the first memory is greater than that of the second memory, so that the accelerator can directly interact with the first memory with higher read-write efficiency when reading and writing data, thereby improving the data read-write efficiency of the accelerator.
The data processing method provided by the present application is explained below with reference to figs. 5 to 7; the method described in figs. 5 to 7 can be applied to the accelerator 120 in figs. 1 to 4. Fig. 5 is a schematic flow chart of the steps of the data processing method in one application scenario, fig. 6 is a schematic flow chart of the steps for processing a data reading instruction, and fig. 7 is a schematic flow chart of the steps for processing a data writing instruction; briefly, fig. 6 describes step 7 in fig. 5, and fig. 7 describes step 8 in fig. 5. As shown in figs. 5-7, the method may include the following steps:
Step 1. Configure storage controller 122. This step may be implemented by configuration module 1221 in FIG. 4.
In a specific implementation, the configuration content includes one or more of a cache switch state, a cache address range, and a cache capacity, where the cache switch state is used to indicate whether the accelerator uses the first memory 121, the cache address range is used to indicate that the accelerator stores data whose storage address falls within the cache address range in the first memory 121, and the cache capacity is used to indicate the capacity of the first memory 121. Configuring the cache depth D means that the storage capacity of the first memory 121 is D; of course, the user may also apply other configurations to the storage controller 122, as determined by the actual service environment, and the present application is not specifically limited in this respect.
It can be appreciated that by configuring the storage controller 122, whether to enable its cache function can be selected according to service requirements, and the address space and capacity of the cache can be set, making the solution of the present application applicable to more scenarios and more flexible.
Step 2. The storage controller 122 obtains the data processing instruction. In a specific implementation, the data processing instruction may be a data reading instruction or a data writing instruction. This step may be implemented by the acquisition module 1225 in fig. 4.
Step 3. The storage controller 122 determines whether the cache switch is on; if on, step 4 is executed, and if off, step 9 is executed. Configuring the cache switch on means that the first memory 121 is used for data storage; configuring it off means that the first memory 121 is not used. This step may be implemented by the configuration module 1221 in fig. 4.
Step 4, the memory controller 122 determines whether the address range is set, and if the address range is set, step 5 is executed, and if the address range is not set, step 9 is executed. This step may be implemented by configuration module 1221 in FIG. 4.
Alternatively, if the address range is not set, since the cache switch is already on (that is, the user intends to use the cache function of the first memory 121), the user may be prompted to set the cache address range, or the check of step 5 may be skipped and step 6 executed directly.
Step 5. The storage controller 122 determines whether the address is within the range, where the address is that of the data access request received in step 2 and the range is the cache address range configured in step 1; if the address is within the configured cache address range, step 6 is executed, otherwise step 9 is executed. This step may be implemented by the configuration module 1221 in fig. 4.
Step 6. The storage controller 122 determines whether the data processing instruction is a data reading instruction; if so, step 7 is executed, and if not, step 8 is executed. This step may be implemented by the configuration module 1221.
Step 7. The memory controller 122 processes the data read instruction. The process of processing the data read instruction in step 7 is described in detail in the embodiment of fig. 6, and this step may be implemented by the read data return processing module 1223 and the read data updating module 1224 in fig. 4.
Step 8. The storage controller 122 processes the data writing instruction. The process of processing the data writing instruction in step 8 will be described in detail in the embodiment of fig. 7. This step may be implemented by the lookup and write data update module 1222 in fig. 4.
Step 9. The storage controller 122 issues a read request or a write request to the second memory. In a specific implementation, if the data access request is a data reading instruction and the second memory 130 is a DDR, the storage controller 122 may issue a DDR read request; if the data access request is a data writing instruction, the storage controller 122 may issue a DDR write request. This example is for illustration and is not limiting.
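Steps 2 to 9 amount to a small dispatch routine. The following sketch strings the earlier fragments together; the instruction layout (a bare is_write flag) is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct cache_config;   /* from the configuration sketch above */
bool addr_is_cacheable(const struct cache_config *cfg, uint64_t addr2);
void handle_read(uint64_t addr2, void *buf, size_t len);
void handle_write(uint64_t addr2, const void *data, size_t len);
void mem2_read(uint64_t addr2, void *buf, size_t len);
void mem2_write(uint64_t addr2, const void *data, size_t len);

void dispatch(const struct cache_config *cfg, uint64_t addr2,
              bool is_write, void *buf, size_t len) {
    /* Steps 3-5: cache switch off or address outside the configured range
     * falls through to step 9, a direct second-memory (e.g. DDR) access. */
    if (!addr_is_cacheable(cfg, addr2)) {
        if (is_write) mem2_write(addr2, buf, len);
        else          mem2_read(addr2, buf, len);
        return;
    }
    if (is_write) handle_write(addr2, buf, len);   /* step 8, fig. 7 */
    else          handle_read(addr2, buf, len);    /* steps 6-7, fig. 6 */
}
```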
The following describes in detail how the data read instruction is processed in step 7 described above with reference to fig. 6. FIG. 6 is a flowchart of the processing steps of the data read command provided by the present application, as shown in FIG. 6, step 7 may include the following steps:
step 71: determine whether the same address is stored in the address memory 124. The data reading instruction carries the read address of the data, and that address is an address of the second memory 130. This step may be implemented by the read data return processing module 1223 in fig. 4.
Referring to the foregoing, the address memory 124 may be a CAM. A CAM is a memory addressed by content: its operating mechanism is to compare an input data item with all data items stored in the CAM and determine whether the input matches any stored item. The address memory 124 used to store the address information of the data may therefore be implemented with a CAM in the accelerator 120, so that when a user requests to read data, the CAM can match the read address against the address information it stores; a match indicates that the data is stored in the first memory 121.
It should be appreciated that the above example is provided for illustration; the address memory 124 may be implemented using other memories, in which case the storage controller 122 may itself retrieve addresses from the address memory 124 and match them against the addresses carried in data reading instructions. The present application is not limited in this respect.
After step 71 determines whether the same address exists in the address memory 124: if so, step 72 may be performed; if not, no version of the data has yet been stored in the first memory 121, and step 75 may be performed.
Step 72. Determine whether the state information is the first state, where the first state may be represented by a high level or a "1" state; the present application is not specifically limited. This step may be implemented by the read data return processing module 1223 in fig. 4.
In a specific implementation, the state information memory 123 may be a register, and step 72 may be implemented by a register capable of determining whether the state information is the first state. Alternatively, the state information memory 123 may provide only the storage function while the storage controller 122 performs the determination, for example by fetching the state information of the data from the state information memory 123 and checking whether it is the first state. The present application is not limited in this respect.
When the state information of the data is the first state (high level), step 73 is executed; when it is the second state (low level), step 74 is executed.
Step 73. Read data from the second memory 130. Specifically, the storage controller 122 may issue a data reading instruction to the second memory 130; if the second memory 130 is a DDR, this may be a DDR read request. This step may be implemented by the read data return processing module 1223 in fig. 4.
It can be appreciated that if the state information of the data is the first state, then, referring to the foregoing, the data may be being retrieved from the second memory 130 during that period; reading directly from the second memory 130 at this point avoids read errors and improves the accuracy of data reading.
Step 74. Read data from the first memory. Specifically, the memory controller 122 may query the address memory 124 to determine a first memory address of data in the first memory according to a read address carried in the data read instruction, and then read the data from the first memory address. This step may be implemented by the read data return processing module 1223 of fig. 4.
Step 75. Set the state information to the first state and store the correspondence between the first storage address and the second storage address. The first state may again be represented by a high level or a "1" state, which the present application does not specifically limit. The correspondence is stored in the address memory 124. This step may be implemented by the read data update module 1224 in fig. 4.
Step 76. Read data from the second memory. Specifically, the memory controller 122 may issue a data read instruction to the second memory 130, which may be a DDR read request if the second memory 130 is a DDR. This step may be implemented by the read data update module 1224 in fig. 4.
Step 77. Determine whether the state information is the first state (high level). If it is the first state, steps 78 to 79 are executed; if it is the second state (low level), steps 78 to 79 are not executed. This step may be implemented by the read data update module 1224 in fig. 4.
It will be appreciated that if the state information has changed to the second state (low level), then, referring to the foregoing, a write to this data occurred between steps 75 and 79: during a write, when the state information is the first state (high level), the storage controller 122 modifies it to the second state (low level). In other words, the data was modified in the meantime and the storage controller 122 is writing or has written the latest version into the first memory 121, so the data just read should not be written into the first memory 121, lest it overwrite the newer version.
Similarly, if the state information has not changed to the second state (low level), the data was not modified between steps 75 and 79, so the data read from the second memory 130 can be updated into the first memory 121.
Step 78. Modify the state information to the second state (low level). This step may be implemented by the read data update module 1224 in fig. 4.
Step 79. Update data to the first memory 121. This step may be implemented by the read data update module 1224 in fig. 4.
The following describes in detail, with reference to fig. 7, how the data writing instruction is processed in step 8 above. Fig. 7 is a flowchart of the processing steps of the data writing instruction provided by the present application; as shown there, step 8 may include the following steps:
step 81: determine whether the same address exists. The data writing instruction carries the write address of the data, and step 81 may determine whether that write address exists in the address memory 124; the specific implementation of this step may refer to the description of step 71 and is not repeated here. This step may be implemented by the lookup and write data update module 1222 in fig. 4.
Wherein step 82 is performed when the same address exists, and step 84 is performed when the same address does not exist.
Step 82. Determine whether the state information is the first state, where the first state may be represented by a high level or a "1" state; the present application is not specifically limited. This step may be implemented by the lookup and write data update module 1222 in fig. 4.
When the state information is the first state (high level), steps 83, 85, and 86 are executed; when it is the second state (low level), steps 85 and 86 are executed.
Step 83. Set the state information to the second state, which may be represented by a low level or a "0" state; the present application is not specifically limited. This step may be implemented by the lookup and write data update module 1222 in fig. 4.
It will be appreciated that, referring to step 75, when the state information is the first state, the data is currently undergoing steps 75 to 78, and the data received in step 83 is the latest version; setting the state information to the second state therefore prevents the old version being fetched in steps 75 to 78 from overwriting the current new version.
It should be noted that, after step 83 is performed, steps 85 and 86 may be performed.
Step 84. Store the correspondence between the first storage address and the second storage address in the address memory 124. It will be appreciated that when the address carried in the data writing instruction is not present in the address memory 124, no historical version of the data has been written to the first memory 121; a write address of the first memory 121, that is, a first storage address, may therefore be allocated for the data and the mapping stored in the address memory 124, so that when the storage controller 122 later receives a read request for the data, it can obtain the first storage address through the address memory and then read the data, thereby implementing caching of the data. This step may be implemented by the lookup and write data update module 1222 in fig. 4.
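The allocation in step 84 can be completed with the hypothetical alloc_entry() helper assumed by the earlier sketches: it picks a free slot (the patent leaves the allocation policy open) and records the first/second storage address correspondence.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_DEPTH 4096
extern uint64_t cam_tags[CACHE_DEPTH];   /* from the address-memory sketch */
extern bool     cam_valid[CACHE_DEPTH];

int alloc_entry(uint64_t addr2) {
    for (int i = 0; i < CACHE_DEPTH; i++) {
        if (!cam_valid[i]) {
            cam_valid[i] = true;         /* claim the slot */
            cam_tags[i]  = addr2;        /* slot index i acts as the first storage address */
            return i;
        }
    }
    return -1;   /* first memory full; eviction policy is outside this sketch */
}
```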
It should be noted that, step 85 and step 86 may be performed after step 84 is performed.
Step 85. Write the data into the first memory. This step may be implemented by the lookup and write data update module 1222 in fig. 4.
Step 86. Write the data into the second memory. This step may be implemented by the lookup and write data update module 1222 in fig. 4.
In summary, according to the data processing method provided by the application, the data is stored in the first memory, and the data of the first memory is stored in the second memory, wherein the read-write efficiency of the first memory is greater than that of the second memory, so that the accelerator can directly interact with the first memory with higher read-write efficiency when reading and writing the data, thereby improving the data read-write efficiency of the accelerator.
An embodiment of the present application provides an accelerator, where the accelerator includes a processor and a power supply circuit, where the power supply circuit is configured to supply power to the processor, and the processor is configured to implement the functions of the operation steps performed by the accelerator described in fig. 5 to fig. 7.
An embodiment of the present application provides a computing device including a CPU and an accelerator, where the CPU is configured to run instructions to implement the service functions of the computing device, and the accelerator is configured to implement the functions of the operation steps performed by the accelerator described in figs. 5 to 7 above.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes at least one computer instruction. When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage node such as a server or data center containing at least one available medium. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., high-density digital video disc (DVD)), or a semiconductor medium.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various modifications and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (18)

1. A data processing method, the method being applied to an accelerator, the accelerator comprising a first memory, the method comprising:
the accelerator acquires a data processing instruction, wherein the data processing instruction comprises an address of data in a second memory, and the data reading and writing efficiency of the first memory is greater than that of the second memory;
and the accelerator reads the data from the first memory according to the data processing instruction and processes the data, wherein the first memory is used for storing data to be cached in the second memory, and the data to be cached comprises data accessed by the accelerator in a history way.
2. The method of claim 1, wherein the accelerator comprises a state information memory to store state information for the data, the state information comprising a first state to indicate that the data is in a modified state and a second state to indicate that the data is not in a modified state.
3. A method according to claim 1 or 2, wherein the accelerator comprises an address memory for storing a correspondence between an address of the data in the first memory and an address of the data in the second memory.
4. A method according to any one of claims 1 to 3, wherein the data processing instructions comprise data writing instructions, the accelerator reading the data from the first memory in accordance with the data processing instructions, processing the data comprising:
and when the state information of the data is a first state and the data processing instruction is a data writing instruction, the accelerator sets the state information of the data to a second state, reads and updates the data from the first memory according to the data processing instruction, and stores the updated data in the first memory.
5. The method of claim 4, wherein the data processing instruction comprises a data read instruction, and wherein, before the accelerator acquires the data processing instruction, the method further comprises:
the accelerator determining that the address memory does not include an address in the data processing instruction;
The accelerator setting state information of the data to the first state, reading the data from the second memory;
in the case that the state information of the data is the first state, the accelerator sets the state information of the data to the second state, and stores the data to the first memory.
6. The method according to any one of claims 1 to 5, further comprising: the accelerator configures the first memory to obtain configuration information of the first memory, wherein the configuration information comprises one or more of a cache switch state, a cache address range and a cache capacity, the cache switch state is used for indicating whether the accelerator uses the first memory, the cache address range is used for indicating the accelerator to store data with a storage address in the cache address range in the first memory, and the cache capacity is used for indicating the capacity of the first memory.
7. The method according to any of claims 1 to 6, wherein the accelerator is implemented by application specific integrated circuit ASIC or field programmable gate array FPGA technology, and the first memory comprises one or more of static random access memory SRAM, storage class memory SCM, registers, content addressable memory CAM.
8. The method of any of claims 1 to 7, wherein the data to be cached comprises data having an access frequency above a first threshold.
9. An apparatus for data processing, wherein the apparatus for data processing comprises a first memory, the apparatus for data processing comprising:
an obtaining unit, configured to obtain a data processing instruction, where the data processing instruction includes an address of data in a second memory, and data read-write efficiency of the first memory is greater than data read-write efficiency of the second memory;
and the processing unit is used for reading the data from the first memory according to the data processing instruction and processing the data, wherein the first memory is used for storing data to be cached in the second memory, and the data to be cached comprises data historically accessed by the data processing device.
10. The apparatus of claim 9, wherein the means for processing data comprises a state information memory for storing state information for the data, the state information comprising a first state for indicating that the data is in a modified state and a second state for indicating that the data is not in a modified state.
11. The apparatus according to claim 9 or 10, wherein the means for data processing comprises an address memory for storing a correspondence between an address of the data in the first memory and an address of the data in the second memory.
12. The apparatus of any one of claims 9 to 11, wherein the data processing instructions comprise data writing instructions;
the processing unit is configured to set the state information of the data to a second state when the state information of the data is a first state and the data processing instruction is a data writing instruction, read and update the data from the first memory according to the data processing instruction, and store the updated data in the first memory.
13. The apparatus of claim 12, wherein the data processing instructions comprise data reading instructions;
the processing unit is used for determining that the address memory does not contain the address in the data processing instruction before the data processing instruction is acquired by the data processing device;
the processing unit is used for setting the state information of the data into the first state and reading the data from the second memory;
The processing unit is configured to, when the state information of the data is the first state, set the state information of the data to the second state, and store the data to the first memory.
14. The apparatus according to any one of claims 9 to 13, further comprising a configuration unit configured to configure the first memory to obtain configuration information of the first memory, where the configuration information includes one or more of a cache switch state, a cache address range, and a cache capacity, where the cache switch state is used to indicate whether the first memory is used by the apparatus for data processing, and the cache address range is used to indicate that the apparatus for data processing stores data with a storage address in the cache address range in the first memory, and the cache capacity is used to indicate a capacity of the first memory.
15. The apparatus according to any of the claims 9 to 14, wherein the means for data processing is implemented by application specific integrated circuit ASIC or field programmable gate array FPGA technology, the first memory comprising one or more of static random access memory SRAM, storage class memory SCM, registers, content addressable memory CAM.
16. The apparatus according to any of claims 9 to 15, wherein the data to be cached comprises data having an access frequency above a first threshold.
17. An accelerator comprising a processor and power supply circuitry for powering the processor, the processor for performing the functions of the operational steps performed by the accelerator of any of claims 1 to 8.
18. A computing device comprising a central processing unit CPU for executing instructions to implement business functions of the computing device and an accelerator for implementing the functions of the operational steps performed by the accelerator of any one of claims 1 to 8.
CN202210451716.8A 2022-04-27 2022-04-27 Data processing method and device and related equipment Pending CN117008810A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210451716.8A CN117008810A (en) 2022-04-27 2022-04-27 Data processing method and device and related equipment
PCT/CN2023/091041 WO2023208087A1 (en) 2022-04-27 2023-04-27 Data processing method and apparatus and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210451716.8A CN117008810A (en) 2022-04-27 2022-04-27 Data processing method and device and related equipment

Publications (1)

Publication Number Publication Date
CN117008810A true CN117008810A (en) 2023-11-07

Family

ID=88517853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210451716.8A Pending CN117008810A (en) 2022-04-27 2022-04-27 Data processing method and device and related equipment

Country Status (2)

Country Link
CN (1) CN117008810A (en)
WO (1) WO2023208087A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8341382B2 (en) * 2010-09-30 2012-12-25 Nxp B.V. Memory accelerator buffer replacement method and system
CN105074677B (en) * 2013-03-12 2018-01-26 英派尔科技开发有限公司 The method and system of the method stored data in for accelerator in buffer
US9710388B2 (en) * 2014-01-23 2017-07-18 Qualcomm Incorporated Hardware acceleration for inline caches in dynamic languages
WO2018179873A1 (en) * 2017-03-28 2018-10-04 日本電気株式会社 Library for computer provided with accelerator, and accelerator
US10817441B2 (en) * 2019-03-29 2020-10-27 Intel Corporation Shared accelerator memory systems and methods

Also Published As

Publication number Publication date
WO2023208087A1 (en) 2023-11-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination